Interactive Redis Cluster Failover

Visualizing PFAIL → FAIL → Election → Promotion with internal state details.

Masters: 0
Slaves: 0
currentEpoch: 0 (a cluster-wide 'term' number; a slave increments it to start a new election)
replica-validity-factor: A slave is disqualified from failover if its replication link has been down for longer than `node_timeout * factor`

Cluster Topology

Core Data Structures & Logic (C types)

/* Represents a single node's state */
struct clusterNode {
  uint16_t flags;          // MASTER, SLAVE, PFAIL, FAIL, NOFAILOVER...
  uint64_t configEpoch;    // Slot version number
  long long repl_offset;   // Replication (sync) progress
  int numslaves;           // Number of known replicas (masters only)
  struct clusterNode **slaves; // Known replicas (masters only)
  struct clusterNode *slaveof; // Master this node replicates from; NULL for masters
};

/* Cluster-wide (per-node view) */
struct clusterState {
  uint64_t currentEpoch;  // Election term
  uint64_t lastVoteEpoch; // Last term voted in
};
/* Node Flags (Bitmap) */
#define CLUSTER_NODE_MASTER    (1<<0)
#define CLUSTER_NODE_SLAVE     (1<<1)
#define CLUSTER_NODE_PFAIL     (1<<2)
#define CLUSTER_NODE_FAIL      (1<<3)
#define CLUSTER_NODE_NOFAILOVER (1<<4)
/* Calculates slave's rank. Lower is better. */
int clusterGetSlaveRank() {
    long long myoffset;
    int j, rank = 0;
    clusterNode *master = myself->slaveof;

    if (master == NULL) return 0;
    // Get my own replication progress
    myoffset = replicationGetSlaveOffset();

    // Count sibling slaves with a better offset
    for (j = 0; j < master->numslaves; j++)
        if (master->slaves[j] != myself &&
            !nodeCantFailover(master->slaves[j]) &&
            master->slaves[j]->repl_offset > myoffset)
                rank++;
    return rank;
}

/* Helper to check the NOFAILOVER flag */
#define nodeCantFailover(n) \
    ((n)->flags & CLUSTER_NODE_NOFAILOVER)

Node Inspector

Select a node to inspect its internal state.

Failover Process Explained

0. Node-Specific Views

There is no "global state". Every node maintains its own map of all other nodes. When Node A thinks Node B is failing, it sets a PFAIL flag on its local representation of Node B. The inspector shows this per-node perspective.
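
As a minimal sketch of what this means in terms of the clusterNode struct above (the helper names here are hypothetical, not Redis API), a flag change only touches the observer's own copy of the node:

/* Sketch: this node's private opinion about node 'b'.
 * The flag lives only in our local clusterNode entry for b;
 * b itself and every other node keep their own, possibly different, copies. */
void markPfailLocally(struct clusterNode *b) {
    b->flags |= CLUSTER_NODE_PFAIL;    /* purely local; nothing is broadcast here */
}

int seenAsPfail(const struct clusterNode *b) {
    return (b->flags & CLUSTER_NODE_PFAIL) != 0;   /* valid only for this node's view */
}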

1. Failure Detection (PFAIL → FAIL)

A node marks another as PFAIL (possible failure) if it doesn't receive a PONG reply within `node_timeout`. It escalates this to FAIL once the failure reports gathered via the gossip protocol, plus its own observation, cover a majority of masters ((N/2) + 1).
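
Both halves of the rule can be compressed into a sketch like the one below; `pong_received`, `countFailureReports()` and the millisecond arguments are simplified stand-ins for the real internals, not the actual Redis code:

/* Sketch: failure detection as seen by one node. */
void checkFailureState(struct clusterNode *n, long long now_ms,
                       long long node_timeout_ms, int n_masters) {
    /* Step 1: no PONG within node_timeout -> suspect the node locally (PFAIL). */
    if (now_ms - n->pong_received > node_timeout_ms)   /* pong_received: assumed timestamp field */
        n->flags |= CLUSTER_NODE_PFAIL;

    /* Step 2: enough masters agree -> escalate the local suspicion to FAIL. */
    int needed = (n_masters / 2) + 1;                  /* strict majority of masters */
    if ((n->flags & CLUSTER_NODE_PFAIL) &&
        countFailureReports(n) + 1 >= needed) {        /* +1 counts our own observation */
        n->flags &= ~CLUSTER_NODE_PFAIL;
        n->flags |= CLUSTER_NODE_FAIL;                 /* FAIL is then broadcast cluster-wide */
    }
}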

2. Slave Election: Initiation & Ranking

A slave begins a failover only if its master is marked FAIL and served at least one slot, its replication data is not too old (replica-validity-factor), and it isn't configured with NOFAILOVER. It then computes its rank among its siblings from its replication offset. Election Delay Formula: 500ms (base) + random(0-500ms) + (rank * 1000ms)
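
The validity check and the delay formula translate almost directly into code; the sketch below is a simplification (rand() from <stdlib.h> stands in for Redis' own jitter, and the data-age check ignores the extra ping-period slack the real code allows):

/* Sketch: is this slave allowed to attempt a failover at all? */
int replicaIsValid(long long data_age_ms, long long node_timeout_ms, int validity_factor) {
    if (validity_factor == 0) return 1;   /* factor 0 disables the check */
    return data_age_ms <= node_timeout_ms * validity_factor;
}

/* Sketch: how long to wait before actually starting the election. */
long long failoverStartDelay(int rank) {
    /* 500 ms fixed + 0-500 ms random jitter + 1000 ms per better-ranked sibling. */
    return 500 + (rand() % 500) + (long long)rank * 1000;
}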

3. Slave Election: Voting & Winning

The slave increments currentEpoch and requests votes. Masters grant a vote (sketched after this list) only if:

  • They haven't voted in the current epoch (lastVoteEpoch < currentEpoch).
  • The slave's master is indeed marked as FAIL.
  • The slave's claimed configEpoch is not stale compared to other nodes.
  • They haven't voted for another slave of the same master recently (anti-flapping).
A slave wins with a majority of votes ((N/2) + 1).
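
Here is a sketch of a master's vote check in terms of the structs above; `lastVoteTime` and the exact staleness comparison are simplifications, not the literal Redis logic:

/* Sketch: should this master grant its vote to a requesting slave?
 * 'req_epoch' is the currentEpoch the slave used when asking for votes. */
int shouldGrantVote(struct clusterState *state, struct clusterNode *slave,
                    uint64_t req_epoch, long long now_ms, long long node_timeout_ms) {
    struct clusterNode *failed_master = slave->slaveof;

    /* 1. Only one vote per epoch. */
    if (state->lastVoteEpoch >= req_epoch) return 0;

    /* 2. The slave's master must really be FAIL in our local view. */
    if (failed_master == NULL || !(failed_master->flags & CLUSTER_NODE_FAIL)) return 0;

    /* 3. The claimed configEpoch must not be stale for the slots being taken over. */
    if (slave->configEpoch < failed_master->configEpoch) return 0;

    /* 4. Anti-flapping: no second vote for slaves of the same master too soon.
     * lastVoteTime is an assumed per-master timestamp, not a real Redis field. */
    if (now_ms - failed_master->lastVoteTime < node_timeout_ms * 2) return 0;

    state->lastVoteEpoch = req_epoch;      /* remember that we voted in this epoch */
    failed_master->lastVoteTime = now_ms;
    return 1;                              /* grant the vote (FAILOVER_AUTH_ACK) */
}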

4. Slave Promotion & Healing

The winner promotes itself to master, claims the failed master's slots under a new, higher configEpoch, and broadcasts its updated state. Other nodes receive this gossip and update their local views. Orphaned slaves of the old master also learn of the new master via gossip and reconfigure to replicate from it.
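
A sketch of the winner's side, reusing the global `myself` from the ranking code above; claimSlotsOf() and broadcastConfig() are hypothetical stand-ins for the real slot-transfer and cluster-bus internals:

/* Sketch: the elected slave promotes itself to master. */
void promoteToMaster(struct clusterState *state) {
    struct clusterNode *old_master = myself->slaveof;

    /* Become a master: clear SLAVE, set MASTER, detach from the failed master. */
    myself->flags &= ~CLUSTER_NODE_SLAVE;
    myself->flags |= CLUSTER_NODE_MASTER;
    myself->slaveof = NULL;

    /* Claim the slots under the configEpoch of the won election; since it is
     * higher than the old master's, other nodes accept the new ownership. */
    myself->configEpoch = state->currentEpoch;
    claimSlotsOf(old_master, myself);

    /* Broadcast the new configuration; orphaned slaves of old_master
     * reconfigure to replicate from us once they see it via gossip. */
    broadcastConfig(myself);
}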

Event Log