Cluster Topology
Core Data Structures & Logic (C types)
#include <stdint.h>

/* Represents a single node's state */
typedef struct clusterNode {
    uint16_t flags;              // Bitmap: MASTER, SLAVE, PFAIL, FAIL, ...
    uint64_t configEpoch;        // Slot version number
    long long repl_offset;       // Sync progress
    struct clusterNode *slaveof; // Master this node replicates (NULL for masters)
    int numslaves;               // Number of attached slaves
    struct clusterNode **slaves; // The attached slaves themselves
} clusterNode;

/* Cluster-wide (per-node view) */
struct clusterState {
    uint64_t currentEpoch;  // Election term
    uint64_t lastVoteEpoch; // Last term voted in
};

/* Node Flags (Bitmap) */
#define CLUSTER_NODE_MASTER     (1<<0)
#define CLUSTER_NODE_SLAVE      (1<<1)
#define CLUSTER_NODE_PFAIL      (1<<2) // Possible failure: local suspicion only
#define CLUSTER_NODE_FAIL       (1<<3) // Confirmed failure: cluster-wide agreement
#define CLUSTER_NODE_NOFAILOVER (1<<4) // Slave must never be promoted
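For readability, the snippets below also use the flag-test helper macros from the Redis source (cluster.h), adapted here to the simplified flag set above:

/* Flag-test helpers, mirroring the macros in Redis cluster.h. */
#define nodeIsMaster(n) ((n)->flags & CLUSTER_NODE_MASTER)
#define nodeIsSlave(n)  ((n)->flags & CLUSTER_NODE_SLAVE)
#define nodeTimedOut(n) ((n)->flags & CLUSTER_NODE_PFAIL)
#define nodeFailed(n)   ((n)->flags & CLUSTER_NODE_FAIL)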
/* Helper to check the NOFAILOVER flag (defined before use). */
#define nodeCantFailover(n) ((n)->flags & CLUSTER_NODE_NOFAILOVER)

/* Calculates this slave's election rank. Lower is better: a lower rank
 * means fewer siblings hold more recent data than we do. */
int clusterGetSlaveRank(void) {
    long long myoffset;
    int j, rank = 0;
    clusterNode *master = myself->slaveof; /* "myself" is this node's own entry */

    if (master == NULL) return 0;

    /* Get my own replication progress. */
    myoffset = replicationGetSlaveOffset();

    /* Count sibling slaves with a better (higher) offset. */
    for (j = 0; j < master->numslaves; j++) {
        if (master->slaves[j] != myself &&
            !nodeCantFailover(master->slaves[j]) &&
            master->slaves[j]->repl_offset > myoffset)
            rank++;
    }
    return rank;
}
Node Inspector
Failover Process Explained
0. Node-Specific Views
There is no "global state". Every node maintains its own map of all other nodes. When Node A thinks Node B is failing, it sets a PFAIL flag on its local representation of Node B. The inspector shows this per-node perspective.
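As a concrete illustration, flagging a peer only mutates the flagging node's own bookkeeping. This is a minimal sketch, not the exact Redis code: `server.cluster->nodes` (the per-node dictionary of peers, keyed by node ID) and dictFetchValue() come from the real source, while `b_node_id` is a hypothetical variable.

/* Sketch: Node A suspects Node B. A only touches its *local*
 * clusterNode entry for B; no shared state exists to write to. */
clusterNode *b = dictFetchValue(server.cluster->nodes, b_node_id);
if (b != NULL && !nodeTimedOut(b)) {
    b->flags |= CLUSTER_NODE_PFAIL;  /* A's opinion of B, nobody else's */
}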
1. Failure Detection (PFAIL → FAIL)
A node marks another as PFAIL if no PONG reply arrives within `node_timeout`. It escalates PFAIL to FAIL after collecting failure reports from a majority of masters ((N/2) + 1, where N is the number of slot-serving masters) via the gossip protocol.
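A sketch of the escalation, modeled on markNodeAsFailingIfNeeded() in Redis cluster.c (simplified; `server.cluster->size` and clusterNodeFailureReportsCount(), which counts fresh reports gossiped by other masters, are from the real source):

/* Sketch: escalate PFAIL to FAIL once a majority of slot-serving
 * masters agree. server.cluster->size is the number of such masters. */
void markNodeAsFailingIfNeeded(clusterNode *node) {
    int needed_quorum = (server.cluster->size / 2) + 1;

    if (!nodeTimedOut(node)) return;   /* We can still reach it: no PFAIL. */
    if (nodeFailed(node)) return;      /* Already marked FAIL. */

    int failures = clusterNodeFailureReportsCount(node); /* gossiped reports */
    if (nodeIsMaster(myself)) failures++;  /* our own PFAIL counts too */
    if (failures < needed_quorum) return;  /* no majority yet */

    node->flags &= ~CLUSTER_NODE_PFAIL;
    node->flags |= CLUSTER_NODE_FAIL;
    /* The real code then broadcasts a FAIL message (clusterSendFail())
     * so the whole cluster converges quickly. */
}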
2. Slave Election: Initiation & Ranking
A slave begins a failover only if: its master is marked FAIL, the master was serving at least one slot, the slave's replication data is not too stale (bounded by cluster-replica-validity-factor), and the slave is not flagged NOFAILOVER. It then calculates its rank from its replication offset. Election Delay Formula: 500ms (fixed base) + random(0-500ms) (jitter) + (rank * 1000ms)
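For concreteness, this is roughly how the formula maps to code. In Redis the value is computed in clusterHandleSlaveFailover() and stored in failover_auth_time; mstime_t and mstime() (current time in milliseconds) are from the real source.

/* Sketch: schedule when this slave may start asking for votes. */
mstime_t failover_auth_time =
    mstime() + 500 +              /* fixed delay: let FAIL state propagate */
    random() % 500;               /* jitter: de-synchronize the slaves     */
failover_auth_time +=
    clusterGetSlaveRank() * 1000; /* best-synced slave (rank 0) goes first */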
3. Slave Election: Voting & Winning
The slave increments currentEpoch and broadcasts a vote request (FAILOVER_AUTH_REQUEST) to the masters. A master grants its vote only if all of the following hold (see the sketch after this list):
- They haven't voted in the current epoch (lastVoteEpoch < currentEpoch).
- The slave's master is indeed marked as FAIL.
- The slave's claimed configEpoch for its master's slots is not older than the configEpoch the voter has recorded for those slots (no stale configuration).
- They haven't recently voted for another slave of the same master (anti-flapping: no second vote within 2 * node_timeout).
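A condensed sketch of these checks, modeled on clusterSendFailoverAuthIfNeeded() in Redis cluster.c. The function name here is hypothetical, `voted_time` is assumed as an extra per-master field recording when we last voted for its slaves, and the configEpoch staleness check (validated against the slot table in the real code) is omitted.

/* Sketch: a master deciding whether to grant its vote. */
void maybeGrantVote(clusterNode *slave, uint64_t reqEpoch) {
    clusterNode *master = slave->slaveof;

    /* Only masters vote; ignore requests from stale epochs. */
    if (!nodeIsMaster(myself)) return;
    if (reqEpoch < server.cluster->currentEpoch) return;

    /* Rule 1: one vote per epoch. */
    if (server.cluster->lastVoteEpoch == server.cluster->currentEpoch) return;

    /* Rule 2: the slave's master must really be FAIL in our local view. */
    if (master == NULL || !nodeFailed(master)) return;

    /* Rule 4 (anti-flapping): no second vote for slaves of the same
     * master within 2 * node_timeout. */
    if (mstime() - master->voted_time < server.cluster_node_timeout * 2) return;

    server.cluster->lastVoteEpoch = server.cluster->currentEpoch;
    master->voted_time = mstime();
    clusterSendFailoverAuth(slave);  /* Send FAILOVER_AUTH_ACK. */
}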
4. Slave Promotion & Healing
The winner switches to the master role, claims its old master's slots under its new, higher configEpoch, and broadcasts the updated state. Other nodes receive this gossip and update their local views; because the winner's configEpoch is higher, its claim on the slots wins any conflict. The old master's orphaned slaves learn of the new master the same way and reconfigure to replicate from it.
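A sketch of the promotion step, loosely following clusterFailoverReplaceYourMaster() in Redis cluster.c. CLUSTER_SLOTS (16384), the `server.cluster->slots` owner table, and clusterBroadcastPong() are from the real source; the helper name and the use of currentEpoch as the won epoch are simplifications.

/* Sketch: the winning slave promotes itself. */
void failoverPromoteSelf(void) {
    clusterNode *oldmaster = myself->slaveof;

    /* 1. Flip roles: stop replicating, become a master. */
    myself->flags &= ~CLUSTER_NODE_SLAVE;
    myself->flags |= CLUSTER_NODE_MASTER;
    myself->slaveof = NULL;

    /* 2. Take over every slot the failed master served, stamped with
     *    the epoch won in the election. Higher epoch wins conflicts. */
    myself->configEpoch = server.cluster->currentEpoch;
    for (int s = 0; s < CLUSTER_SLOTS; s++) {
        if (server.cluster->slots[s] == oldmaster)
            server.cluster->slots[s] = myself;
    }

    /* 3. Broadcast a PONG to all nodes so their local views converge;
     *    orphaned slaves see the new master and re-replicate from it. */
    clusterBroadcastPong(CLUSTER_BROADCAST_ALL);
}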