Notes on "Redis Design and Implementation", Chapter 16: Sentinel

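Sentinel

Sentinel is Redis's high-availability mechanism.
A Sentinel system made up of one or more Sentinel instances can monitor any number of master servers together with all of their slaves. When a monitored master goes offline, the system automatically promotes one of its slaves to master, and the new master takes over command processing from the old one.

The rest of these notes walk through the whole process by which a Sentinel system fails over a master.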

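16.1 Starting and Initializing Sentinel

Starting a Sentinel involves the following steps:
1): Initialize the server.
2): Replace the code used by an ordinary Redis server with Sentinel-specific code.
3): Initialize the Sentinel state.
4): Initialize the list of monitored masters from the given configuration file.
5): Create the network connections to the masters.

16.1.1 Initializing the Server

As the startup code below shows, checkForSentinelMode decides whether the server starts in Sentinel mode. In that mode the command table is also filled from sentinelcmds instead of the normal commands. Note that the check uses strstr, so any argv[0] containing "redis-sentinel" matches, not only an exact match as the comment suggests.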

/* Returns 1 if there is --sentinel among the arguments or if
 * argv[0] is exactly "redis-sentinel". */
int checkForSentinelMode(int argc, char **argv) {
    int j;

    if (strstr(argv[0],"redis-sentinel") != NULL) return 1;
    for (j = 1; j < argc; j++)
        if (!strcmp(argv[j],"--sentinel")) return 1;
    return 0;
}

// Check whether the server was started in Sentinel mode
server.sentinel_mode = checkForSentinelMode(argc,argv);

// Initialize the server configuration
initServerConfig();

/* We need to init sentinel right now as parsing the configuration file
 * in sentinel mode will have the effect of populating the sentinel
 * data structures with master nodes to monitor. */
// If the server was started in Sentinel mode, run the Sentinel-specific
// initialization and create the data structures for the monitored masters
if (server.sentinel_mode) {
    initSentinelConfig();
    initSentinel();
}

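16.1.2 Using Sentinel-Specific Code

initSentinel loads the Sentinel command table.
Even INFO, the main query command, is not the ordinary server's implementation but a Sentinel-specific version.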

/* Perform the Sentinel mode initialization. */
void initSentinel(void) {
    int j;

    /* Remove usual Redis commands from the command table, then just add
     * the SENTINEL command. */

    // Empty the server's command table (it holds the normal-mode commands)
    dictEmpty(server.commands,NULL);
    // Add the commands used in Sentinel mode to the command table
    for (j = 0; j < sizeof(sentinelcmds)/sizeof(sentinelcmds[0]); j++) {
        int retval;
        struct redisCommand *cmd = sentinelcmds+j;

        retval = dictAdd(server.commands, sdsnew(cmd->name), cmd);
        redisAssert(retval == DICT_OK);
    }

    /* Initialize various data structures. */
    // Initialize the epoch
    sentinel.current_epoch = 0;

    // Create the dictionary that holds the monitored masters
    sentinel.masters = dictCreate(&instancesDictType,NULL);

    // Initialize the TILT mode fields
    sentinel.tilt = 0;
    sentinel.tilt_start_time = 0;
    sentinel.previous_time = mstime();

    // Initialize the script fields
    sentinel.running_scripts = 0;
    sentinel.scripts_queue = listCreate();
}

16.1.3 Initializing the Sentinel State

Once the command table is loaded, the sentinelState structure is initialized.

struct sentinelState {

    uint64_t current_epoch;     /* Current epoch. */

    // All masters monitored by this Sentinel
    dict *masters;      /* Dictionary of master sentinelRedisInstances.
                           Key is the instance name, value is the
                           sentinelRedisInstance structure pointer. */

    int tilt;           /* Are we in TILT mode? */

    int running_scripts;    /* Number of scripts in execution right now. */

    mstime_t tilt_start_time;   /* When TILT started. */

    mstime_t previous_time;     /* Last time we ran the time handler. */

    // FIFO queue
    list *scripts_queue;    /* Queue of user scripts to execute. */

} sentinel;

These fields are filled in by the second half of initSentinel, listed in section 16.1.2: current_epoch is set to 0, masters to an empty dictionary, TILT mode is switched off with previous_time set to the current time, and the script queue starts as an empty list.

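16.1.4 Initializing the masters Attribute of the Sentinel State

The masters dictionary in the Sentinel state records every monitored master: the key is the master's name and the value is a sentinelRedisInstance structure.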

typedef struct sentinelRedisInstance {

    // Flags recording the instance type and its current state
    int flags;      /* See SRI_... defines */

    // Instance name.
    // A master's name is set by the user in the configuration file.
    // Slave and Sentinel names are set automatically by Sentinel
    // in ip:port form, for example "127.0.0.1:26379".
    char *name;     /* Master name from the point of view of this sentinel. */

    char *runid;    /* run ID of this instance. */

    // Configuration epoch, used to implement failover
    uint64_t config_epoch;  /* Configuration epoch. */

    // Address of the instance
    sentinelAddr *addr; /* Master host. */

    // Asynchronous connection used to send commands
    redisAsyncContext *cc; /* Hiredis context for commands. */

    // Asynchronous connection used to run SUBSCRIBE and receive channel
    // messages. Created only for master and slave instances.
    redisAsyncContext *pc; /* Hiredis context for Pub / Sub. */

    int pending_commands;   /* Number of commands sent waiting for a reply. */

    mstime_t cc_conn_time; /* cc connection time. */

    mstime_t pc_conn_time; /* pc connection time. */

    mstime_t pc_last_activity; /* Last time we received any message. */

    mstime_t last_avail_time; /* Last time the instance replied to ping with
                                 a reply we consider valid. */
    mstime_t last_ping_time;  /* Last time a pending ping was sent in the
                                 context of the current command connection
                                 with the instance. 0 if still not sent or
                                 if pong already received. */
    mstime_t last_pong_time;  /* Last time the instance replied to ping,
                                 whatever the reply was. That's used to check
                                 if the link is idle and must be reconnected. */

    mstime_t last_pub_time;   /* Last time we sent hello via Pub/Sub. */

    mstime_t last_hello_time; /* Only used if SRI_SENTINEL is set. Last time
                                 we received a hello from this Sentinel
                                 via Pub/Sub. */

    // Used only when this instance is a Sentinel
    mstime_t last_master_down_reply_time; /* Time of last reply to
                                             SENTINEL is-master-down command. */

    mstime_t s_down_since_time; /* Subjectively down since time. */

    mstime_t o_down_since_time; /* Objectively down since time. */

    // Value of the SENTINEL down-after-milliseconds option: how many
    // milliseconds without a valid reply before the instance is judged
    // subjectively down
    mstime_t down_after_period; /* Consider it down after that period. */

    mstime_t info_refresh;  /* Time at which we received INFO output from it. */

    /* Role and the first time we observed it.
     * This is useful in order to delay replacing what the instance reports
     * with our own configuration. We need to always wait some time in order
     * to give a chance to the leader to report the new configuration before
     * we do silly things. */
    // Role reported by the instance
    int role_reported;
    // Time at which the role was last updated
    mstime_t role_reported_time;

    mstime_t slave_conf_change_time; /* Last time slave master addr changed. */

    /* Master specific. */

    dict *sentinels;    /* Other sentinels monitoring the same master. */

    // Keys are slave names, values are the slaves' sentinelRedisInstance
    // structures
    dict *slaves;       /* Slaves for this master instance. */

    // The quorum argument of SENTINEL monitor <master-name> <IP> <port> <quorum>:
    // the number of votes needed to judge this instance objectively down
    int quorum;         /* Number of sentinels that need to agree on failure. */

    // Value of SENTINEL parallel-syncs <master-name> <number>: how many slaves
    // may sync with the new master at the same time during a failover
    int parallel_syncs; /* How many slaves to reconfigure at same time. */

    char *auth_pass;    /* Password to use for AUTH against master & slaves. */

    /* Slave specific. */

    mstime_t master_link_down_time; /* Slave replication link down time. */

    int slave_priority; /* Slave priority according to its INFO output. */

    // Time at which SLAVEOF <new-master> was sent to this slave during failover
    mstime_t slave_reconf_sent_time; /* Time at which we sent SLAVE OF <new> */

    struct sentinelRedisInstance *master; /* Master instance if it's slave. */

    char *slave_master_host;    /* Master host as reported by INFO */

    int slave_master_port;      /* Master port as reported by INFO */

    int slave_master_link_status; /* Master link status as reported by INFO */

    unsigned long long slave_repl_offset; /* Slave replication offset. */

    /* Failover */

    // For a Sentinel instance, this field is valid only while the
    // SRI_MASTER_DOWN flag is set in its flags attribute.
    char *leader;       /* If this is a master instance, this is the runid of
                           the Sentinel that should perform the failover. If
                           this is a Sentinel, this is the runid of the Sentinel
                           that this Sentinel voted as leader. */
    uint64_t leader_epoch; /* Epoch of the 'leader' field. */
    uint64_t failover_epoch; /* Epoch of the currently started failover. */
    int failover_state; /* See SENTINEL_FAILOVER_STATE_* defines. */

    // Time of the last failover state change
    mstime_t failover_state_change_time;

    mstime_t failover_start_time;   /* Last failover attempt start time. */

    // Value of the SENTINEL failover-timeout <master-name> <ms> option
    mstime_t failover_timeout;      /* Max time to refresh failover state. */

    mstime_t failover_delay_logged; /* For what failover_start_time value we
                                       logged the failover delay. */
    struct sentinelRedisInstance *promoted_slave; /* Promoted slave instance. */

    /* Scripts executed to notify admin or reconfigure clients: when they
     * are set to NULL no script is executed. */
    // Path of the script run to notify the administrator
    // when a WARNING-level event occurs
    char *notification_script;

    // Path of the script run when a failover starts, ends, or is aborted
    char *client_reconfig_script;

} sentinelRedisInstance;

The sentinelAddr structure referenced above holds the instance's IP address and port.

/* Address object, used to describe an ip:port pair. */
typedef struct sentinelAddr {
    char *ip;
    int port;
} sentinelAddr;

The Sentinel configuration file is then read through the ordinary configuration loader.

void loadServerConfig(char *filename, char *options);
calls
void loadServerConfigFromString(char *config)
which contains this branch:
else if (!strcasecmp(argv[0],"sentinel")) {
    /* argc == 1 is handled by main() as we need to enter the sentinel
     * mode ASAP. */
    // If the sentinel directive has arguments, run the code below
    if (argc != 1) {
        // A sentinel directive outside Sentinel mode is an error
        if (!server.sentinel_mode) {
            err = "sentinel directive while not in sentinel mode";
            goto loaderr;
        }
        // Load the Sentinel option
        err = sentinelHandleConfiguration(argv+1,argc-1);
        if (err) goto loaderr;
    }
}

// Parser for Sentinel configuration directives
char *sentinelHandleConfiguration(char **argv, int argc) {
    sentinelRedisInstance *ri;

    // SENTINEL monitor option
    if (!strcasecmp(argv[0],"monitor") && argc == 5) {
        /* monitor <name> <host> <port> <quorum> */

        // Read the quorum argument
        int quorum = atoi(argv[4]);

        // quorum must be greater than 0
        if (quorum <= 0) return "Quorum must be 1 or greater.";

        // Create the master instance
        if (createSentinelRedisInstance(argv[1],SRI_MASTER,argv[2],
                                        atoi(argv[3]),quorum,NULL) == NULL)
        {
            switch(errno) {
            case EBUSY: return "Duplicated master name.";
            case ENOENT: return "Can't resolve master instance hostname.";
            case EINVAL: return "Invalid port number";
            }
        }

    // SENTINEL down-after-milliseconds option
    } else if (!strcasecmp(argv[0],"down-after-milliseconds") && argc == 3) {

        /* down-after-milliseconds <name> <milliseconds> */

        // Look up the master
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";

        // Set the option
        ri->down_after_period = atoi(argv[2]);
        if (ri->down_after_period <= 0)
            return "negative or zero time parameter.";

        sentinelPropagateDownAfterPeriod(ri);

    // SENTINEL failover-timeout option
    } else if (!strcasecmp(argv[0],"failover-timeout") && argc == 3) {

        /* failover-timeout <name> <milliseconds> */

        // Look up the master
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";

        // Set the option
        ri->failover_timeout = atoi(argv[2]);
        if (ri->failover_timeout <= 0)
            return "negative or zero time parameter.";

    // SENTINEL parallel-syncs option
    } else if (!strcasecmp(argv[0],"parallel-syncs") && argc == 3) {

        /* parallel-syncs <name> <number> */

        // Look up the master
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";

        // Set the option
        ri->parallel_syncs = atoi(argv[2]);

    // SENTINEL notification-script option
    } else if (!strcasecmp(argv[0],"notification-script") && argc == 3) {

        /* notification-script <name> <path> */

        // Look up the master
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";

        // Check that the file at the given path exists and is executable
        if (access(argv[2],X_OK) == -1)
            return "Notification script seems non existing or non executable.";

        // Set the option
        ri->notification_script = sdsnew(argv[2]);

    // SENTINEL client-reconfig-script option
    } else if (!strcasecmp(argv[0],"client-reconfig-script") && argc == 3) {

        /* client-reconfig-script <name> <path> */

        // Look up the master
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        // Check that the file at the given path exists and is executable
        if (access(argv[2],X_OK) == -1)
            return "Client reconfiguration script seems non existing or "
                   "non executable.";

        // Set the option
        ri->client_reconfig_script = sdsnew(argv[2]);

    // SENTINEL auth-pass option
    } else if (!strcasecmp(argv[0],"auth-pass") && argc == 3) {

        /* auth-pass <name> <password> */

        // Look up the master
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";

        // Set the option
        ri->auth_pass = sdsnew(argv[2]);

    } else if (!strcasecmp(argv[0],"current-epoch") && argc == 2) {
        /* current-epoch <epoch> */
        unsigned long long current_epoch = strtoull(argv[1],NULL,10);
        if (current_epoch > sentinel.current_epoch)
            sentinel.current_epoch = current_epoch;

    // SENTINEL config-epoch option
    } else if (!strcasecmp(argv[0],"config-epoch") && argc == 3) {

        /* config-epoch <name> <epoch> */

        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";

        ri->config_epoch = strtoull(argv[2],NULL,10);
        /* The following update of current_epoch is not really useful as
         * now the current epoch is persisted on the config file, but
         * we leave this check here for redundancy. */
        if (ri->config_epoch > sentinel.current_epoch)
            sentinel.current_epoch = ri->config_epoch;

    } else if (!strcasecmp(argv[0],"leader-epoch") && argc == 3) {
        /* leader-epoch <name> <epoch> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        ri->leader_epoch = strtoull(argv[2],NULL,10);

    // SENTINEL known-slave option
    } else if (!strcasecmp(argv[0],"known-slave") && argc == 4) {
        sentinelRedisInstance *slave;

        /* known-slave <name> <ip> <port> */

        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        if ((slave = createSentinelRedisInstance(NULL,SRI_SLAVE,argv[2],
                    atoi(argv[3]), ri->quorum, ri)) == NULL)
        {
            return "Wrong hostname or port for slave.";
        }

    // SENTINEL known-sentinel option
    } else if (!strcasecmp(argv[0],"known-sentinel") &&
               (argc == 4 || argc == 5)) {
        sentinelRedisInstance *si;

        /* known-sentinel <name> <ip> <port> [runid] */

        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        if ((si = createSentinelRedisInstance(NULL,SRI_SENTINEL,argv[2],
                    atoi(argv[3]), ri->quorum, ri)) == NULL)
        {
            return "Wrong hostname or port for sentinel.";
        }
        if (argc == 5) si->runid = sdsnew(argv[4]);

    } else {
        return "Unrecognized sentinel configuration statement.";
    }
    return NULL;
}
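
To make the monitor branch above concrete, here is a minimal standalone sketch of how one "sentinel monitor" line could be split and validated. It is not the Redis parser: monitor_cfg and parse_monitor_line are hypothetical names introduced for illustration, and the only rules taken from the code above are the four arguments and the quorum check. The port range check is an added assumption.

```c
#include <stdio.h>

/* Hypothetical container for the four arguments of a monitor directive. */
typedef struct monitor_cfg {
    char name[64];   /* master name chosen by the user */
    char host[64];   /* master host or IP */
    int port;
    int quorum;
} monitor_cfg;

/* Parses "sentinel monitor <name> <host> <port> <quorum>".
 * Returns 1 on success, 0 if the line is malformed or a value is invalid. */
int parse_monitor_line(const char *line, monitor_cfg *cfg) {
    int n = sscanf(line, "sentinel monitor %63s %63s %d %d",
                   cfg->name, cfg->host, &cfg->port, &cfg->quorum);
    if (n != 4) return 0;
    /* Same rule as the original code: quorum must be 1 or greater. */
    if (cfg->quorum <= 0) return 0;
    /* Assumed sanity check on the port, not copied from Redis. */
    if (cfg->port <= 0 || cfg->port > 65535) return 0;
    return 1;
}
```

Calling parse_monitor_line("sentinel monitor mymaster 127.0.0.1 6379 2", &cfg) fills cfg with the master named mymaster and a quorum of 2. A quorum of 0 is rejected, just as in sentinelHandleConfiguration.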

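16.1.5 Creating Network Connections to the Masters

After the initialization above, the Sentinel starts running and opens two connections to each master:
1): A command connection
2): A subscription connection

The call chain is: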

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData)


/* Run the Sentinel timer if we are in sentinel mode. */
run_with_period(100) {
    if (server.sentinel_mode) sentinelTimer();
}


// Run the periodic operations: PING instances, analyze the INFO replies of
// masters and slaves, send hello messages to the other Sentinels monitoring
// the same master and receive theirs, perform failover, and so on
sentinelHandleDictOfRedisInstances(sentinel.masters);

void sentinelHandleRedisInstance(sentinelRedisInstance *ri);

void sentinelReconnectInstance(sentinelRedisInstance *ri);


/* Create the async connections for the specified instance if the instance
 * is disconnected. Note that the SRI_DISCONNECTED flag is set even if just
 * one of the two links (commands and pub/sub) is missing. */
// If the Sentinel is disconnected from the instance, create the async connections to it.
void sentinelReconnectInstance(sentinelRedisInstance *ri) {

    // The instance is not disconnected (already connected): nothing to do
    if (!(ri->flags & SRI_DISCONNECTED)) return;

    /* Commands connection. */
    // Every instance gets a connection for sending Redis commands
    if (ri->cc == NULL) {

        // Connect to the instance
        ri->cc = redisAsyncConnect(ri->addr->ip,ri->addr->port);

        // Connection error
        if (ri->cc->err) {
            sentinelEvent(REDIS_DEBUG,"-cmd-link-reconnection",ri,"%@ #%s",
                ri->cc->errstr);
            sentinelKillLink(ri,ri->cc);

        // Connection succeeded
        } else {
            // Set up the connection
            ri->cc_conn_time = mstime();
            ri->cc->data = ri;
            redisAeAttach(server.el,ri->cc);
            // Set the connect callback
            redisAsyncSetConnectCallback(ri->cc,
                                            sentinelLinkEstablishedCallback);
            // Set the disconnect callback
            redisAsyncSetDisconnectCallback(ri->cc,
                                            sentinelDisconnectCallback);
            // Send AUTH to authenticate if needed
            sentinelSendAuthIfNeeded(ri,ri->cc);
            sentinelSetClientName(ri,ri->cc,"cmd");

            /* Send a PING ASAP when reconnecting. */
            sentinelSendPing(ri);
        }
    }

    /* Pub / Sub */
    // Masters and slaves also get a connection for subscribing to the channel
    if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && ri->pc == NULL) {

        // Connect to the instance
        ri->pc = redisAsyncConnect(ri->addr->ip,ri->addr->port);

        // Connection error
        if (ri->pc->err) {
            sentinelEvent(REDIS_DEBUG,"-pubsub-link-reconnection",ri,"%@ #%s",
                ri->pc->errstr);
            sentinelKillLink(ri,ri->pc);

        // Connection succeeded
        } else {
            int retval;

            // Set up the connection
            ri->pc_conn_time = mstime();
            ri->pc->data = ri;
            redisAeAttach(server.el,ri->pc);
            // Set the connect callback
            redisAsyncSetConnectCallback(ri->pc,
                                            sentinelLinkEstablishedCallback);
            // Set the disconnect callback
            redisAsyncSetDisconnectCallback(ri->pc,
                                            sentinelDisconnectCallback);
            // Send AUTH to authenticate if needed
            sentinelSendAuthIfNeeded(ri,ri->pc);

            // Name the client "pubsub"
            sentinelSetClientName(ri,ri->pc,"pubsub");

            /* Now we subscribe to the Sentinels "Hello" channel. */
            // Send SUBSCRIBE __sentinel__:hello to subscribe to the channel
            retval = redisAsyncCommand(ri->pc,
                sentinelReceiveHelloMessages, NULL, "SUBSCRIBE %s",
                    SENTINEL_HELLO_CHANNEL);

            // Subscription failed: drop the connection
            if (retval != REDIS_OK) {
                /* If we can't subscribe, the Pub/Sub connection is useless
                 * and we can simply disconnect it and try again. */
                sentinelKillLink(ri,ri->pc);
                return;
            }
        }
    }

    /* Clear the DISCONNECTED flags only if we have both the connections
     * (or just the commands connection if this is a sentinel instance). */
    // For a master or slave, clear DISCONNECTED once both cc and pc exist.
    // For a Sentinel, clear DISCONNECTED once cc exists.
    if (ri->cc && (ri->flags & SRI_SENTINEL || ri->pc))
        ri->flags &= ~SRI_DISCONNECTED;
}

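16.2 Getting Master Information

The Sentinel queries the master with the INFO command:

// Depending on the situation, send PING, INFO or PUBLISH to the instance
sentinelSendPeriodicCommands(ri);

The INFO reply yields two kinds of information:
1): Information about the master itself, including its run ID in the run_id field and its role in the role field.
2): Information about every slave of the master. Each slave is described by a line starting with "slave", and the ip and port fields of that line give the slave's address. From these lines the Sentinel learns the slave addresses, and with them the current server topology.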
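
As an illustration of point 2, the sketch below pulls the address out of one slave line. It assumes the line has the form "slave0:ip=127.0.0.1,port=11111,state=online,offset=43,lag=0". parse_slave_line is a hypothetical helper written for these notes, not the code Redis uses in sentinelRefreshInstanceInfo.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts ip and port from one
 * "slaveN:ip=...,port=..." line of INFO output.
 * Returns 1 on success, 0 if the line does not describe a slave. */
int parse_slave_line(const char *line, char *ip, size_t iplen, int *port) {
    const char *p;
    size_t n;

    if (strncmp(line, "slave", 5) != 0) return 0;

    p = strstr(line, "ip=");
    if (p == NULL) return 0;
    p += 3;
    n = strcspn(p, ",");            /* length of the ip value */
    if (n == 0 || n >= iplen) return 0;
    memcpy(ip, p, n);
    ip[n] = '\0';

    p = strstr(line, "port=");
    if (p == NULL) return 0;
    return sscanf(p + 5, "%d", port) == 1;
}
```

Lines such as "role:master" or "slave_priority:100" are rejected because they carry no ip= field, so a caller can feed every line of the reply to the helper and keep only the hits.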

After receiving the master's information, the Sentinel updates the attributes of the corresponding instance structure.

// Handle the reply to the INFO command
void sentinelInfoReplyCallback(redisAsyncContext *c, void *reply, void *privdata) {
    sentinelRedisInstance *ri = c->data;
    redisReply *r;

    if (ri) ri->pending_commands--;
    if (!reply || !ri) return;
    r = reply;

    if (r->type == REDIS_REPLY_STRING) {
        sentinelRefreshInstanceInfo(ri,r->str);
    }
}

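16.3 Getting Slave Information

Once the connections to a slave are created, the Sentinel sends it INFO once every 10 seconds and extracts the following:
1): The slave's run ID, run_id.
2): The slave's role, role.
3): The master's IP address, master_host, and the master's port, master_port.
4): The master-slave link status, master_link_status.
5): The slave's priority, slave_priority.
6): The slave's replication offset, slave_repl_offset.

This information is used to update the slave's sentinelRedisInstance structure.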

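16.4 Sending Messages to Masters and Slaves

Every 2 seconds the Sentinel sends the following command to each monitored instance:

PUBLISH __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"

Parameter	Meaning
s_ip	IP address of the Sentinel
s_port	Port of the Sentinel
s_runid	Run ID of the Sentinel
s_epoch	Current configuration epoch of the Sentinel
m_name	Name of the master
m_ip	IP address of the master
m_port	Port of the master
m_epoch	Current configuration epoch of the master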
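
The payload is a plain comma-separated string, so building and parsing it needs nothing beyond the C standard library. The sketch below is a minimal illustration under that assumption. The hello_msg struct and both functions are hypothetical names that mirror the table above, and the buffer sizes are arbitrary choices, not values from the Redis source.

```c
#include <stdio.h>

/* Illustrative struct: field names follow the parameter table above. */
typedef struct hello_msg {
    char s_ip[64];
    int s_port;
    char s_runid[41];               /* room for a 40-character run ID */
    unsigned long long s_epoch;
    char m_name[64];
    char m_ip[64];
    int m_port;
    unsigned long long m_epoch;
} hello_msg;

/* Builds the comma-separated payload. Returns the snprintf() result. */
int hello_format(char *buf, size_t len, const hello_msg *h) {
    return snprintf(buf, len, "%s,%d,%s,%llu,%s,%s,%d,%llu",
                    h->s_ip, h->s_port, h->s_runid, h->s_epoch,
                    h->m_name, h->m_ip, h->m_port, h->m_epoch);
}

/* Parses a payload back into the struct.
 * Returns 1 if all eight fields were read, 0 otherwise. */
int hello_parse(const char *payload, hello_msg *h) {
    int n = sscanf(payload,
                   "%63[^,],%d,%40[^,],%llu,%63[^,],%63[^,],%d,%llu",
                   h->s_ip, &h->s_port, h->s_runid, &h->s_epoch,
                   h->m_name, h->m_ip, &h->m_port, &h->m_epoch);
    return n == 8;
}
```

A receiver that gets fewer than eight fields can simply discard the message, which is why hello_parse reports success only for a complete payload.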


// Interval between PUBLISH commands, in milliseconds
#define SENTINEL_PUBLISH_PERIOD 2000

if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD) {
    /* PUBLISH hello messages to all the three kinds of instances. */
    sentinelSendHello(ri);
}
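
The periodic commands differ only in their intervals, so the scheduling decision can be sketched as a comparison of elapsed time against each period. The code below is a simplified illustration, not the logic of sentinelSendPeriodicCommands. The timers struct, the DUE_* flags and commands_due are hypothetical, and the periods are the values quoted in these notes: INFO every 10 seconds (every second once the master is objectively down), PING every second, PUBLISH every 2 seconds.

```c
#define INFO_PERIOD_MS      10000
#define INFO_FAST_PERIOD_MS 1000
#define PING_PERIOD_MS      1000
#define PUBLISH_PERIOD_MS   2000

/* Bit flags naming the commands that are due (illustrative). */
enum { DUE_INFO = 1, DUE_PING = 2, DUE_PUBLISH = 4 };

/* Last send times in milliseconds (illustrative). */
typedef struct timers {
    long long last_info;
    long long last_ping;
    long long last_pub;
} timers;

/* Returns a bitmask of the commands whose period has elapsed at 'now'.
 * master_odown selects the faster INFO period. */
int commands_due(const timers *t, long long now, int master_odown) {
    long long info_period = master_odown ? INFO_FAST_PERIOD_MS
                                         : INFO_PERIOD_MS;
    int due = 0;

    if (now - t->last_info > info_period) due |= DUE_INFO;
    if (now - t->last_ping > PING_PERIOD_MS) due |= DUE_PING;
    if (now - t->last_pub > PUBLISH_PERIOD_MS) due |= DUE_PUBLISH;
    return due;
}
```

Because the Sentinel timer runs far more often than any of these periods, a check of this kind on every tick is enough to keep all three schedules.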

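16.5 Receiving Channel Messages from Masters and Slaves

Once the subscription connection to a master or slave is established, the Sentinel sends the following command over it:

SUBSCRIBE __sentinel__:hello

/* Now we subscribe to the Sentinels "Hello" channel. */
// Send SUBSCRIBE __sentinel__:hello to subscribe to the channel
retval = redisAsyncCommand(ri->pc,
        sentinelReceiveHelloMessages, NULL, "SUBSCRIBE %s",
        SENTINEL_HELLO_CHANNEL);

Once subscribed, a Sentinel can both receive messages from the channel and publish to it, so when other Sentinels use the same channel they exchange information with each other.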

16.5.1 Updating the sentinels Dictionary

Each Sentinel publishes the message below to the __sentinel__:hello channel. Other Sentinels receive it, parse it, and update the master's sentinels dictionary, which is how they learn of each other. Sentinels therefore need no extra discovery messages: mutual discovery is built into the mechanism.

 __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"

dict *sentinels;    /* Other sentinels monitoring the same master. */

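16.5.2 Creating Command Connections to Other Sentinels

When a Sentinel discovers another Sentinel, it creates a local instance structure for it and then opens a command connection to it, so every pair of Sentinels ends up linked by command connections.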

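16.6 Detecting Subjective Down State

Once per second, the Sentinel sends PING to every instance it monitors and uses the reply to judge whether the instance is online.

The replies +PONG, -LOADING, and -MASTERDOWN count as valid. Anything else, including no reply within the allowed time, counts as invalid.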
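
The validity rule can be written as a prefix comparison. In this minimal sketch, is_valid_ping_reply is a hypothetical name, and the reply text is assumed to arrive without the leading '+' or '-' type marker, as a client library would normally hand it over.

```c
#include <string.h>

/* Returns 1 if the reply text counts as a valid answer to PING.
 * The text is assumed to come without the '+' or '-' protocol marker. */
int is_valid_ping_reply(const char *reply) {
    return strncmp(reply, "PONG", 4) == 0 ||
           strncmp(reply, "LOADING", 7) == 0 ||
           strncmp(reply, "MASTERDOWN", 10) == 0;
}
```

Only the prefix is compared because error replies such as LOADING carry a human-readable explanation after the keyword.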

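In the configuration file, the option

down-after-milliseconds

sets the longest time an instance may go without a valid reply. If the instance returns no valid reply for that long, the Sentinel sets SRI_S_DOWN in the instance's flags attribute, marking it as subjectively down.

Because Sentinels can be configured differently, one Sentinel may consider an instance subjectively down while others do not yet. For each Sentinel, whether a monitored instance is subjectively down depends on its own configuration.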
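
The subjective-down check comes down to comparing the time since the last valid reply with the configured period. The sketch below shows that comparison on a stripped-down instance. The struct, the FLAG_S_DOWN value and check_subjectively_down are hypothetical stand-ins for the real sentinelRedisInstance fields and SRI_S_DOWN, and the real function handles more cases than this.

```c
#define FLAG_S_DOWN (1 << 0)        /* illustrative stand-in for SRI_S_DOWN */

/* Reduced instance: only the fields this check needs (all in ms). */
typedef struct instance {
    int flags;
    long long last_avail_time;      /* last valid PING reply */
    long long down_after_period;    /* down-after-milliseconds */
    long long s_down_since_time;    /* when SDOWN was entered */
} instance;

/* Sets or clears the subjective-down flag according to how long the
 * instance has gone without a valid reply. */
void check_subjectively_down(instance *ri, long long now) {
    long long elapsed = now - ri->last_avail_time;

    if (elapsed > ri->down_after_period) {
        if (!(ri->flags & FLAG_S_DOWN)) {
            ri->flags |= FLAG_S_DOWN;
            ri->s_down_since_time = now;
        }
    } else {
        ri->flags &= ~FLAG_S_DOWN;
    }
}
```

Two Sentinels running this check with different down_after_period values can disagree about the same instance at the same moment, which is exactly the situation the paragraph above describes.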

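16.7 Checking Objective Down State

When a Sentinel finds a master subjectively down, it asks the other Sentinels that monitor the same master for their view. Once enough of them agree, the Sentinel judges the master objectively down and starts a failover.

16.7.1 Sending the SENTINEL is-master-down-by-addr Command

After judging a master subjectively down, the Sentinel asks the other Sentinels whether they agree that the master is down: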

SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>

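16.7.2 Receiving the SENTINEL is-master-down-by-addr Command

When a Sentinel receives this query from another Sentinel, it parses the arguments, checks them against its own state, and answers with a three-element Multi Bulk reply:

1) <down_state>
2) <leader_runid>
3) <leader_epoch>

Parameter	Meaning
down_state	1 means the master is down, 0 means it is not.
leader_runid	Either the * symbol or the run ID of the target Sentinel's local leader. * means the command only checks the master's down state. A run ID is used for electing the leader Sentinel.
leader_epoch	Configuration epoch of the target Sentinel's local leader, used for electing the leader Sentinel. Valid only when leader_runid is not *.

If another Sentinel replies with a down_state of 1, it agrees that the master is down.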

if (!strcasecmp(c->argv[1]->ptr,"is-master-down-by-addr")) {
    /* SENTINEL IS-MASTER-DOWN-BY-ADDR <ip> <port> <current-epoch> <runid>*/
    sentinelRedisInstance *ri;
    long long req_epoch;
    uint64_t leader_epoch = 0;
    char *leader = NULL;
    long port;
    int isdown = 0;

    if (c->argc != 6) goto numargserr;
    if (getLongFromObjectOrReply(c,c->argv[3],&port,NULL) != REDIS_OK ||
        getLongLongFromObjectOrReply(c,c->argv[4],&req_epoch,NULL)
                                                          != REDIS_OK)
        return;
    ri = getSentinelRedisInstanceByAddrAndRunID(sentinel.masters,
        c->argv[2]->ptr,port,NULL);

    /* It exists? Is actually a master? Is subjectively down? It's down.
     * Note: if we are in tilt mode we always reply with "0". */
    if (!sentinel.tilt && ri && (ri->flags & SRI_S_DOWN) &&
                                (ri->flags & SRI_MASTER))
        isdown = 1;

    /* Vote for the master (or fetch the previous vote) if the request
     * includes a runid, otherwise the sender is not seeking for a vote. */
    if (ri && ri->flags & SRI_MASTER && strcasecmp(c->argv[5]->ptr,"*")) {
        leader = sentinelVoteLeader(ri,(uint64_t)req_epoch,
                                        c->argv[5]->ptr,
                                        &leader_epoch);
    }

    /* Reply with a three-elements multi-bulk reply:
     * down state, leader, vote epoch. */
    // Reply elements:
    // 1) <down_state>    1 means down, 0 means not down
    // 2) <leader_runid>  run ID of the Sentinel voted for as leader
    // 3) <leader_epoch>  configuration epoch of that leader
    addReplyMultiBulkLen(c,3);
    addReply(c, isdown ? shared.cone : shared.czero);
    addReplyBulkCString(c, leader ? leader : "*");
    addReplyLongLong(c, (long long)leader_epoch);
    if (leader) sdsfree(leader);
}

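16.7.3 Receiving the Reply to SENTINEL is-master-down-by-addr

When a reply agreeing that the master is down arrives, SRI_MASTER_DOWN is set on the instance structure that represents the replying Sentinel.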

/* Receive the SENTINEL is-master-down-by-addr reply, see the
 * sentinelAskMasterStateToOtherSentinels() function for more information. */
// Callback that handles the replies other Sentinels send back
// to the SENTINEL is-master-down-by-addr command
void sentinelReceiveIsMasterDownReply(redisAsyncContext *c, void *reply, void *privdata) {
    sentinelRedisInstance *ri = c->data;
    redisReply *r;

    if (ri) ri->pending_commands--;
    if (!reply || !ri) return;
    r = reply;

    /* Ignore every error or unexpected reply.
     * Note that if the command returns an error for any reason we'll
     * end clearing the SRI_MASTER_DOWN flag for timeout anyway. */
    if (r->type == REDIS_REPLY_ARRAY && r->elements == 3 &&
        r->element[0]->type == REDIS_REPLY_INTEGER &&
        r->element[1]->type == REDIS_REPLY_STRING &&
        r->element[2]->type == REDIS_REPLY_INTEGER)
    {
        // Record the time of this reply
        ri->last_master_down_reply_time = mstime();

        // Record what the replying Sentinel thinks of the master
        if (r->element[0]->integer == 1) {
            // Down
            ri->flags |= SRI_MASTER_DOWN;
        } else {
            // Not down
            ri->flags &= ~SRI_MASTER_DOWN;
        }

        // A run ID other than "*" means the reply carries a vote
        if (strcmp(r->element[1]->str,"*")) {
            /* If the runid in the reply is not "*" the Sentinel actually
             * replied with a vote. */
            sdsfree(ri->leader);
            // Log the vote
            if (ri->leader_epoch != r->element[2]->integer)
                redisLog(REDIS_WARNING,
                    "%s voted for %s %llu", ri->name,
                    r->element[1]->str,
                    (unsigned long long) r->element[2]->integer);
            // Record the leader this Sentinel voted for
            ri->leader = sdsnew(r->element[1]->str);
            ri->leader_epoch = r->element[2]->integer;
        }
    }
}

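When a master is subjectively down, the Sentinel counts how many Sentinels consider it down: itself plus every other Sentinel whose instance carries the SRI_MASTER_DOWN flag. When the count reaches the configured quorum, the Sentinel adds the SRI_O_DOWN flag to the master, marking it as objectively down.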

void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    dictIterator *di;
    dictEntry *de;
    int quorum = 0, odown = 0;

    // If this Sentinel considers the master subjectively down,
    // check whether other Sentinels agree.
    // With enough agreement the master is judged objectively down.
    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */

        // Count the agreeing Sentinels (the initial 1 is this Sentinel)
        quorum = 1; /* the current sentinel. */

        /* Count all the other sentinels. */
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);

            // This Sentinel also considers the master down
            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);

        // Enter ODOWN when the agreeing count reaches
        // the number of votes required
        if (quorum >= master->quorum) odown = 1;
    }

    /* Set the flag accordingly to the outcome. */
    if (odown) {

        // The master is ODOWN

        if ((master->flags & SRI_O_DOWN) == 0) {
            // Emit the event
            sentinelEvent(REDIS_WARNING,"+odown",master,"%@ #quorum %d/%d",
                quorum, master->quorum);
            // Set the ODOWN flag
            master->flags |= SRI_O_DOWN;
            // Record when ODOWN began
            master->o_down_since_time = mstime();
        }
    } else {

        // Not ODOWN

        if (master->flags & SRI_O_DOWN) {

            // The master was ODOWN before: clear the state

            // Emit the event
            sentinelEvent(REDIS_WARNING,"-odown",master,"%@");
            // Clear the ODOWN flag
            master->flags &= ~SRI_O_DOWN;
        }
    }
}

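16.8 Electing the Leader Sentinel

When a master is judged objectively down, the Sentinels monitoring it negotiate to elect a leader Sentinel, which then carries out the failover. The rules are:
1): Every online Sentinel is eligible to become the leader.
2): After each leader election, the configuration epoch is incremented. The configuration epoch is simply a counter.
3): Within one configuration epoch, every Sentinel gets one chance to set some Sentinel as its local leader, and once set, the local leader cannot be changed within that epoch.
4): Every Sentinel that finds the master objectively down asks the other Sentinels to set it as their local leader.
5): A SENTINEL is-master-down-by-addr command whose runid argument is the sender's own run ID is such a request: the source Sentinel asks the target to set it as local leader.
6): Local leaders are set first come, first served: the target sets the source of the first request it receives as its local leader and rejects all later requests.
7): On receiving the reply, the source Sentinel compares the run ID in it with its own. If they match, the target has set the source as its local leader.
8): A Sentinel that more than half of the Sentinels have set as their local leader becomes the leader Sentinel.
9): Because the leader needs the support of more than half of the Sentinels and each Sentinel sets a local leader only once per configuration epoch, at most one leader can emerge in a configuration epoch.
10): If no leader is elected within the time limit, the Sentinels hold another election after a while, until a leader is chosen.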
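
Rules 3, 6 and 8 can be sketched in a few lines. The code below is a simplified model of those three rules only, not the body of sentinelVoteLeader. The voter struct, vote_leader and has_majority are hypothetical names, and a vote is assumed to be accepted only when the request carries a newer epoch than the voter's last vote.

```c
#include <string.h>

/* What one Sentinel remembers about its vote (illustrative). */
typedef struct voter {
    unsigned long long leader_epoch;  /* epoch of the last vote */
    char leader[41];                  /* run ID voted for, "" if none */
} voter;

/* First come, first served: the first request seen in a newer epoch wins.
 * Later requests in the same epoch get the existing vote back.
 * Returns the run ID this voter supports after handling the request. */
const char *vote_leader(voter *v, unsigned long long req_epoch,
                        const char *req_runid) {
    if (req_epoch > v->leader_epoch) {
        strncpy(v->leader, req_runid, sizeof(v->leader) - 1);
        v->leader[sizeof(v->leader) - 1] = '\0';
        v->leader_epoch = req_epoch;
    }
    return v->leader;
}

/* A candidate wins when more than half of all Sentinels voted for it. */
int has_majority(int votes, int total_sentinels) {
    return votes >= total_sentinels / 2 + 1;
}
```

A candidate learns the outcome by comparing the returned run ID with its own, as in rule 7. With four Sentinels, two votes are not enough, which is why has_majority(2, 4) is false.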

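16.9 Failover

Once elected, the leader Sentinel fails over the offline master in three steps:
1): Pick one of the offline master's slaves and turn it into a master.
2): Make the offline master's other slaves replicate the new master.
3): Set the offline master to be a slave of the new master, so that it becomes a slave of the new master when it comes back online.

16.9.1 Selecting the New Master

The leader Sentinel picks one of the offline master's slaves and sends it SLAVEOF no one, which turns the slave into a master.

The selection rules are:
1): Remove slaves that are down or disconnected, so that every remaining slave is online.
2): Remove slaves that have not replied to the leader Sentinel in the last 5 seconds, so that every remaining slave can communicate normally.
3): Remove slaves whose link to the offline master has been down for more than down-after-milliseconds * 10 milliseconds, which keeps only slaves with reasonably fresh data.

From the remaining slaves, the one with the highest priority is preferred, and among equal priorities the one with the largest replication offset.
If there is still a tie, the slave with the smallest run ID is chosen.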
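
The ordering described above maps naturally onto a qsort comparator. The sketch below is an illustration, not the compareSlavesForPromotion function from Redis. The candidate struct and both functions are hypothetical, and it assumes that a smaller slave_priority number means a higher priority, which is the Redis convention.

```c
#include <stdlib.h>
#include <string.h>

/* Reduced view of a slave that survived the filtering (illustrative). */
typedef struct candidate {
    int priority;                    /* smaller number = preferred */
    unsigned long long repl_offset;  /* larger = more data replicated */
    const char *runid;
} candidate;

/* Orders candidates best first: priority, then offset, then run ID. */
int compare_candidates(const void *a, const void *b) {
    const candidate *ca = a, *cb = b;

    if (ca->priority != cb->priority)
        return ca->priority < cb->priority ? -1 : 1;
    if (ca->repl_offset != cb->repl_offset)
        return ca->repl_offset > cb->repl_offset ? -1 : 1;
    return strcmp(ca->runid, cb->runid);
}

/* Sorts the array in place and returns the best candidate,
 * or NULL when there is none. */
const candidate *pick_best(candidate *c, size_t n) {
    if (n == 0) return NULL;
    qsort(c, n, sizeof(c[0]), compare_candidates);
    return &c[0];
}
```

Sorting the whole array is more than strictly needed to find one winner, but it mirrors the sort-then-take-first shape of sentinelSelectSlave.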

// Pick one of the master's slaves as the new master.
// Returns NULL if no slave qualifies.
sentinelRedisInstance *sentinelSelectSlave(sentinelRedisInstance *master) {

    sentinelRedisInstance **instance =
        zmalloc(sizeof(instance[0])*dictSize(master->slaves));
    sentinelRedisInstance *selected = NULL;
    int instances = 0;
    dictIterator *di;
    dictEntry *de;
    mstime_t max_master_down_time = 0;

    // Compute the longest acceptable time for the slave's link to the master
    // to have been down. This keeps the selected slave's data from being too stale.
    if (master->flags & SRI_S_DOWN)
        max_master_down_time += mstime() - master->s_down_since_time;
    max_master_down_time += master->down_after_period * 10;

    // Iterate over all slaves
    di = dictGetIterator(master->slaves);
    while((de = dictNext(di)) != NULL) {

        sentinelRedisInstance *slave = dictGetVal(de);
        mstime_t info_validity_time;

        // Skip slaves that are SDOWN, ODOWN or disconnected
        if (slave->flags & (SRI_S_DOWN|SRI_O_DOWN|SRI_DISCONNECTED)) continue;
        // Skip slaves with no valid PING reply in the last 5 ping periods
        if (mstime() - slave->last_avail_time > SENTINEL_PING_PERIOD*5) continue;
        // Skip slaves whose priority is 0
        if (slave->slave_priority == 0) continue;

        /* If the master is in SDOWN state we get INFO for slaves every second.
         * Otherwise we get it with the usual period so we need to account for
         * a larger delay. */
        if (master->flags & SRI_S_DOWN)
            info_validity_time = SENTINEL_PING_PERIOD*5;
        else
            info_validity_time = SENTINEL_INFO_PERIOD*3;

        // The INFO reply is too old: skip
        if (mstime() - slave->info_refresh > info_validity_time) continue;

        // The slave's link to the master has been down too long: skip
        if (slave->master_link_down_time > max_master_down_time) continue;

        // Keep the slave as a candidate
        instance[instances++] = slave;
    }
    dictReleaseIterator(di);

    if (instances) {

        // Sort the candidates
        qsort(instance,instances,sizeof(sentinelRedisInstance*),
            compareSlavesForPromotion);

        // The first candidate after sorting is the selected slave
        selected = instance[0];
    }
    zfree(instance);

    // Return the selected slave
    return selected;
}

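After choosing the new master, the leader Sentinel sends INFO to the promoted slave once per second. When the role in the reply changes from slave to master, the promotion has succeeded.

16.9.2 Changing the Replication Target of the Slaves

Once the chosen slave is confirmed to have become a master, the leader Sentinel sends SLAVEOF to the remaining slaves so that they replicate the new master.

16.9.3 Turning the Old Master into a Slave

Because the old master is offline, this setting is recorded in the leader Sentinel's instance structure for the old master. When the old master comes back online, the Sentinel sends it the SLAVEOF command.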

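Summary

1): A Sentinel is just a Redis server running in a special mode. It uses a different command table and different configuration from an ordinary Redis server, so it accepts a different set of commands.
2): A Sentinel opens a command connection and a subscription connection to every monitored master. The command connection carries commands and the subscription connection receives channel messages.
3): A Sentinel sends INFO to a master to learn the addresses of all its slaves, creates an instance structure for each slave, and opens a command connection and a subscription connection to each of them.
4): Normally a Sentinel sends INFO to each monitored instance once every 10 seconds. When the master enters the objectively down state, the rate rises to once per second.
5): Sentinels monitoring the same server announce themselves to each other by publishing a message to the subscribed channel every 2 seconds.
6): Each Sentinel also learns about the other Sentinels from the messages it receives on that channel.
7): A Sentinel opens both a command connection and a subscription connection only to masters and slaves. Between Sentinels, only command connections are created.
8): A Sentinel sends PING to every instance once per second and judges from the reply whether the instance is online. An instance that stops returning valid replies is judged subjectively down.
9): After judging a master subjectively down, a Sentinel asks the other Sentinels whether they agree that the master is down.
10): When the number of Sentinels that consider the master down reaches the configured value, a leader Sentinel is elected and performs the failover.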

xinyYoung
