Redis Sentinel是Redis的高可用方案。是Redis 2.8中正式引入的。
在之前的主从复制方案中,如果主节点出现问题,需要手动将一个从节点升级为主节点,然后将其它从节点指向新的主节点,并且需要修改应用方主节点的地址。整个过程都需要人工干预。
下面通过日志具体看看Sentinel的切换流程。
Sentinel的切换流程
集群拓扑图如下。
角色 IP 端口 runID
主节点 127.0.0.1 6379
从节点-1 127.0.0.1 6380
从节点-2 127.0.0.1 6381
Sentinel-1 127.0.0.1 26379 d4424b8684977767be4f5abd1e364153fbb0adbd
Sentinel-2 127.0.0.1 26380 18311edfbfb7bf89fe4b67d08ef432053db62fff
Sentinel-3 127.0.0.1 26381 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8
kill -9 将主节点进程杀死。
1. 最先反应的是从节点。
其会马上输出如下信息。
28244:S 08 Oct 16:03:34.184 # Connection with master lost. 28244:S 08 Oct 16:03:34.184 * Caching the disconnected master state. 28244:S 08 Oct 16:03:34.548 * Connecting to MASTER 127.0.0.1:6379 28244:S 08 Oct 16:03:34.548 * MASTER <-> SLAVE sync started 28244:S 08 Oct 16:03:34.548 # Error condition on socket for SYNC: Connection refused 28244:S 08 Oct 16:03:35.556 * Connecting to MASTER 127.0.0.1:6379 28244:S 08 Oct 16:03:35.556 * MASTER <-> SLAVE sync started ...
2. Sentinel的日志30s后才有输出,这个与“sentinel down-after-milliseconds mymaster 30000”的设置有关。
下面,依次贴出哨兵各个节点及slave的日志输出。
Sentinel-1
28087:X 08 Oct 16:04:04.277 # +sdown master mymaster 127.0.0.1 6379 28087:X 08 Oct 16:04:04.379 # +new-epoch 1 28087:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1 28087:X 08 Oct 16:04:05.388 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2 28087:X 08 Oct 16:04:05.388 # Next failover delay: I will not start a failover before Mon Oct 8 16:10:04 2018 28087:X 08 Oct 16:04:05.631 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379 28087:X 08 Oct 16:04:05.631 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381 28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381 28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381 28087:X 08 Oct 16:04:35.656 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
Sentinel-2
28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2 28163:X 08 Oct 16:04:04.366 # +new-epoch 1 28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1 28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1 28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1 28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:05.543 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:05.629 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:06.554 # -odown master mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:06.555 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:06.555 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:06.606 # +failover-end master mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:06.606 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381 28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381 28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381 28163:X 08 Oct 16:04:36.687 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
Sentinel-3
28234:X 08 Oct 16:04:04.288 # +sdown master mymaster 127.0.0.1 6379 28234:X 08 Oct 16:04:04.378 # +new-epoch 1 28234:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1 28234:X 08 Oct 16:04:04.385 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2 28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct 8 16:10:04 2018 28234:X 08 Oct 16:04:05.630 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379 28234:X 08 Oct 16:04:05.630 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381 28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381 28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381 28234:X 08 Oct 16:04:35.709 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
slave2
28244:S 08 Oct 16:04:04.762 * MASTER <-> SLAVE sync started 28244:S 08 Oct 16:04:04.762 # Error condition on socket for SYNC: Connection refused 28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free= 32544 obl=81 oll=0 omem=0 events=r cmd=slaveof')28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success. 28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381 28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started 28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event. 28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue... 28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302). 28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master. 28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031 28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
slave3
28253:S 08 Oct 16:04:03.655 * MASTER <-> SLAVE sync started 28253:S 08 Oct 16:04:03.655 # Error condition on socket for SYNC: Connection refused 28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdc c5c0203128253:M 08 Oct 16:04:04.586 * Discarding previously cached master state. 28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf- free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success. 28253:M 08 Oct 16:04:05.770 * Slave 127.0.0.1:6380 asks for synchronization 28253:M 08 Oct 16:04:05.770 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 156 bytes of backlog starting from offset 24302.
结合上面的日志,可以看到,
各个Sentinel节点都判断127.0.0.1 6379为主观下线(Subjectively Down,缩写为sdown)。
28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379
达到quorum的设置,Sentinel-2判断其为客观下线( ively Down,缩写为odown)。结合其它两个Sentinel节点的日志,可以看到,Sentinel-2最先判定其客观下线。接下来,会进行Sentinel的领导者选举。一般来说,谁先完成客观下线的判定,谁就是领导者,只有Sentinel领导者才能进行failover。
28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2 28163:X 08 Oct 16:04:04.366 # +new-epoch 1 28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379 28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1 28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1 28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1 28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379
寻找合适的slave作为master
28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379
+failover-state-select-slave <instance details> -- New failover state is select-slave: we are trying to find a suitable slave for promotion.
将127.0.0.1 6381设置为新主
28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
+selected-slave <instance details> -- We found the specified good slave to promote.
命令6381节点执行slaveof no one,使其成为主节点
28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
+failover-state-send-slaveof-noone <instance details> -- We are trying to reconfigure the promoted slave as master, waiting for it to switch.
等待6381节点升级为主节点
28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
确认6381节点已经升级为主节点
28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
再来看看16:04:04.528到16:04:05.543这个时间段slave3的日志输出。可以看到,其开启了MASTER模式,且重写了配置文件。
28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdcc5c02031 28253:M 08 Oct 16:04:04.586 * Discarding previously cached master state. 28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec') 28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success.
failover进入重新配置从节点阶段
28163:X 08 Oct 16:04:05.543 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
命令6380节点复制新的主节点
28163:X 08 Oct 16:04:05.629 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
+slave-reconf-sent <instance details> -- The leader sentinel sent the SLAVEOF command to this instance in order to reconfigure it for the new slave.
看看这个时间点slave2的日志输出,基本吻合。其进行的是增量同步。
28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free=32544 obl=81 oll=0 omem=0 events=r cmd=slaveof') 28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success. 28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381 28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started 28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event. 28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue... 28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302). 28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master. 28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031 28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
同时,在这个时间点,sentinel也有日志输出,以sentinel1为例。从日志中,可以看到,在这个时间点它会更改配置信息。
28087:X 08 Oct 16:04:05.631 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379 28087:X 08 Oct 16:04:05.631 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381 28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381 28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
switch-master <master name> <oldip> <oldport> <newip> <newport> -- The master new IP and address is the specified one after a configuration change. This is the message most external users are interested in.
同步过程尚未完成。
28163:X 08 Oct 16:04:06.555 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
+slave-reconf-inprog <instance details> -- The slave being reconfigured showed to be a slave of the new master ip:port pair, but the synchronization process is not yet complete.
主从同步完成。
28163:X 08 Oct 16:04:06.555 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
+slave-reconf-done <instance details> -- The slave is now synchronized with the new master.
failover切换完成。
28163:X 08 Oct 16:04:06.606 # +failover-end master mymaster 127.0.0.1 6379
failover成功后,发布主节点的切换消息
28163:X 08 Oct 16:04:06.606 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
关联新主节点的slave信息,需要注意的是,原来的主节点会作为新主节点的slave。
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381 28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
+slave <instance details> -- A new slave was detected and attached.
过了30s后,判定原来的主节点主观下线。
28163:X 08 Oct 16:04:36.687 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
综合来看,Sentinel进行failover的流程如下
1. 每隔1秒,每个Sentinel节点会向主节点、从节点、其余Sentinel节点发送一条ping命令做一次心跳检测,来确认这些节点当前是否可达。当这些节点超过down-after-milliseconds没有进行有效回复,Sentinel节点就会判定该节点为主观下线。
2. 如果被判定为主观下线的节点是主节点,该Sentinel节点会通过sentinel is master-down-by-addr命令向其他Sentinel节点询问对主节点的判断,当超过<quorum>个数,Sentinel节点会判定该节点为客观下线。如果从节点、Sentinel节点被判定为主观下线,并不会进行后续的故障切换操作。
3. 对Sentinel进行领导者选举,由其来进行后续的故障切换(failover)工作。选举算法基于Raft。
4. Sentinel领导者节点开始进行故障切换。
5. 选择合适的从节点作为新主节点。
6. Sentinel领导者节点对上一步选出来的从节点执行slaveof no one命令让其成为主节点。
7. 向剩余的从节点发送命令,让它们成为新主节点的从节点,复制规则和parallel-syncs参数有关。
8. 将原来的主节点更新为从节点,并将其纳入到Sentinel的管理,让其恢复后去复制新的主节点。
Sentinel的领导者选举流程。
Sentinel的领导者选举基于Raft协议。
1. 每个在线的Sentinel节点都有资格成为领导者,当它确认主节点主观下线时候,会向其他Sentinel节点发送sentinel is-master-down-by-addr命令,要求将自己设置为领导者。
2. 收到命令的Sentinel节点,如果没有同意过其他Sentinel节点的sentinel is-master-down-by-addr命令,将同意该请求,否则拒绝。
3. 如果该Sentinel节点发现自己的票数已经大于等于max(quorum,num(sentinels)/2+1),那么它将成为领导者。
新主节点的选择流程。
1. 删除所有已经处于下线或断线状态的从节点。
2. 删除最近5秒没有回复过领导者Sentinel的INFO命令的从节点。
3. 删除所有与已下线主节点连接断开超过down-after-milliseconds*10毫秒的从节点。
4. 选择优先级最高的从节点。
5. 选择复制偏移量最大的从节点。
6. 选择runid最小的从节点。
三个定时监控任务
1. 每隔10秒,每个Sentinel节点会向主节点和从节点发送info命令获取最新的拓扑结构。其作用如下:
1> 通过向主节点执行info命令,获取从节点的信息,这也是为什么Sentinel节点不需要显式配置监控从节点。
2> 当有新的从节点加入时可立刻感知出来。
3> 节点不可达或者故障切换后,可通过info命令实时更新节点拓扑信息。
2. 每隔2秒,每个Sentinel节点会向Redis数据节点的__sentinel__:hello频道上发送该Sentinel节点对于主节点的判断以及当前Sentinel节点的信息,同时每个Sentinel节点也会订阅该频道,来了解其它Sentinel节点以及它们对主节点的判断。其作用如下:
1> 发现新的Sentinel节点:通过订阅主节点的__sentinel__:hello了解其它Sentinel节点信息,如果是新加入的Sentinel节点,将该Sentinel节点信息保存起来,并与该Sentinel节点创建连接。
2> Sentinel节点之间交换主节点的状态,作为后面客观下线以及领导者选举的依据。
3. 每隔1秒,每个Sentinel节点会向主节点、从节点、其余Sentinel节点发送一条ping命令做一次心跳检测,来确认这些节点当前是否可达。这个定时任务是节点失败判定的重要依据。
Sentinel的相关参数
# bind 127.0.0.1 192.168.1.1 # protected-mode no port 26379 # sentinel announce-ip <ip> # sentinel announce-port <port> dir /tmp sentinel monitor mymaster 127.0.0.1 6379 2 # sentinel auth-pass <master-name> <password> sentinel down-after-milliseconds mymaster 30000 sentinel parallel-syncs mymaster 1 sentinel failover-timeout mymaster 180000 # sentinel notification- mymaster /var/redis/notify.sh # sentinel client-reconfig- mymaster /var/redis/reconfig.sh sentinel deny- s-reconfig yes
其中,
dir:设置Sentinel的工作目录。
sentinel monitor mymaster 127.0.0.1 6379 2:其中2是quorum,即权重,代表至少需要两个Sentinel节点认为主节点主观下线,才可判定主节点为客观下线。一般建议将其设置为Sentinel节点的一半加1。不仅如此,quorum还与Sentinel节点的领导者选举有关。为了选出Sentinel的领导者,至少需要max(quorum, num(sentinels) / 2 + 1)个Sentinel节点参与选举。
sentinel down-after-milliseconds mymaster 30000:每个Sentinel节点都要通过定期发送ping命令来判断Redis节点和其余Sentinel节点是否可达。
如果在指定的时间内,没有收到主节点的有效回复,则判断其为主观下线。需要注意的是,该参数不仅用来判断主节点状态,同样也用来判断该主节点下面的从节点及其它Sentinel的状态。其默认值为30s。
sentinel parallel-syncs mymaster 1:在failover期间,允许多少个slave同时指向新的主节点。如果numslaves设置较大的话,虽然复制操作并不会阻塞主节点,但多个节点同时指向新的主节点,会增加主节点的网络和磁盘IO负载。
sentinel failover-timeout mymaster 180000:定义故障切换超时时间。默认180000,单位秒,即3min。需要注意的是,该时间不是总的故障切换的时间,而是适用于故障切换的多个场景。
# Specifies the failover timeout in milliseconds. It is used in many ways: # # - The time needed to re-start a failover after a previous failover was # already tried against the same master by a given Sentinel, is two # times the failover timeout. # # - The time needed for a slave replicating to a wrong master according # to a Sentinel current configuration, to be forced to replicate # with the right master, is exactly the failover timeout (counting since # the moment a Sentinel detected the misconfiguration). # # - The time needed to cancel a failover that is already in progress but # did not produced any configuration change (SLAVEOF NO ONE yet not # acknowledged by the promoted slave). # # - The maximum time a failover in progress waits for all the slaves to be # reconfigured as slaves of the new master. However even after this time # the slaves will be reconfigured by the Sentinels anyway, but not with # the exact parallel-syncs progression as specified.
第一种适用场景:如果Redis Sentinel对一个主节点故障切换失败,那么下次再对该主节点做故障切换的起始时间是failover-timeout的2倍。这点从Sentinel的日志就可体现出来(28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct 8 16:10:04 2018)
sentinel notification- :定义通知脚本,当Sentinel出现WARNING级别的事件时,会调用该脚本,其会传入两个参数:事件类型,事件描述。
sentinel client-reconfig- :当主节点发生切换时,会调用该参数定义的脚本,其会传入以下参数:<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
关于脚本,其必须遵循一定的规则。
# S EXECUTION # # sentinel notification- and sentinel reconfig- are used in order # to configure s that are called to notify the system administrator # or to reconfigure clients after a failover. The s are executed # with the following rules for error handling: # # If exits with "1" the execution is retried later (up to a maximum # number of times currently set to 10). # # If exits with "2" (or an higher value) the execution is # not retried. # # If terminates because it receives a signal the behavior is the same # as exit code 1. # # A has a maximum running time of 60 seconds. After this limi
继续阅读与本文标签相同的文章
MySQL高级教程
库克回母校发表演讲:一堂苹果的积极政治课
-
数十万共享雨伞不翼而飞,创始人却高兴的要命!网友:赚翻了
2026-05-18栏目: 教程
-
滴滴 这是一见钟情的感脚
2026-05-18栏目: 教程
-
以实践的方式讨论:N-Gram原理与其应用
2026-05-18栏目: 教程
-
Hi拼团,第六代云服务器拼团购买更便宜,低至148元/年
2026-05-18栏目: 教程
-
汇编(五)栈、CPU提供的栈机制、push、pop指令
2026-05-18栏目: 教程
