Redis Sentinel是Redis的高可用方案。是Redis 2.8中正式引入的。

在之前的主从复制方案中,如果主节点出现问题,需要手动将一个从节点升级为主节点,然后将其它从节点指向新的主节点,并且需要修改应用方主节点的地址。整个过程都需要人工干预。

 

下面通过日志具体看看Sentinel的切换流程。

 

Sentinel的切换流程

集群拓扑图如下。

角色                 IP              端口           runID

主节点             127.0.0.1   6379

从节点-1          127.0.0.1   6380 

从节点-2          127.0.0.1   6381

Sentinel-1        127.0.0.1   26379    d4424b8684977767be4f5abd1e364153fbb0adbd

Sentinel-2        127.0.0.1   26380    18311edfbfb7bf89fe4b67d08ef432053db62fff

Sentinel-3        127.0.0.1   26381    3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8

 

kill -9 将主节点进程杀死。

1. 最先反应的是从节点。

其会马上输出如下信息。

28244:S 08 Oct 16:03:34.184 # Connection with master lost.
28244:S 08 Oct 16:03:34.184 * Caching the disconnected master state.
28244:S 08 Oct 16:03:34.548 * Connecting to MASTER 127.0.0.1:6379
28244:S 08 Oct 16:03:34.548 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:03:34.548 # Error condition on socket for SYNC: Connection refused
28244:S 08 Oct 16:03:35.556 * Connecting to MASTER 127.0.0.1:6379
28244:S 08 Oct 16:03:35.556 * MASTER <-> SLAVE sync started
...

2. Sentinel的日志30s后才有输出,这个与“sentinel down-after-milliseconds mymaster 30000”的设置有关。

下面,依次贴出哨兵各个节点及slave的日志输出。

Sentinel-1

28087:X 08 Oct 16:04:04.277 # +sdown master mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:04.379 # +new-epoch 1
28087:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28087:X 08 Oct 16:04:05.388 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28087:X 08 Oct 16:04:05.388 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018
28087:X 08 Oct 16:04:05.631 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:05.631 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:35.656 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

Sentinel-2

28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28163:X 08 Oct 16:04:04.366 # +new-epoch 1
28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.543 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.629 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.554 # -odown master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.555 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.555 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.606 # +failover-end master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.606 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:36.687 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

Sentinel-3

28234:X 08 Oct 16:04:04.288 # +sdown master mymaster 127.0.0.1 6379
28234:X 08 Oct 16:04:04.378 # +new-epoch 1
28234:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28234:X 08 Oct 16:04:04.385 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018
28234:X 08 Oct 16:04:05.630 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28234:X 08 Oct 16:04:05.630 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28234:X 08 Oct 16:04:35.709 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

 

slave2

28244:S 08 Oct 16:04:04.762 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:04.762 # Error condition on socket for SYNC: Connection refused
28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free=
32544 obl=81 oll=0 omem=0 events=r cmd=slaveof')28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success.
28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event.
28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue...
28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302).
28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master.
28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

 

slave3

28253:S 08 Oct 16:04:03.655 * MASTER <-> SLAVE sync started
28253:S 08 Oct 16:04:03.655 # Error condition on socket for SYNC: Connection refused
28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdc
c5c0203128253:M 08 Oct 16:04:04.586 * Discarding previously cached master state.
28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-
free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success.
28253:M 08 Oct 16:04:05.770 * Slave 127.0.0.1:6380 asks for synchronization
28253:M 08 Oct 16:04:05.770 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 156 bytes of backlog starting from offset 24302.

 

结合上面的日志,可以看到,

各个Sentinel节点都判断127.0.0.1 6379为主观下线(Subjectively Down,缩写为sdown)。

28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379

 

达到quorum的设置,Sentinel-2判断其为客观下线( ively Down,缩写为odown)。结合其它两个Sentinel节点的日志,可以看到,Sentinel-2最先判定其客观下线。接下来,会进行Sentinel的领导者选举。一般来说,谁先完成客观下线的判定,谁就是领导者,只有Sentinel领导者才能进行failover。

28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28163:X 08 Oct 16:04:04.366 # +new-epoch 1
28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379

 

寻找合适的slave作为master

28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379

+failover-state-select-slave <instance details> -- New failover state is select-slave: we are trying to find a suitable slave for promotion.

 

将127.0.0.1 6381设置为新主

28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

+selected-slave <instance details> -- We found the specified good slave to promote.

 

命令6381节点执行slaveof no one,使其成为主节点

28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

+failover-state-send-slaveof-noone <instance details> -- We are trying to reconfigure the promoted slave as master, waiting for it to switch.

 

等待6381节点升级为主节点

28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

 

确认6381节点已经升级为主节点

28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

 

再来看看16:04:04.528到16:04:05.543这个时间段slave3的日志输出。可以看到,其开启了MASTER模式,且重写了配置文件。

28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdcc5c02031
28253:M 08 Oct 16:04:04.586 * Discarding previously cached master state.
28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success.

 

failover进入重新配置从节点阶段

28163:X 08 Oct 16:04:05.543 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379

 

命令6380节点复制新的主节点

28163:X 08 Oct 16:04:05.629 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-sent <instance details> -- The leader sentinel sent the SLAVEOF command to this instance in order to reconfigure it for the new slave.

 

看看这个时间点slave2的日志输出,基本吻合。其进行的是增量同步。

28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free=32544 obl=81 oll=0 omem=0 events=r cmd=slaveof')
28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success.
28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event.
28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue...
28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302).
28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master.
28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

 

同时,在这个时间点,sentinel也有日志输出,以sentinel1为例。从日志中,可以看到,在这个时间点它会更改配置信息。

28087:X 08 Oct 16:04:05.631 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:05.631 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

switch-master <master name> <oldip> <oldport> <newip> <newport> -- The master new IP and address is the specified one after a configuration change. This is the message most external users are interested in.

 

同步过程尚未完成。

28163:X 08 Oct 16:04:06.555 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-inprog <instance details> -- The slave being reconfigured showed to be a slave of the new master ip:port pair, but the synchronization process is not yet complete.

 

主从同步完成。

28163:X 08 Oct 16:04:06.555 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-done <instance details> -- The slave is now synchronized with the new master.

 

failover切换完成。

28163:X 08 Oct 16:04:06.606 # +failover-end master mymaster 127.0.0.1 6379

 

failover成功后,发布主节点的切换消息

28163:X 08 Oct 16:04:06.606 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381

 

 关联新主节点的slave信息,需要注意的是,原来的主节点会作为新主节点的slave。

28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

+slave <instance details> -- A new slave was detected and attached.

 

过了30s后,判定原来的主节点主观下线。

28163:X 08 Oct 16:04:36.687 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

 

综合来看,Sentinel进行failover的流程如下

1. 每隔1秒,每个Sentinel节点会向主节点、从节点、其余Sentinel节点发送一条ping命令做一次心跳检测,来确认这些节点当前是否可达。当这些节点超过down-after-milliseconds没有进行有效回复,Sentinel节点就会判定该节点为主观下线。

2. 如果被判定为主观下线的节点是主节点,该Sentinel节点会通过sentinel is master-down-by-addr命令向其他Sentinel节点询问对主节点的判断,当超过<quorum>个数,Sentinel节点会判定该节点为客观下线。如果从节点、Sentinel节点被判定为主观下线,并不会进行后续的故障切换操作。

3. 对Sentinel进行领导者选举,由其来进行后续的故障切换(failover)工作。选举算法基于Raft。

4. Sentinel领导者节点开始进行故障切换。

5. 选择合适的从节点作为新主节点。

6. Sentinel领导者节点对上一步选出来的从节点执行slaveof no one命令让其成为主节点。

7. 向剩余的从节点发送命令,让它们成为新主节点的从节点,复制规则和parallel-syncs参数有关。

8. 将原来的主节点更新为从节点,并将其纳入到Sentinel的管理,让其恢复后去复制新的主节点。

 

Sentinel的领导者选举流程。

Sentinel的领导者选举基于Raft协议。

1. 每个在线的Sentinel节点都有资格成为领导者,当它确认主节点主观下线时候,会向其他Sentinel节点发送sentinel is-master-down-by-addr命令,要求将自己设置为领导者。

2. 收到命令的Sentinel节点,如果没有同意过其他Sentinel节点的sentinel is-master-down-by-addr命令,将同意该请求,否则拒绝。

3. 如果该Sentinel节点发现自己的票数已经大于等于max(quorum,num(sentinels)/2+1),那么它将成为领导者。

 

新主节点的选择流程。

1. 删除所有已经处于下线或断线状态的从节点。

2. 删除最近5秒没有回复过领导者Sentinel的INFO命令的从节点。

3. 删除所有与已下线主节点连接断开超过down-after-milliseconds*10毫秒的从节点。

4. 选择优先级最高的从节点。

5. 选择复制偏移量最大的从节点。

6. 选择runid最小的从节点。 

 

三个定时监控任务

1. 每隔10秒,每个Sentinel节点会向主节点和从节点发送info命令获取最新的拓扑结构。其作用如下:

    1> 通过向主节点执行info命令,获取从节点的信息,这也是为什么Sentinel节点不需要显式配置监控从节点。
    2> 当有新的从节点加入时可立刻感知出来。
    3> 节点不可达或者故障切换后,可通过info命令实时更新节点拓扑信息。

2. 每隔2秒,每个Sentinel节点会向Redis数据节点的__sentinel__:hello频道上发送该Sentinel节点对于主节点的判断以及当前Sentinel节点的信息,同时每个Sentinel节点也会订阅该频道,来了解其它Sentinel节点以及它们对主节点的判断。其作用如下:

   1> 发现新的Sentinel节点:通过订阅主节点的__sentinel__:hello了解其它Sentinel节点信息,如果是新加入的Sentinel节点,将该Sentinel节点信息保存起来,并与该Sentinel节点创建连接。
   2> Sentinel节点之间交换主节点的状态,作为后面客观下线以及领导者选举的依据。

3. 每隔1秒,每个Sentinel节点会向主节点、从节点、其余Sentinel节点发送一条ping命令做一次心跳检测,来确认这些节点当前是否可达。这个定时任务是节点失败判定的重要依据。

 

Sentinel的相关参数

# bind 127.0.0.1 192.168.1.1
# protected-mode no
port 26379
# sentinel announce-ip <ip>
# sentinel announce-port <port>
dir /tmp
sentinel monitor mymaster 127.0.0.1 6379 2
# sentinel auth-pass <master-name> <password>
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
# sentinel notification-  mymaster /var/redis/notify.sh
# sentinel client-reconfig-  mymaster /var/redis/reconfig.sh
sentinel deny- s-reconfig yes

其中,

dir:设置Sentinel的工作目录。

sentinel monitor mymaster 127.0.0.1 6379 2:其中2是quorum,即权重,代表至少需要两个Sentinel节点认为主节点主观下线,才可判定主节点为客观下线。一般建议将其设置为Sentinel节点的一半加1。不仅如此,quorum还与Sentinel节点的领导者选举有关。为了选出Sentinel的领导者,至少需要max(quorum, num(sentinels) / 2 + 1)个Sentinel节点参与选举。

 

sentinel down-after-milliseconds mymaster 30000:每个Sentinel节点都要通过定期发送ping命令来判断Redis节点和其余Sentinel节点是否可达。

如果在指定的时间内,没有收到主节点的有效回复,则判断其为主观下线。需要注意的是,该参数不仅用来判断主节点状态,同样也用来判断该主节点下面的从节点及其它Sentinel的状态。其默认值为30s。

 

sentinel parallel-syncs mymaster 1:在failover期间,允许多少个slave同时指向新的主节点。如果numslaves设置较大的话,虽然复制操作并不会阻塞主节点,但多个节点同时指向新的主节点,会增加主节点的网络和磁盘IO负载。

 

sentinel failover-timeout mymaster 180000:定义故障切换超时时间。默认180000,单位秒,即3min。需要注意的是,该时间不是总的故障切换的时间,而是适用于故障切换的多个场景。

# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
#   already tried against the same master by a given Sentinel, is two
#   times the failover timeout.
#
# - The time needed for a slave replicating to a wrong master according
#   to a Sentinel current configuration, to be forced to replicate
#   with the right master, is exactly the failover timeout (counting since
#   the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
#   did not produced any configuration change (SLAVEOF NO ONE yet not
#   acknowledged by the promoted slave).
#
# - The maximum time a failover in progress waits for all the slaves to be
#   reconfigured as slaves of the new master. However even after this time
#   the slaves will be reconfigured by the Sentinels anyway, but not with
#   the exact parallel-syncs progression as specified.

第一种适用场景:如果Redis Sentinel对一个主节点故障切换失败,那么下次再对该主节点做故障切换的起始时间是failover-timeout的2倍。这点从Sentinel的日志就可体现出来(28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018)

 

sentinel notification- :定义通知脚本,当Sentinel出现WARNING级别的事件时,会调用该脚本,其会传入两个参数:事件类型,事件描述。

sentinel client-reconfig- :当主节点发生切换时,会调用该参数定义的脚本,其会传入以下参数:<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>

关于脚本,其必须遵循一定的规则。

#  S EXECUTION
#
# sentinel notification-  and sentinel reconfig-  are used in order
# to configure  s that are called to notify the system administrator
# or to reconfigure clients after a failover. The  s are executed
# with the following rules for error handling:
#
# If   exits with "1" the execution is retried later (up to a maximum
# number of times currently set to 10).
#
# If   exits with "2" (or an higher value) the   execution is
# not retried.
#
# If   terminates because it receives a signal the behavior is the same
# as exit code 1.
#
# A   has a maximum running time of 60 seconds. After this limi					
收藏 打印
您的足迹: