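A Detailed Look at Redis Sentinel High Availability

Redis Sentinel is the high-availability solution for Redis. It was officially introduced in Redis 2.8.

With plain master-slave replication, if the master fails, someone has to manually promote one slave to master, point the remaining slaves at the new master, and update the master address used by the application. Every step requires human intervention. Sentinel automates this process.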

Let's walk through the logs to see exactly how a Sentinel failover proceeds.

The Sentinel failover process

The cluster topology is as follows.

Role        IP          Port    runID
Master      127.0.0.1   6379
Slave-1     127.0.0.1   6380
Slave-2     127.0.0.1   6381
Sentinel-1  127.0.0.1   26379   d4424b8684977767be4f5abd1e364153fbb0adbd
Sentinel-2  127.0.0.1   26380   18311edfbfb7bf89fe4b67d08ef432053db62fff
Sentinel-3  127.0.0.1   26381   3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8

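The master process is killed with kill -9.

1. The slaves react first.

They immediately log the following (shown here for the slave on port 6380).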

28244:S 08 Oct 16:03:34.184 # Connection with master lost.
28244:S 08 Oct 16:03:34.184 * Caching the disconnected master state.
28244:S 08 Oct 16:03:34.548 * Connecting to MASTER 127.0.0.1:6379
28244:S 08 Oct 16:03:34.548 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:03:34.548 # Error condition on socket for SYNC: Connection refused
28244:S 08 Oct 16:03:35.556 * Connecting to MASTER 127.0.0.1:6379
28244:S 08 Oct 16:03:35.556 * MASTER <-> SLAVE sync started
...

2. The Sentinels log nothing until 30s later, which follows from the setting "sentinel down-after-milliseconds mymaster 30000".

Below is the log output of each Sentinel node and each slave, in turn.

Sentinel-1

28087:X 08 Oct 16:04:04.277 # +sdown master mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:04.379 # +new-epoch 1
28087:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28087:X 08 Oct 16:04:05.388 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28087:X 08 Oct 16:04:05.388 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018
28087:X 08 Oct 16:04:05.631 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:05.631 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:35.656 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

Sentinel-2

28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28163:X 08 Oct 16:04:04.366 # +new-epoch 1
28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.543 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.629 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.554 # -odown master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.555 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.555 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.606 # +failover-end master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.606 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:36.687 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

Sentinel-3

28234:X 08 Oct 16:04:04.288 # +sdown master mymaster 127.0.0.1 6379
28234:X 08 Oct 16:04:04.378 # +new-epoch 1
28234:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28234:X 08 Oct 16:04:04.385 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018
28234:X 08 Oct 16:04:05.630 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28234:X 08 Oct 16:04:05.630 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28234:X 08 Oct 16:04:35.709 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

slave2 (127.0.0.1:6380)

28244:S 08 Oct 16:04:04.762 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:04.762 # Error condition on socket for SYNC: Connection refused
28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free=32544 obl=81 oll=0 omem=0 events=r cmd=slaveof')
28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success.
28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event.
28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue...
28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302).
28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master.
28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

slave3 (127.0.0.1:6381)

28253:S 08 Oct 16:04:03.655 * MASTER <-> SLAVE sync started
28253:S 08 Oct 16:04:03.655 # Error condition on socket for SYNC: Connection refused
28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdcc5c02031
28253:M 08 Oct 16:04:04.586 * Discarding previously cached master state.
28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success.
28253:M 08 Oct 16:04:05.770 * Slave 127.0.0.1:6380 asks for synchronization
28253:M 08 Oct 16:04:05.770 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 156 bytes of backlog starting from offset 24302.

From the logs above we can see the following.

Every Sentinel node marks 127.0.0.1 6379 as subjectively down (Subjectively Down, abbreviated sdown).

28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379
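The roughly 30-second gap between the master dying and the first +sdown can be checked directly from the timestamps. The sketch below is only an illustration: it uses the slave's "Connection with master lost" line as a stand-in for the moment of the kill (the kill itself is not logged), and the helper name log_delay_seconds is made up for this example.

```python
from datetime import datetime

# Redis log timestamp layout, e.g. "08 Oct 16:03:34.184".
# The year is not logged, so strptime fills in a default; only the
# difference between two timestamps matters here.
FMT = "%d %b %H:%M:%S.%f"


def log_delay_seconds(earlier: str, later: str) -> float:
    """Return the number of seconds between two Redis log timestamps."""
    t0 = datetime.strptime(earlier, FMT)
    t1 = datetime.strptime(later, FMT)
    return (t1 - t0).total_seconds()


lost = "08 Oct 16:03:34.184"   # slave: "Connection with master lost."
sdown = "08 Oct 16:04:04.277"  # Sentinel-1: "+sdown master mymaster ..."
print(round(log_delay_seconds(lost, sdown), 3))
```

The result is slightly above 30 seconds, which is consistent with down-after-milliseconds being 30000.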

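Once the quorum is reached, Sentinel-2 marks it as objectively down (Objectively Down, abbreviated odown). Comparing the logs of the other two Sentinel nodes shows that Sentinel-2 was the first to reach this verdict. Next comes the Sentinel leader election. Generally speaking, whichever Sentinel completes the objective-down determination first becomes the leader, and only the Sentinel leader can carry out the failover.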

28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28163:X 08 Oct 16:04:04.366 # +new-epoch 1
28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379
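To make the vote count explicit, the vote lines from Sentinel-2's log can be tallied with a few lines of Python. This is only a log-reading aid written for this article (the regular expression and the function name tally_votes are assumptions of the example, not part of Redis).

```python
import re
from collections import Counter

# Vote lines copied from Sentinel-2's log above (PID and timestamp stripped).
LOG_LINES = [
    "+vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1",
    "3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1",
    "d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1",
]

# Matches either the local vote or a vote reported by another Sentinel,
# capturing the candidate runid and the epoch.
VOTE_RE = re.compile(
    r"(?:\+vote-for-leader|[0-9a-f]{40} voted for) ([0-9a-f]{40}) (\d+)$"
)


def tally_votes(lines):
    """Count votes per (candidate runid, epoch) pair."""
    votes = Counter()
    for line in lines:
        match = VOTE_RE.search(line)
        if match:
            votes[(match.group(1), int(match.group(2)))] += 1
    return votes


for (runid, epoch), count in tally_votes(LOG_LINES).items():
    print(f"epoch {epoch}: {count} vote(s) for {runid[:8]}")
```

All three votes in epoch 1 go to runid 18311edf..., which is Sentinel-2 in the topology table, so it is elected leader.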

Look for a suitable slave to promote to master

28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379

+failover-state-select-slave <instance details> -- New failover state is select-slave: we are trying to find a suitable slave for promotion.

127.0.0.1 6381 is chosen as the new master

28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

+selected-slave <instance details> -- We found the specified good slave to promote.

Send slaveof no one to the 6381 node to make it a master

28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

+failover-state-send-slaveof-noone <instance details> -- We are trying to reconfigure the promoted slave as master, waiting for it to switch.

Wait for the 6381 node to be promoted to master

28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

Confirm that the 6381 node has been promoted to master

28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

Now look at slave3's log output between 16:04:04.528 and 16:04:05.543. It switched to MASTER mode and rewrote its configuration file.

28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdcc5c02031
28253:M 08 Oct 16:04:04.586 * Discarding previously cached master state.
28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success.

The failover enters the phase of reconfiguring the slaves

28163:X 08 Oct 16:04:05.543 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379

Tell the 6380 node to replicate from the new master

28163:X 08 Oct 16:04:05.629 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-sent <instance details> -- The leader sentinel sent the SLAVEOF command to this instance in order to reconfigure it for the new slave.

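slave2's log output at this point in time lines up closely. It performed a partial (incremental) resynchronization rather than a full one.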

28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free=32544 obl=81 oll=0 omem=0 events=r cmd=slaveof')
28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success.
28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event.
28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue...
28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302).
28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master.
28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
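Why the partial resynchronization succeeds can be read off the two logs: the promoted node kept the old replication ID as its secondary ID ("valid up to offset: 24302"), and slave2 asked for exactly that ID at offset 24302. The following is a simplified sketch of that acceptance check, not the actual Redis implementation. The class and field names are hypothetical, and backlog_start is an assumed value because the logs do not show where the backlog begins.

```python
from dataclasses import dataclass


@dataclass
class MasterReplState:
    replid: str                # current replication ID
    replid2: str               # secondary (previous) replication ID
    second_replid_offset: int  # replid2 is only valid up to this offset
    backlog_start: int         # first offset still held in the backlog
    backlog_end: int           # offset just past the last byte in the backlog


def can_partial_resync(master: MasterReplState, req_replid: str, req_offset: int) -> bool:
    """Simplified acceptance check for a partial resynchronization request."""
    same_history = req_replid == master.replid or (
        req_replid == master.replid2
        and req_offset <= master.second_replid_offset
    )
    in_backlog = master.backlog_start <= req_offset <= master.backlog_end
    return same_history and in_backlog


new_master = MasterReplState(
    replid="a4022bb5c361353a4773fd460cec5cdcc5c02031",
    replid2="b95802ca8afd97c578b355a5838d219681d0af27",
    second_replid_offset=24302,
    backlog_start=1,             # assumption: not visible in the logs
    backlog_end=24302 + 156,     # the master sent 156 bytes starting at 24302
)
# Request taken from slave2's log: b95802ca...:24302
print(can_partial_resync(new_master, "b95802ca8afd97c578b355a5838d219681d0af27", 24302))
```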

At the same moment the Sentinels also produce log output; take sentinel1 as an example. The log shows that this is when it updates its configuration.

28087:X 08 Oct 16:04:05.631 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:05.631 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

+switch-master <master name> <oldip> <oldport> <newip> <newport> -- The master new IP and address is the specified one after a configuration change. This is the message most external users are interested in.
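Since this is the event most clients care about, here is a minimal sketch of parsing its payload into named fields. The payload string is the part after "+switch-master" in the log line above; the SwitchMaster type and parse_switch_master function are names invented for this example.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Split '<master name> <oldip> <oldport> <newip> <newport>' into fields."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


event = parse_switch_master("mymaster 127.0.0.1 6379 127.0.0.1 6381")
print(f"{event.master_name}: {event.old_ip}:{event.old_port} -> {event.new_ip}:{event.new_port}")
```

An application that subscribes to Sentinel events could use something along these lines to decide where to reconnect, although in practice a Sentinel-aware client library usually handles this.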

The synchronization is not yet complete.

28163:X 08 Oct 16:04:06.555 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-inprog <instance details> -- The slave being reconfigured showed to be a slave of the new master ip:port pair, but the synchronization process is not yet complete.

Master-slave synchronization is complete.

28163:X 08 Oct 16:04:06.555 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-done <instance details> -- The slave is now synchronized with the new master.

The failover is complete.

28163:X 08 Oct 16:04:06.606 # +failover-end master mymaster 127.0.0.1 6379

After a successful failover, the master switch message is published

28163:X 08 Oct 16:04:06.606 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381

Attach the slaves of the new master. Note that the old master is registered as a slave of the new master.

28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

+slave <instance details> -- A new slave was detected and attached.

30s later, the old master is marked as subjectively down.

28163:X 08 Oct 16:04:36.687 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

Putting it all together, a Sentinel failover proceeds as follows

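1. Every second, each Sentinel node sends a ping command to the master, the slaves, and the other Sentinel nodes as a heartbeat check, to confirm whether those nodes are currently reachable. If a node gives no valid reply for longer than down-after-milliseconds, the Sentinel node marks it as subjectively down.

2. If the node marked subjectively down is the master, the Sentinel node uses the sentinel is-master-down-by-addr command to ask the other Sentinel nodes for their view of the master. Once the number of Sentinels that consider it down reaches <quorum>, the Sentinel node marks the master as objectively down. If it is a slave or a Sentinel node that is marked subjectively down, no failover follows.

3. A leader is elected among the Sentinels to carry out the subsequent failover. The election algorithm is based on Raft.

4. The Sentinel leader starts the failover.

5. A suitable slave is selected as the new master.

6. The Sentinel leader runs slaveof no one on the slave selected in the previous step, turning it into a master.

7. The remaining slaves are told to become slaves of the new master. How they are repointed depends on the parallel-syncs parameter.

8. The old master is updated to be a slave and kept under Sentinel's management, so that it replicates from the new master once it recovers.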
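Steps 1 and 2 boil down to two small predicates. The sketch below shows them in isolation; the function and parameter names are made up for illustration, and real Sentinel tracks considerably more state per instance.

```python
def is_subjectively_down(last_valid_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """A single Sentinel's view: no valid reply for longer than down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_objectively_down(sdown_reports: list, quorum: int) -> bool:
    """sdown_reports holds one boolean per Sentinel, the local one included.

    The master is objectively down once at least `quorum` Sentinels
    report it as subjectively down.
    """
    return sum(1 for down in sdown_reports if down) >= quorum


# With "sentinel monitor mymaster 127.0.0.1 6379 2" the quorum is 2.
print(is_subjectively_down(last_valid_reply_ms=0, now_ms=30_001, down_after_ms=30_000))
print(is_objectively_down([True, True, False], quorum=2))
```

This matches the "#quorum 2/2" and "#quorum 3/2" entries in the logs: the first number is how many Sentinels agreed, the second is the configured quorum.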

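The Sentinel leader election process.

Sentinel leader election is based on the Raft protocol.

1. Every online Sentinel node is eligible to become leader. Once it has determined that the master is objectively down, it sends the sentinel is-master-down-by-addr command to the other Sentinel nodes, asking them to make it the leader.

2. A Sentinel node that receives the command agrees to the request if it has not already agreed to another Sentinel node's sentinel is-master-down-by-addr request in the same epoch; otherwise it refuses.

3. If the Sentinel node finds that its vote count is greater than or equal to max(quorum, num(sentinels)/2+1), it becomes the leader.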
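The voting rule can be sketched as a small simulation: each voter grants its vote to the first candidate that asks, and a candidate wins only with at least max(quorum, num(sentinels)/2+1) votes. This is a toy model under those two assumptions, with invented names (votes_needed, run_election); it ignores epochs, timeouts, and retries.

```python
def votes_needed(quorum: int, num_sentinels: int) -> int:
    """Votes a candidate needs: max(quorum, majority of all Sentinels)."""
    return max(quorum, num_sentinels // 2 + 1)


def run_election(requests, quorum: int, num_sentinels: int):
    """Simulate one election round.

    `requests` is a list of (voter, candidate) pairs in arrival order.
    Each voter grants its vote to the first candidate that asks and
    ignores later requests. Returns the winner, or None if nobody
    reaches the required number of votes.
    """
    granted = {}
    for voter, candidate in requests:
        granted.setdefault(voter, candidate)  # first request wins the vote

    tally = {}
    for candidate in granted.values():
        tally[candidate] = tally.get(candidate, 0) + 1

    needed = votes_needed(quorum, num_sentinels)
    for candidate, count in tally.items():
        if count >= needed:
            return candidate
    return None


# Order of events as seen in the logs: Sentinel-2 votes for itself,
# then Sentinel-3 and Sentinel-1 grant it their votes.
requests = [("S2", "S2"), ("S3", "S2"), ("S1", "S2"), ("S1", "S3")]
print(run_election(requests, quorum=2, num_sentinels=3))
```

Because the threshold is never below a strict majority, at most one candidate can win a given round; a split vote simply produces no leader and the election is retried later.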

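The new master selection process.

1. Remove all slaves that are already down or disconnected.

2. Remove slaves that have not replied to the leader Sentinel's INFO command within the last 5 seconds.

3. Remove all slaves whose connection to the failed master has been broken for more than down-after-milliseconds*10 milliseconds.

4. Among the remaining slaves, pick the one with the highest priority (a lower slave-priority value means a higher priority).

5. If priorities are equal, pick the slave with the largest replication offset.

6. If the offsets are also equal, pick the slave with the smallest runid.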
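The filter-then-sort logic above can be sketched as follows. The Slave fields and the sample values are hypothetical and chosen only to exercise each rule. The check that skips slaves with priority 0 reflects the Redis convention that a slave with priority 0 is never promoted, which the list above does not mention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    name: str
    runid: str
    priority: int             # slave-priority; lower value = preferred, 0 = never promote
    repl_offset: int          # replication offset processed so far
    is_down: bool             # already marked down or disconnected
    info_age_ms: int          # time since the last INFO reply
    master_link_down_ms: int  # how long the link to the old master has been down


def select_slave(slaves: List[Slave], down_after_ms: int) -> Optional[Slave]:
    """Apply the filters (steps 1-3), then the ordering (steps 4-6)."""
    candidates = [
        s for s in slaves
        if not s.is_down
        and s.info_age_ms <= 5000
        and s.master_link_down_ms <= down_after_ms * 10
        and s.priority != 0
    ]
    if not candidates:
        return None
    # Lowest priority value first, then largest offset, then smallest runid.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.runid))


slaves = [
    Slave("slave-a", "aaa", 100, 24302, False, 1000, 0),
    Slave("slave-b", "bbb", 100, 24458, False, 1000, 0),
    Slave("slave-c", "ccc", 10, 24458, True, 1000, 0),    # filtered: down
    Slave("slave-d", "ddd", 0, 99999, False, 1000, 0),    # filtered: priority 0
]
best = select_slave(slaves, down_after_ms=30000)
print(best.name if best else None)
```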

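Three periodic monitoring tasks

1. Every 10 seconds, each Sentinel node sends the info command to the master and the slaves to obtain the latest topology. This serves the following purposes:

1> Running info against the master returns the list of its slaves, which is why Sentinel does not need the slaves to be configured explicitly.
2> A newly added slave is picked up automatically.
3> After a node becomes unreachable or a failover occurs, the topology information is refreshed through the info command.

2. Every 2 seconds, each Sentinel node publishes its view of the master, together with its own information, to the __sentinel__:hello channel on the Redis data nodes. Each Sentinel node also subscribes to that channel to learn about the other Sentinel nodes and their views of the master. This serves the following purposes:

1> Discovering new Sentinel nodes: by subscribing to the master's __sentinel__:hello channel, a Sentinel learns about the other Sentinel nodes. If one of them is newly added, it stores that Sentinel's information and opens a connection to it.
2> Exchanging the master's state between Sentinel nodes, which feeds the later objective-down determination and leader election.

3. Every second, each Sentinel node sends a ping command to the master, the slaves, and the other Sentinel nodes as a heartbeat check, to confirm whether those nodes are currently reachable. This task is the main basis for failure detection.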
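To illustrate how slaves can be discovered from the master's info output (task 1), the sketch below extracts the slaveN lines from the replication section. The sample text is illustrative and was not captured from the cluster in this article; parse_slaves is a name made up for the example.

```python
import re

# Illustrative "INFO replication" output from a master with two slaves.
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=24302,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=24302,lag=1
"""

SLAVE_LINE = re.compile(r"^slave\d+:(.*)$")


def parse_slaves(info_text: str):
    """Return a list of (ip, port) tuples for every slaveN line."""
    slaves = []
    for line in info_text.splitlines():
        match = SLAVE_LINE.match(line.strip())
        if not match:
            continue
        # "ip=...,port=...,state=..." -> {"ip": ..., "port": ..., ...}
        fields = dict(item.split("=", 1) for item in match.group(1).split(","))
        slaves.append((fields["ip"], int(fields["port"])))
    return slaves


print(parse_slaves(SAMPLE_INFO))
```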

Sentinel configuration parameters

# bind 127.0.0.1 192.168.1.1
# protected-mode no
port 26379
# sentinel announce-ip <ip>
# sentinel announce-port <port>
dir /tmp
sentinel monitor mymaster 127.0.0.1 6379 2
# sentinel auth-pass <master-name> <password>
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
# sentinel notification-script mymaster /var/redis/notify.sh
# sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
sentinel deny-scripts-reconfig yes

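In this file:

dir: sets Sentinel's working directory.

sentinel monitor mymaster 127.0.0.1 6379 2: the 2 is the quorum, meaning that at least two Sentinel nodes must consider the master subjectively down before it can be marked objectively down. The usual recommendation is to set it to half the number of Sentinel nodes plus 1. Beyond that, the quorum also affects Sentinel leader election: to elect a Sentinel leader, at least max(quorum, num(sentinels) / 2 + 1) Sentinel nodes must take part in the vote.

sentinel down-after-milliseconds mymaster 30000: each Sentinel node periodically sends ping commands to determine whether the Redis nodes and the other Sentinel nodes are reachable.

If no valid reply is received within the specified time, the node is marked as subjectively down. Note that this parameter is used not only to judge the master, but also the slaves under that master and the other Sentinels. Its default value is 30s.

sentinel parallel-syncs mymaster 1: the number of slaves that may be repointed at the new master at the same time during a failover. With a large value, although replication does not block the master, several slaves syncing from the new master simultaneously increase its network and disk I/O load.

sentinel failover-timeout mymaster 180000: defines the failover timeout. The default is 180000 milliseconds, i.e. 3 min. Note that this is not the total time a failover may take; rather, it applies to several different failover scenarios.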

# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
#   already tried against the same master by a given Sentinel, is two
#   times the failover timeout.
#
# - The time needed for a slave replicating to a wrong master according
#   to a Sentinel current configuration, to be forced to replicate
#   with the right master, is exactly the failover timeout (counting since
#   the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
#   did not produced any configuration change (SLAVEOF NO ONE yet not
#   acknowledged by the promoted slave).
#
# - The maximum time a failover in progress waits for all the slaves to be
#   reconfigured as slaves of the new master. However even after this time
#   the slaves will be reconfigured by the Sentinels anyway, but not with
#   the exact parallel-syncs progression as specified.

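First scenario: if Redis Sentinel fails to fail over a master, the earliest time it will attempt another failover of that master is 2 × failover-timeout later. This is visible in the Sentinel log (28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct 8 16:10:04 2018): 16:04:04 plus 2 × 180 s gives 16:10:04.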
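The arithmetic behind that log line can be reproduced in a couple of lines. The function name below is invented for the example, and the start time is the logged second (16:04:04) without its millisecond part.

```python
from datetime import datetime, timedelta


def next_failover_not_before(start: datetime, failover_timeout_ms: int) -> datetime:
    """Earliest time for a new failover attempt: start + 2 x failover-timeout."""
    return start + timedelta(milliseconds=2 * failover_timeout_ms)


start = datetime(2018, 10, 8, 16, 4, 4)
print(next_failover_not_before(start, 180000))  # 2018-10-08 16:10:04
```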

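sentinel notification-script: defines the notification script. When Sentinel generates a WARNING-level event, this script is called with two arguments: the event type and the event description.

sentinel client-reconfig-script: when the master changes because of a failover, the script defined by this parameter is called with the following arguments: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>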
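As a rough idea of what a client-reconfig script might do with those seven arguments, here is a sketch of the argument handling only. The ReconfigEvent type, the function name, and the sample values are assumptions of this example; a real script would read the values from sys.argv[1:] and then update whatever the clients use to locate the master.

```python
from typing import List, NamedTuple


class ReconfigEvent(NamedTuple):
    master_name: str
    role: str
    state: str
    from_ip: str
    from_port: int
    to_ip: str
    to_port: int


def parse_reconfig_args(args: List[str]) -> ReconfigEvent:
    """Parse <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>."""
    if len(args) != 7:
        raise ValueError(f"expected 7 arguments, got {len(args)}")
    name, role, state, from_ip, from_port, to_ip, to_port = args
    return ReconfigEvent(name, role, state, from_ip, int(from_port), to_ip, int(to_port))


# Illustrative values modelled on the failover in this article.
sample = ["mymaster", "leader", "failover", "127.0.0.1", "6379", "127.0.0.1", "6381"]
event = parse_reconfig_args(sample)
print(f"{event.master_name} moved to {event.to_ip}:{event.to_port}")
```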

These scripts must follow certain rules.

# SCRIPTS EXECUTION
#
# sentinel notification-script and sentinel reconfig-script are used in order
# to configure scripts that are called to notify the system administrator
# or to reconfigure clients after a failover. The scripts are executed
# with the following rules for error handling:
#
# If script exits with "1" the execution is retried later (up to a maximum
# number of times currently set to 10).
#
# If script exits with "2" (or an higher value) the script execution is
# not retried.
#
# If script terminates because it receives a signal the behavior is the same
# as exit code 1.
#
# A script has a maximum running time of 60 seconds. After this limi