ZooKeeper leader election: a walkthrough of the flow

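ZooKeeper's election flow involves concurrency control across several threads. Once the responsibility of each thread is clear, the flow itself can be analysed step by step.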

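First, set up the environment: https://blog.csdn.net/zhaoyu_nb/article/details/88663599
The election really starts with the call to makeLEStrategy().lookForLeader(). The concrete election algorithm is chosen by the electionAlg setting in the configuration file. The default is 3 (FastLeaderElection); the other options are deprecated.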

 switch (electionAlgorithm) {
        case 0:
            le = new LeaderElection(this);
            break;
        case 1:
            le = new AuthFastLeaderElection(this);
            break;
        case 2:
            le = new AuthFastLeaderElection(this, true);
            break;
        case 3:
            qcm = createCnxnManager();
            QuorumCnxManager.Listener listener = qcm.listener;
            if(listener != null){
                listener.start();
                le = new FastLeaderElection(this, qcm);
            } else {
                LOG.error("Null listener when initializing cnx manager");
            }
            break;
        // default branch omitted in this excerpt
 }

The first thing lookForLeader() does is call sendNotifications():


 private void sendNotifications() {
        for (QuorumServer server : self.getVotingView().values()) {
            long sid = server.id;

            ToSend notmsg = new ToSend(ToSend.mType.notification,
                    proposedLeader,
                    proposedZxid,
                    logicalclock.get(),
                    QuorumPeer.ServerState.LOOKING,
                    sid,
                    proposedEpoch);
          //  if(LOG.isDebugEnabled()){
                LOG.info("Sending Notification: " + proposedLeader + " (n.leader), 0x"  +
                      Long.toHexString(proposedZxid) + " (n.zxid), 0x" + Long.toHexString(logicalclock.get())  +
                      " (n.round), " + sid + " (recipient), " + self.getId() +
                      " (myid), 0x" + Long.toHexString(proposedEpoch) + " (n.peerEpoch)");
           // }
            sendqueue.offer(notmsg); 
        }
    }
Here self.getVotingView() comes from the following code.
QuorumPeer
    public Map<Long,QuorumPeer.QuorumServer> getView() {
        return Collections.unmodifiableMap(this.quorumPeers);
    }

    /**
     * Observers are not contained in this view, only nodes with 
     * PeerType=PARTICIPANT.
     */
    public Map<Long,QuorumPeer.QuorumServer> getVotingView() {
        return QuorumPeer.viewToVotingView(getView());
    }

QuorumPeerMain (where the peers are set from the configuration)
    quorumPeer.setQuorumPeers(config.getServers());

So the node sends its own vote, one by one, to every voting member of its cluster.

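sendqueue.offer(notmsg) only places the outgoing message on a queue. As covered in the previous article, the message is eventually picked up by the
FastLeaderElection.Messenger.WorkerSender thread, which hands it to QuorumCnxManager's queueSendMap and opens the connection at the same time: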

    public void toSend(Long sid, ByteBuffer b) {
        if (this.mySid == sid) {
            // Message addressed to ourselves: skip the network and loop it back.
            b.position(0);
            addToRecvQueue(new Message(b.duplicate(), sid)); // recvQueue holds received messages
        } else {
            ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
            // queueSendMap holds the outgoing messages, one queue per peer
            ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
            if (bqExisting != null) {
                addToSendQueue(bqExisting, b);
            } else {
                addToSendQueue(bq, b);
            }
            connectOne(sid);
        }
    }
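
The per-peer queue handling in toSend() can be tried out in isolation. The following is a minimal sketch, not the ZooKeeper class: PeerSendQueues, queueFor and enqueue are hypothetical names, the capacity of 1 is an assumed value, and the drop-oldest policy in enqueue is an assumption of this sketch rather than something stated above. It shows how putIfAbsent guarantees that concurrent senders end up sharing one queue per sid.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the per-peer send queues; not the ZooKeeper class. */
class PeerSendQueues {
    static final int SEND_CAPACITY = 1; // assumed value for this sketch

    private final ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>> queueSendMap =
            new ConcurrentHashMap<>();

    /** Returns the queue for sid, creating it atomically on first use. */
    ArrayBlockingQueue<ByteBuffer> queueFor(long sid) {
        ArrayBlockingQueue<ByteBuffer> fresh = new ArrayBlockingQueue<>(SEND_CAPACITY);
        ArrayBlockingQueue<ByteBuffer> existing = queueSendMap.putIfAbsent(sid, fresh);
        // If another thread registered a queue first, use that one.
        return existing != null ? existing : fresh;
    }

    /** Enqueues a message; when the queue is full the oldest entry is dropped (assumed policy). */
    void enqueue(long sid, ByteBuffer message) {
        ArrayBlockingQueue<ByteBuffer> queue = queueFor(sid);
        while (!queue.offer(message)) {
            queue.poll(); // make room by discarding the oldest pending message
        }
    }

    public static void main(String[] args) {
        PeerSendQueues queues = new PeerSendQueues();
        queues.enqueue(2L, ByteBuffer.allocate(4));
        queues.enqueue(2L, ByteBuffer.allocate(8));
        System.out.println("pending for sid 2: " + queues.queueFor(2L).size());
    }
}
```

With a small capacity and a drop-oldest policy, a slow or unreachable peer only ever has the most recent notification waiting for it, which is usually what an election wants.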

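PS:

One more point deserves attention. When QuorumCnxManager.Listener accepts a connection from another node, it compares myids: if the connecting node's myid is smaller than its own, it closes that connection itself, on the assumption that nodes with a larger myid will also take part in the election.
The check on the connectOne(sid) path is exactly the reverse: the initiator drops the connection when the remote myid is larger than its own.
The net effect is that between any two servers only the connection opened by the side with the larger myid survives (once it exists, it carries messages in both directions through its SendWorker and RecvWorker). At the very start of an election, sendNotifications() therefore first reaches the peers whose myid is smaller than the sender's. Under the comparison rules described later, a larger myid wins when everything else is equal, so such nodes start canvassing right away. Early on, the smaller a node's myid, the more votes it receives and the more it has to process; the larger the myid, the fewer votes it receives. That said, ZooKeeper does not pick a leader by myid alone; other criteria come first.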

connectOne(sid);        
  QuorumCnxManager.initiateConnection(Socket sock, Long sid)
     QuorumCnxManager.startConnection(Socket sock, Long sid)
     
     // If lost the challenge, then drop the new connection
        if (sid > this.mySid) {
            LOG.info("Have smaller server identifier, so dropping the " +
                     "connection: (" + sid + ", " + this.mySid + ")");
            closeSocket(sock);
            // Otherwise proceed with the connection
        } else {
            SendWorker sw = new SendWorker(sock, sid);
            RecvWorker rw = new RecvWorker(sock, din, sid, sw);
            sw.setRecv(rw);
            SendWorker vsw = senderWorkerMap.get(sid);     
            if(vsw != null)
                vsw.finish();      
            senderWorkerMap.put(sid, sw);
            queueSendMap.putIfAbsent(sid, new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY));          
            sw.start();
            rw.start();       
            return true;           
        }
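
The two opposite checks can be condensed into a small sketch. The class and method names below (ConnectionTieBreak, initiatorKeeps, receiverKeeps, survivingInitiator) are hypothetical; only the comparisons mirror what is described above, namely that the initiator drops the socket when the remote sid is larger and the listener drops it when the remote sid is smaller.

```java
/** Hypothetical sketch of the myid tie-break on election connections. */
class ConnectionTieBreak {

    /** Initiator side: the socket is dropped when the remote sid is larger than ours. */
    static boolean initiatorKeeps(long mySid, long remoteSid) {
        return !(remoteSid > mySid);
    }

    /** Listener side: the socket is dropped when the connecting sid is smaller than ours. */
    static boolean receiverKeeps(long mySid, long remoteSid) {
        return !(remoteSid < mySid);
    }

    /** Returns the sid whose outgoing connection survives between two distinct servers. */
    static long survivingInitiator(long sidA, long sidB) {
        boolean aToBSurvives = initiatorKeeps(sidA, sidB) && receiverKeeps(sidB, sidA);
        return aToBSurvives ? sidA : sidB;
    }

    public static void main(String[] args) {
        System.out.println(survivingInitiator(1L, 3L));
        System.out.println(survivingInitiator(3L, 1L));
    }
}
```

Both checks reduce to the same condition, so for any pair of distinct servers exactly one direction is ever kept, which avoids ending up with two sockets between the same two nodes.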

Processing votes
4.1 Processing in WorkerReceiver

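One thing to point out first: all data exchange with other nodes goes through
QuorumCnxManager. The two worker threads of FastLeaderElection merely hand outgoing data to queueSendMap
and take raw data from recvQueue, doing a little processing to turn it into something the election code can use.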

QuorumCnxManager.SendWorker ====>queueSendMap;
QuorumCnxManager.RecvWorker ======>recvQueue;
public final ArrayBlockingQueue recvQueue;

WorkerReceiver takes messages from QuorumCnxManager's receive queue:

/*
                       * If it is from an observer, respond right away.
                       * Note that the following predicate assumes that
                       * if a server is not a follower, then it must be
                       * an observer. If we ever have any other type of
                       * learner in the future, we'll have to change the
                       * way we check for observers.
                       */
                      if(!self.getVotingView().containsKey(response.sid)){
                          Vote current = self.getCurrentVote();
                          ToSend notmsg = new ToSend(ToSend.mType.notification,
                                  current.getId(),
                                  current.getZxid(),
                                  logicalclock.get(),
                                  self.getPeerState(),
                                  response.sid,
                                  current.getPeerEpoch());

                          sendqueue.offer(notmsg);

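This step is interesting, because lookForLeader() contains the check shown below and ignores notifications from servers outside the voting view. The branch above, which replies to such servers right away, therefore covers senders that lookForLeader() never handles. Per its own comment it exists to answer observers, which are not part of the voting view.
My first guess was that it only serves nodes newly added to the cluster, but that would not be robust without updating the existing configuration files first. Would the configuration of the running servers have to be changed over JMX?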

    if(self.getVotingView().containsKey(n.sid)) {...}
  else  
    LOG.warn("Ignoring notification from non-cluster member " + n.sid);

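The message is then read out of the buffer
and handled according to the states involved:
1. If this node is LOOKING, the notification is put into recvqueue (the sender's own state may be LOOKING, LEADING or FOLLOWING).
2. If this node is LOOKING and the sender is also LOOKING with a logical clock smaller than ours, a message is sent back carrying the leader this node currently votes for (not necessarily itself). This amounts to canvassing for votes.
3. If this node is not LOOKING but the sender is, this node sends the sender its current vote.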
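
The three cases above can be written down as a small decision function. This is a sketch with hypothetical names (ReceiverDecision, Action, decide), not the WorkerReceiver code itself. The IGNORE outcome for the case where neither side is LOOKING is an assumption, since the list above does not mention that case.

```java
/** Hypothetical sketch of how a received notification is dispatched. */
class ReceiverDecision {
    enum State { LOOKING, FOLLOWING, LEADING, OBSERVING }

    enum Action { QUEUE_ONLY, QUEUE_AND_REPLY, REPLY_ONLY, IGNORE }

    /**
     * @param self        state of the receiving node
     * @param sender      state reported by the sender
     * @param senderRound logical clock carried in the notification
     * @param myRound     logical clock of the receiving node
     */
    static Action decide(State self, State sender, long senderRound, long myRound) {
        if (self == State.LOOKING) {
            // Cases 1 and 2: always queue for lookForLeader(); additionally
            // reply with our vote when the sender is lagging behind.
            if (sender == State.LOOKING && senderRound < myRound) {
                return Action.QUEUE_AND_REPLY;
            }
            return Action.QUEUE_ONLY;
        }
        // Case 3: we already have a leader, so tell a LOOKING sender our current vote.
        return sender == State.LOOKING ? Action.REPLY_ONLY : Action.IGNORE; // IGNORE is assumed
    }

    public static void main(String[] args) {
        System.out.println(decide(State.LOOKING, State.LOOKING, 1L, 2L));
        System.out.println(decide(State.FOLLOWING, State.LOOKING, 5L, 5L));
    }
}
```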

4.2 How lookForLeader processes votes

FastLeaderElection.lookForLeader()
         HashMap<Long, Vote> recvset = new HashMap<Long, Vote>();        // votes from nodes in the LOOKING state
         HashMap<Long, Vote> outofelection = new HashMap<Long, Vote>();  // votes from nodes that are LEADING or FOLLOWING

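Handling votes from LOOKING nodes
1. While this server's own state is LOOKING,
  Notification n = recvqueue.poll(notTimeout,TimeUnit.MILLISECONDS); takes the next message from the recvqueue queue.
2. Only messages from servers configured in the cluster are processed. A message from any other server is skipped and a warning is logged.
  - First the node compares itself with the received notification (this is where the choice of leader starts to be made):
    - If its own logical clock is smaller, it discards the votes collected so far, updates its vote, then sends notifications.
    - If its own logical clock is larger, it simply ignores the message.
    - If the logical clocks are equal, it compares the vote contents and, if the received vote wins, adopts it and sends notifications.
  - The received vote is stored in the recvset map as sid -> vote.
3. The node then checks whether it has received enough votes to end the round. There are two strategies, but a majority (more than half of the votes) is the usual condition.
  The final vote is then returned.
  - Once a majority is reached, the node waits a little longer to see whether the vote changes.
  - Then it updates its own state.
    After hearing from more than half of the servers, a node can generally determine which role it should take.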
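
The comparison and the majority check in steps 2 and 3 can be sketched as follows. VoteSketch, Vote, supersedes and hasMajority are hypothetical names. The comparison order used here (peer epoch first, then zxid, then server id) is stated as an assumption about FastLeaderElection's vote ordering; the text above only says that myid is not the sole criterion. Weighted (hierarchical) quorums are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of vote comparison and majority counting. */
class VoteSketch {

    /** Minimal vote: proposed leader id, its last zxid, and its epoch. */
    static final class Vote {
        final long leader;
        final long zxid;
        final long peerEpoch;

        Vote(long leader, long zxid, long peerEpoch) {
            this.leader = leader;
            this.zxid = zxid;
            this.peerEpoch = peerEpoch;
        }

        boolean sameAs(Vote other) {
            return leader == other.leader && zxid == other.zxid && peerEpoch == other.peerEpoch;
        }
    }

    /** True when the received vote should replace the current one (assumed ordering). */
    static boolean supersedes(Vote received, Vote current) {
        if (received.peerEpoch != current.peerEpoch) {
            return received.peerEpoch > current.peerEpoch;
        }
        if (received.zxid != current.zxid) {
            return received.zxid > current.zxid;
        }
        return received.leader > current.leader; // myid only breaks ties
    }

    /** True when more than half of the voting members back the proposal. */
    static boolean hasMajority(Map<Long, Vote> recvset, Vote proposal, int votingMembers) {
        int agreeing = 0;
        for (Vote v : recvset.values()) {
            if (v.sameAs(proposal)) {
                agreeing++;
            }
        }
        return agreeing > votingMembers / 2;
    }

    public static void main(String[] args) {
        Vote proposal = new Vote(3L, 0x100L, 1L);
        Map<Long, Vote> recvset = new HashMap<>();
        recvset.put(1L, proposal);
        recvset.put(3L, proposal);
        System.out.println(hasMajority(recvset, proposal, 3));
    }
}
```

Under this ordering a node with a small myid but a more recent zxid still beats a node with a large myid and stale data, which is why myid matters mostly at cluster start-up when all histories are empty.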
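Handling votes from FOLLOWING and LEADING nodes
1. FOLLOWING and LEADING
  are handled in the same branch.
  If this node is the leader, a check is made.
  If this node is not the leader, or has only just joined the cluster, the message is put into
  outofelection for verification; the node then returns its final vote and updates its own state.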
