zookeeper-network partition

env:

hbase,94.26

zookeeper,3.4.3

---------------

1.downed node

this morning we found a regionserver(host-34) downed in our monitor.so we dived into the logs of hbase and found that in this host:

2016-02-29 00:50:36,799 INFO [regionserver60020-SendThread(host-04:2181)] ClientCnxn.java:1083 Client session timed out, have not heard from server in 60030ms for sessionid 0x4511b9

that means during this period,the read timeout has been touched.

this timeout is settled by this:

  readtimeout=negotiatedtimeout(client[18000 by default in hbase],server[60000,90000]) * 2/3

so it's set to 60 secs.(negotiatedtimeout=90 secs)

and conn timeout:

conn timeout=negotiatedtimeout / host provider

we have five nodes of a quorum,so it's set to 90/5=18 secs.

after a few retries,(note:the start time of this connection was at 00:49:36)

host-34,hbase:
2016-02-29 00:50:36,799 INFO [regionserver60020-SendThread(host-04:2181)] ClientCnxn.java:1083 Client session timed out, have not heard from server in 60030ms for sessionid 0x4511b9

duration==
2016-02-29 00:50:44,540 INFO [regionserver60020-SendThread(host-07:2181)] ClientCnxn.java:966 Opening socket connection to server host-07/192.168.100.117:2181. Will not attempt to a
uthenticate using SASL (Unable to locate a login configuration)
…
2016-02-29 00:51:02,559 INFO [regionserver60020-SendThread(host-07:2181)] ClientCnxn.java:1083 Client session timed out, have not heard from server in **18790ms** for sessionid 0x4511b9
3876c000b, closing socket connection and attempting reconnect

(the period time 18 secs is computed by above connectedtimeout.)

a network paritition was shown here:

2016-02-29 00:50:40,499 WARN [regionserver60020-SendThread(host-05:2181)] ClientCnxn.java:1089 Session 0x4511b93876c000b for server null, unexpected error, closing socket connection
 and attempting reconnect
java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

so then we dived into the last zk's log from the last request from hbase SendThread:

host-04,zookeeper:
2016-02-29 00:51:25,764 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /192.168.100.147:58100
2016-02-29 00:51:25,765 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@831] - Client attempting to renew session 0x4511b93876c000b at /192.168.100.147:5
8100
2016-02-29 00:51:25,765 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x4511b93876c000b
2016-02-29 00:51:25,766 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@597] - Invalid session 0x4511b93876c000b for client /192.168.100.147:58100, probabl
y expired
2016-02-29 00:51:25,766 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /192.168.100.147:58100 which had session
nid 0x4511b93876c000b

untill here,the time period is for :51:25-49:36 ~ 110 secs >> 90 secs(timeout set above),so this session timeout was settled to expired.

that means this session should be timeouted at this timestamp:

session touch time + negotiatedtimeout = expired timestamp
 = 49:36 + 90 secs ~ 51:06

for sure,we found that in this zookeeper one line at this timestamp was present in front of us:

host-05,zookeeper:
2016-02-29 00:51:08,000 [myid:3] - INFO  [SessionTracker:ZooKeeperServer@334] - Expiring session 0x4511b93876c000b, timeout of 90000ms exceeded

2.another node which reconnected successfully to zk during this timeout.

host-33,hbase:(so this hbase keep on running )
2016-02-29 00:50:55,864 INFO [regionserver60020-SendThread(host-03:2181)] ClientCnxn.java:1207 Session establishment complete on server host-03/192.168.100.113:2181, sessionid = 0x4
0x4511b93876c000b

it's printed at 50:55 << 51:06,so it keeps on running with happy.

3.conclusion

a.set the connect timeout the :one / hostproviderTH for all reconnect to all nodes in this quorum

b.as of case a,so the whole connected timeout + readtimeout will longer than the negotiatedtimeout:

 read time-out =negotiaedtimeout * 2/ 3
 read timeout + (negotiatedtimeout / host provider ) * Nr > negotiatedtimeout
 Nr = retried times

so in for one extreme cases:

if all nodes in zk's quorum are disconnectable,that will cause double time of negotiaredtimeout.

c.so based on b,why set the this mechanism ?

if zk switch this solution:(this means the whole read and connect timeout are equals to negotiatedtimeout)

connecttimeout= (negotiateedtimeout- readtimeout) /hostprovider

ie.

----------negotiatedtimeout------------------||

------read--timeout-------| retry1|retry2....||

if some cases similar to here,u will lost chance to know what's wrong with hbase/zk,instead of showing :

retry connecting but timeout ,retry connecting but timeout....than the rs was shutdowned forcely ,that is it.

that means if something weird (e.g. large gc time cost that exceeds the whole negotiatedtimeout) occurs,it's needless to retry to connect other nodes in a quorum.

ref:

zookeeper-network partition

zookeeper-negotiated session timeout

猜你喜欢