The fix was on the server side: changing its tcpnodelay setting from false to true solved the problem.
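For context, a server-side tcpnodelay flag presumably maps to the TCP_NODELAY socket option (Socket.setTcpNoDelay(true) in Java), which disables Nagle's algorithm on each accepted connection. The minimal sketch below is illustrative only and is not HBase's code. It assumes a Linux-like host and uses Python's standard socket module to enable the option on an accepted server-side socket over loopback and then read it back.

```python
import socket

def enable_nodelay(sock: socket.socket) -> None:
    """Disable Nagle's algorithm so small writes go out without waiting for ACKs."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

def nodelay_enabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

# Loopback demo: set the option on the socket returned by accept(),
# i.e. on the server side of the connection.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

before = nodelay_enabled(server_side)  # Nagle is on by default
enable_nodelay(server_side)
after = nodelay_enabled(server_side)
print(before, after)

for s in (server_side, client, listener):
    s.close()
```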
tcpdump with tcpnodelay = false (note the 40 ms delay):
16:17:00.576582 IP linux-idy0.48538 > linux-idy0.60020: P 5434:5643(209) ack 247472 win 387 <nop,nop,timestamp 567208394 567208393>
16:17:00.576869 IP linux-idy0.60020 > linux-idy0.48538: P 247472:255664(8192) ack 5643 win 6149 <nop,nop,timestamp 567208394 567208394>
16:17:00.616084 IP linux-idy0.48538 > linux-idy0.60020: . ack 255664 win 387 <nop,nop,timestamp 567208404 567208394>
16:17:00.616105 IP linux-idy0.60020 > linux-idy0.48538: P 255664:256941(1277) ack 5643 win 6149 <nop,nop,timestamp 567208404 567208404>
16:17:00.616199 IP linux-idy0.48538 > linux-idy0.60020: . ack 256941 win 387 <nop,nop,timestamp 567208404 567208404>
tcpdump with tcpnodelay = true:
11:18:05.513811 IP linux-idy0.12741 > linux-idy0.60020: P 31350:31559(209) ack 1420351 win 2314 <nop,nop,timestamp 562724628 562724627>
11:18:05.514037 IP linux-idy0.60020 > linux-idy0.12741: P 1420351:1428543(8192) ack 31559 win 6145 <nop,nop,timestamp 562724628 562724628>
11:18:05.514067 IP linux-idy0.60020 > linux-idy0.12741: P 1428543:1429820(1277) ack 31559 win 6145 <nop,nop,timestamp 562724628 562724628>
11:18:05.514137 IP linux-idy0.12741 > linux-idy0.60020: . ack 1429820 win 2303 <nop,nop,timestamp 562724628 562724628>
tcpdump when connecting to an HBase server on another machine (tcpnodelay = false):
11:28:46.280278 IP linux-idy0.60201 > linux-kl9e.60020: P 28006:28215(209) ack 1268847 win 1437 <nop,nop,timestamp 562884820 562802550>
11:28:46.280634 IP linux-kl9e.60020 > linux-idy0.60201: P 1268847:1277039(8192) ack 28215 win 3077 <nop,nop,timestamp 562802551 562884820>
11:28:46.280647 IP linux-idy0.60201 > linux-kl9e.60020: . ack 1277039 win 1437 <nop,nop,timestamp 562884820 562802551>
11:28:46.280740 IP linux-kl9e.60020 > linux-idy0.60201: P 1277039:1278316(1277) ack 28215 win 3077 <nop,nop,timestamp 562802551 562884820>
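To put a number on the delay, the largest gap between consecutive packets in each capture can be computed from the tcpdump timestamps. This small Python sketch assumes tcpdump's default HH:MM:SS.ffffff timestamp format and that every packet in a capture falls on the same day; the packet lines are copied from the captures above with trailing fields trimmed.

```python
from datetime import datetime

# Packet lines from the three captures above (trailing fields trimmed).
NODELAY_FALSE_LO = [
    "16:17:00.576582 IP linux-idy0.48538 > linux-idy0.60020: P 5434:5643(209)",
    "16:17:00.576869 IP linux-idy0.60020 > linux-idy0.48538: P 247472:255664(8192)",
    "16:17:00.616084 IP linux-idy0.48538 > linux-idy0.60020: . ack 255664",
    "16:17:00.616105 IP linux-idy0.60020 > linux-idy0.48538: P 255664:256941(1277)",
    "16:17:00.616199 IP linux-idy0.48538 > linux-idy0.60020: . ack 256941",
]
NODELAY_TRUE_LO = [
    "11:18:05.513811 IP linux-idy0.12741 > linux-idy0.60020: P 31350:31559(209)",
    "11:18:05.514037 IP linux-idy0.60020 > linux-idy0.12741: P 1420351:1428543(8192)",
    "11:18:05.514067 IP linux-idy0.60020 > linux-idy0.12741: P 1428543:1429820(1277)",
    "11:18:05.514137 IP linux-idy0.12741 > linux-idy0.60020: . ack 1429820",
]
NODELAY_FALSE_ETH0 = [
    "11:28:46.280278 IP linux-idy0.60201 > linux-kl9e.60020: P 28006:28215(209)",
    "11:28:46.280634 IP linux-kl9e.60020 > linux-idy0.60201: P 1268847:1277039(8192)",
    "11:28:46.280647 IP linux-idy0.60201 > linux-kl9e.60020: . ack 1277039",
    "11:28:46.280740 IP linux-kl9e.60020 > linux-idy0.60201: P 1277039:1278316(1277)",
]

def max_gap_ms(lines):
    """Return (gap_ms, index) of the largest gap between consecutive packets.

    `index` is the position of the packet that arrived after the gap.
    """
    times = [datetime.strptime(line.split()[0], "%H:%M:%S.%f") for line in lines]
    gaps = [(b - a).total_seconds() * 1000 for a, b in zip(times, times[1:])]
    worst = max(range(len(gaps)), key=gaps.__getitem__)
    return round(gaps[worst], 3), worst + 1

for name, capture in [
    ("lo, tcpnodelay=false", NODELAY_FALSE_LO),
    ("lo, tcpnodelay=true", NODELAY_TRUE_LO),
    ("eth0, tcpnodelay=false", NODELAY_FALSE_ETH0),
]:
    gap, idx = max_gap_ms(capture)
    # capture[idx][16:] drops the 15-character timestamp and the following space
    print(f"{name}: largest gap {gap:.3f} ms, before: {capture[idx][16:]}")
```

On these captures the largest gap is about 39.2 ms in the loopback/tcpnodelay=false case, and it falls right before the client's ACK of 255664. The other two captures stay under 0.4 ms.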
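So over the local loopback interface the ACK is delayed, but connecting to another machine through eth0 does not show the problem, which is rather puzzling.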
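One plausible explanation, which is an assumption and has not been verified here, is the classic interaction between Nagle's algorithm and delayed ACKs. With tcpnodelay = false, the server sends 8192 bytes, and Nagle then holds the following 1277-byte write until that data is acknowledged. Over eth0 (MTU 1500) the 8192 bytes arrive as several full-sized segments. RFC 1122 asks the receiver to ACK at least every second full-sized segment, so the ACK goes out at once. Over lo, whose MTU was 16436 on older Linux kernels, 8192 bytes fit in a single segment, so the receiver may hold its ACK until the delayed-ACK timer fires (40 ms minimum on Linux). The Python sketch below encodes this simplified rule. The 52-byte header overhead (IPv4 + TCP + timestamp option) and the MTU values are assumptions, and real Linux behaviour also includes a quick-ACK mode and other heuristics.

```python
# Simplified model, not the exact Linux algorithm.
HEADER_OVERHEAD = 20 + 20 + 12  # IPv4 header + TCP header + timestamp option (bytes)

def full_segments(payload: int, mtu: int) -> int:
    """Number of full-sized segments a write of `payload` bytes fills at this MTU."""
    mss = mtu - HEADER_OVERHEAD
    return payload // mss

def receiver_acks_at_once(payload: int, mtu: int) -> bool:
    """RFC 1122 rule: ACK at least every second full-sized segment; otherwise
    the receiver may wait for its delayed-ACK timer."""
    return full_segments(payload, mtu) >= 2

def stalls_on_delayed_ack(first_write: int, mtu: int, nodelay: bool) -> bool:
    """With Nagle on (nodelay=False), a follow-up write smaller than the MSS is
    held until the earlier data is ACKed, so it stalls if that ACK is delayed."""
    return (not nodelay) and not receiver_acks_at_once(first_write, mtu)

# An 8192-byte write followed by a small (1277-byte) write, as in the captures.
for label, mtu, nodelay in [
    ("lo (MTU 16436), tcpnodelay=false", 16436, False),
    ("lo (MTU 16436), tcpnodelay=true", 16436, True),
    ("eth0 (MTU 1500), tcpnodelay=false", 1500, False),
]:
    print(label, "->", "stalls" if stalls_on_delayed_ack(8192, mtu, nodelay) else "no stall")
```

The model predicts a stall only for the loopback case with tcpnodelay = false, which matches the three captures. It is a plausibility check, not proof.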
Switching receive coalescing (GRO) on lo on and off with ethtool -K lo gro on/off did not help either.
References:
http://www.iteye.com/topic/1110883
http://blog.csdn.net/historyasamirror/article/details/6423235
http://blog.csdn.net/wjtxt/article/details/6606022
http://www.cnblogs.com/yxfqust/archive/2006/07/28/461836.html
http://blog.163.com/xychenbaihu@yeah/blog/static/132229655201231214038740/
This may be caused by lo and eth0 having different MTU values; see