2015-02-13 05:40:04,325 WARN [regionserver60020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/slave2,60020,1423777199540 already deleted, retry=false
2015-02-13 05:40:04,325 WARN [regionserver60020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/slave2,60020,1423777199540
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:179)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1273)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1262)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1342)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1054)
	at java.lang.Thread.run(Thread.java:745)
2015-02-13 05:40:04,329 INFO [regionserver60020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-02-13 05:40:04,329 INFO [regionserver60020] zookeeper.ZooKeeper: Session: 0x14b7113ebc50012 closed
2015-02-13 05:40:04,329 INFO [regionserver60020] regionserver.HRegionServer: stopping server null; zookeeper connection closed.
2015-02-13 05:40:04,330 INFO [regionserver60020] regionserver.HRegionServer: regionserver60020 exiting
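I spent a long time digging into this and still could not solve it; I had no leads at all.
After a cup of tea I kept scrolling further up the log and suddenly found a lifeline: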
2015-02-13 05:40:04,294 FATAL [regionserver60020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server slave2,60020,1423777199540 has been rejected; Reported time is too far out of sync with master. Time difference of 71419ms > max allowed of 30000ms
	at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:345)
	at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:238)
	at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1294)
	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:7910)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
	at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
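Since this FATAL entry is easy to miss among the shutdown noise that follows it, a small script can pull the skew figures out of a RegionServer log directly. This is only a rough sketch: the function name parse_clock_skew is my own, and the regular expression assumes the message wording is exactly what appears in the log above.

```python
import re

# Matches the skew report inside the ClockOutOfSyncException message.
# Assumes the wording seen in the log above; other HBase versions may differ.
SKEW_RE = re.compile(r"Time difference of (\d+)ms > max allowed of (\d+)ms")


def parse_clock_skew(log_text):
    """Return (skew_ms, max_allowed_ms) for the first match, or None."""
    match = SKEW_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "Reported time is too far out of sync with master. "
    "Time difference of 71419ms > max allowed of 30000ms"
)
print(parse_clock_skew(sample))  # (71419, 30000)
```

Running it over the whole log file (for example with open(path).read()) would surface the rejection even when it is buried above the ZooKeeper warnings.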
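Problem found: the Master's clock and the RegionServer's clock disagreed (a difference of 71419 ms against a maximum allowed of 30000 ms), and because no time synchronization service was installed, the Master rejected the RegionServer at startup.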
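To make the rejection rule concrete, here is a minimal sketch of the kind of comparison the Master appears to perform, judging only from the log message. It is not HBase's actual code: the function name and the sample timestamps are made up for illustration. As far as I know the 30000 ms limit corresponds to the hbase.master.maxclockskew setting, but treat that as something to verify against your HBase version.

```python
MAX_CLOCK_SKEW_MS = 30000  # limit reported in the log message


def check_clock_skew(master_time_ms, server_time_ms,
                     max_skew_ms=MAX_CLOCK_SKEW_MS):
    """Raise if the two clocks differ by more than max_skew_ms.

    Hypothetical helper that mirrors the log message, not HBase internals.
    """
    skew = abs(master_time_ms - server_time_ms)
    if skew > max_skew_ms:
        raise RuntimeError(
            f"Time difference of {skew}ms > max allowed of {max_skew_ms}ms"
        )
    return skew


# Illustrative timestamps only: the server clock is 71419 ms behind.
master_now = 1_000_000_000
try:
    check_clock_skew(master_now, master_now - 71419)
except RuntimeError as err:
    print("rejected:", err)
```

The comparison uses an absolute value, so a RegionServer whose clock runs ahead of the Master is rejected in this sketch just like one that runs behind.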
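I set the RegionServer's time manually with date -s <time> and restarted the RegionServer, which solved the problem.
The next step is to install the NTP service in the test environment as well.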