Reposted (in part) from: http://www.mamicode.com/info-detail-1243044.html
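1. Problem
On the web UI, every NameNode showed as standby. After going through a lot of material online, I finally determined that the cause was that the NameNode retried its connection to the JournalNodes too few times. The error log looked similar to this: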
2018-01-05 18:50:31,096 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.211.55.5:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2. Solution
Increase the IPC client retry parameters in core-site.xml:
<property>
<name>ipc.client.connect.max.retries</name>
<value>100</value>
<description>Indicates the number of retries a client will make to establish
a server connection.
</description>
</property>
<property>
<name>ipc.client.connect.retry.interval</name>
<value>10000</value>
<description>Indicates the number of milliseconds a client will wait for
before retrying to establish a server connection.
</description>
</property>
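As a rough estimate, the longest a client keeps retrying is the number of retries multiplied by the retry interval. This ignores the time each connection attempt itself takes, which is bounded separately by `ipc.client.connect.timeout`. The defaults visible in the log above (maxRetries=10, sleepTime=1000 ms) give about 10 seconds, while the values configured above stretch the window to roughly 1000 seconds:

```latex
\begin{aligned}
T_{\text{default}} &\approx 10 \times 1000\,\text{ms} = 10\,\text{s} \\
T_{\text{new}}     &\approx 100 \times 10000\,\text{ms} = 1000\,\text{s} \approx 16.7\,\text{min}
\end{aligned}
```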
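3. Manually switching the HA state
From the Hadoop bin directory, run: hdfs haadmin -transitionToActive --forcemanual nn1
This made node1 active briefly, but after a while it fell back to standby. After more research,
I ran hdfs zkfc -formatZK, which (re)initializes the HA znode in ZooKeeper that the ZKFC failover controllers use to elect the active NameNode. That finally fixed the problem of both NameNodes being in standby.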
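To check whether the cluster is in this "no active NameNode" state before and after the fix, each NameNode's HA state can be queried with `hdfs haadmin -getServiceState`, which prints `active` or `standby`. The sketch below assumes the NameNode IDs are nn1 and nn2 (the IDs listed in `dfs.ha.namenodes.<nameservice>`). `NN_IDS`, `print_ha_states`, and `no_active_namenode` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: detect the "every NameNode is standby" condition.
# Assumption: NameNode IDs are nn1 and nn2; override NN_IDS for your cluster.
NN_IDS=${NN_IDS:-"nn1 nn2"}

# Print "<id>: <state>" for each NameNode.
# A NameNode that cannot be queried is reported as "unreachable".
print_ha_states() {
  local nn
  for nn in $NN_IDS; do
    echo "$nn: $(hdfs haadmin -getServiceState "$nn" 2>/dev/null || echo unreachable)"
  done
}

# Return 0 (true) when no NameNode reports "active", 1 otherwise.
no_active_namenode() {
  local nn state
  for nn in $NN_IDS; do
    state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)
    if [ "$state" = "active" ]; then
      return 1
    fi
  done
  return 0
}
```

For example, running `print_ha_states` after `hdfs zkfc -formatZK` and restarting the ZKFCs should show exactly one NameNode as `active`.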