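After setting up a CDH cluster, I accidentally deleted some files (I am not sure which), and from then on adding the Solr service failed during the initialization stage with "Solr initialize failed". Reinstalling the service, and even reinstalling the whole CDH platform several times, did not change anything. The log showed the following: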
15/Sep/2018 18:52:53 +0000 org.apache.solr.common.cloud.ZkStateReader$3 process
WARNING: ZooKeeper watch triggered, but Solr cannot talk to ZK
15/Sep/2018 18:52:53 +0000 org.apache.solr.cloud.LeaderElector$1 process
WARNING:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/test/leader_elect/slice3/election
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:266)
at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:263)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:263)
at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:92)
at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:57)
at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:121)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
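The log shows that a ZooKeeper watch was triggered but Solr could not talk to ZooKeeper. The Solr source below shows that Solr creates a ZkStateReader instance, which is mainly responsible for holding the state stored in ZooKeeper and for registering watchers. In that code, the warning "ZooKeeper watch triggered, but Solr cannot talk to ZK" is logged when a KeeperException with code SESSIONEXPIRED or CONNECTIONLOSS is caught, so the likely cause is an expired session or a lost connection. One thing to try is increasing the ZooKeeper session timeout.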
synchronized (getUpdateLock()) {
  cmdExecutor.ensureExists(CLUSTER_STATE, zkClient);
  cmdExecutor.ensureExists(ALIASES, zkClient);
  log.info("Updating cluster state from ZooKeeper... ");
  zkClient.exists(CLUSTER_STATE, new Watcher() {
    @Override
    public void process(WatchedEvent event) {
      // session events are not change events,
      // and do not remove the watcher
      if (EventType.None.equals(event.getType())) {
        return;
      }
      log.info("A cluster state change: {}, has occurred - updating... (live nodes size: {})",
          (event),
          ZkStateReader.this.clusterState == null ? 0 : ZkStateReader.this.clusterState.getLiveNodes().size());
      try {
        // delayed approach
        // ZkStateReader.this.updateClusterState(false, false);
        synchronized (ZkStateReader.this.getUpdateLock()) {
          // remake watch
          final Watcher thisWatch = this;
          Stat stat = new Stat();
          byte[] data = zkClient.getData(CLUSTER_STATE, thisWatch, stat, true);
          Set<String> ln = ZkStateReader.this.clusterState.getLiveNodes();
          ClusterState clusterState = ClusterState.load(stat.getVersion(), data, ln, ZkStateReader.this);
          // update volatile
          ZkStateReader.this.clusterState = clusterState;
        }
      } catch (KeeperException e) {
        if (e.code() == KeeperException.Code.SESSIONEXPIRED
            || e.code() == KeeperException.Code.CONNECTIONLOSS) {
          log.warn("ZooKeeper watch triggered, but Solr cannot talk to ZK");
          return;
        }
        log.error("", e);
        throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR,
            "", e);
      } catch (InterruptedException e) {
        // Restore the interrupted status
        Thread.currentThread().interrupt();
        log.warn("", e);
        return;
      }
    }
  }, true);
}
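To judge whether the warning is an occasional blip or a persistent condition, it can help to count how often the two exception types appear in the Solr log. The following is a minimal shell sketch and was not part of the original troubleshooting; the function name `count_zk_session_errors` and the log file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a Solr log on stdin and print the number of lines
# that mention a ZooKeeper session expiry or connection loss exception.
count_zk_session_errors() {
    # grep -c prints the count of matching lines; "|| true" keeps the exit
    # status at 0 even when the count is 0.
    grep -c -E 'SessionExpiredException|ConnectionLossException' || true
}
```

For example, `count_zk_session_errors < solr.log` (with `solr.log` standing in for wherever your Solr role log lives) prints the number of matching lines.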
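The source above is part of how ZkStateReader is initialized from state that already exists in ZooKeeper. So I opened the ZooKeeper shell with zkCli.sh and looked for the Solr-related znodes, and found that there was no /Solr node under the root. My guess is that I had accidentally deleted some directories in ZooKeeper, so that during initialization Solr could not create the /Solr node in ZooKeeper and store its state there. Solr then could not read that state back from ZooKeeper, kept polling, and eventually the session timed out. In this situation, increasing the session timeout does not help. In the end I wiped CDH completely and reinstalled it. When I added the Solr service again, ZooKeeper automatically created the /Solr node and generated some property values, and Solr initialized successfully.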
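The znode check described above can also be scripted. This is a rough sketch, assuming that `zkCli.sh ... ls /` prints the root's children as a bracketed, comma-separated list such as `[zookeeper, solr]`; the function name `znode_listed` and the host name in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the output of "ls /" from zkCli.sh on stdin and
# succeed (exit 0) only if the child named $1 appears in the bracketed list.
znode_listed() {
    name=$1
    # Keep bracketed list lines, strip brackets and spaces, put one child per
    # line, then look for an exact, literal, whole-line match.
    grep '^\[.*\]$' | sed 's/[][ ]//g' | tr ',' '\n' | grep -qxF -- "$name"
}
```

A possible invocation is `zkCli.sh -server zk-host:2181 ls / 2>/dev/null | znode_listed solr && echo present || echo missing`, where `zk-host:2181` is a placeholder. ZooKeeper paths are case-sensitive, so pass the name exactly as your cluster spells it.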
The steps to completely uninstall CDH are as follows:
- Stop the services in the cluster from the CDH web management console.
- Stop the Cloudera services.
    On the server node:
    service cloudera-scm-server stop
    On the agent nodes:
    service cloudera-scm-agent stop
- Uninstall the packages.
    rpm -qa | grep cloudera
    for f in `rpm -qa | grep cloudera`; do rpm -e ${f}; done
    (if any errors are reported, run it again)
- Remove the directories of the installed services.
    umount /var/run/cloudera-scm-agent/process
    rm -rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/x86_64/6/cloudera* /var/log/cloudera* /var/run/cloudera* /etc/cloudera*
- Remove the installation files.
    rm -rf /var/lib/hadoop-* /var/lib/impala /var/lib/solr /var/lib/zookeeper /var/lib/hue /var/lib/oozie /var/lib/pgsql /var/lib/sqoop2 /data/dfs/ /data/impala/ /data/yarn/ /dfs/ /impala/ /yarn/ /var/run/hadoop-*/ /var/run/hdfs-*/ /usr/bin/hadoop* /usr/bin/zookeeper* /usr/bin/hbase* /usr/bin/hive* /usr/bin/hdfs /usr/bin/mapred /usr/bin/yarn /usr/bin/sqoop* /usr/bin/oozie /etc/hadoop* /etc/zookeeper* /etc/hive* /etc/hue /etc/impala /etc/sqoop* /etc/oozie /etc/hbase* /etc/hcatalog
    rm -rf `find /var/lib/alternatives/* ! -name "mta" ! -name "print" ! -name "zlibrary-ui" -mtime -3`
    rm -rf /etc/alternatives/*
- Kill the supervisor process.
    ps aux|grep super
    kill -9 pid (where pid is the supervisord process ID shown by the previous command)
- Delete the parcel distribution files and the unpacked parcels.
    rm -rf /opt/cloudera/parcel-cache /opt/cloudera/parcels
After completing these steps, CDH can be reinstalled.
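Before reinstalling, it may be worth confirming that nothing from the previous installation is left behind. The sketch below is an optional addition, not a step from the original procedure; the function name `list_leftovers` is hypothetical, and the paths you pass to it should be the ones you removed above.

```shell
#!/bin/sh
# Hypothetical helper: print every given path that still exists.
# Exit status is 0 only when none of the paths exist.
list_leftovers() {
    found=0
    for p in "$@"; do
        # -e follows symlinks, so also test -L to catch dangling links.
        if [ -e "$p" ] || [ -L "$p" ]; then
            printf '%s\n' "$p"
            found=1
        fi
    done
    [ "$found" -eq 0 ]
}
```

For instance, `list_leftovers /opt/cloudera/parcels /opt/cloudera/parcel-cache /usr/share/cmf /var/lib/zookeeper /var/lib/solr` prints whichever of those paths survived the cleanup and returns a non-zero status if any did.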