1: Running a ceph command reports the error .handle_connect_reply connect got BADAUTHORIZER
1.1: Error details
- When I checked the OSD status (ceph osd status), the following error appeared:
[root@ct ~(keystone_admin)]# ceph osd status
2020-03-12 18:09:43.363 7f2e96572700 0 -- 192.168.11.100:0/3068442569 >> 192.168.11.100:6804/1625 conn(0x7f2e80005580 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
2020-03-12 18:09:43.564 7f2e96572700 0 -- 192.168.11.100:0/3068442569 >> 192.168.11.100:6804/1625 conn(0x7f2e80005580 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
2020-03-12 18:09:43.965 7f2e96572700 0 -- 192.168.11.100:0/3068442569 >> 192.168.11.100:6804/1625 conn(0x7f2e80005580 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
2020-03-12 18:09:44.767 7f2e96572700 0 -- 192.168.11.100:0/3068442569 >> 192.168.11.100:6804/1625 conn(0x7f2e80005580 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
2020-03-12 18:09:46.370 7f2e96572700 0 -- 192.168.11.100:0/3068442569 >> 192.168.11.100:6804/1625 conn(0x7f2e80005580 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
2020-03-12 18:09:49.574 7f2e96572700 0 -- 192.168.11.100:0/3068442569 >> 192.168.11.100:6804/1625 conn(0x7f2e80005580 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
...(this error keeps repeating indefinitely)
1.2: Solution
- Restarting the ceph-osd service alone did not help; the whole ceph service has to be restarted. (BADAUTHORIZER usually means cephx authentication was rejected, for example because of stale rotating keys or clock skew, so restarting every daemon forces fresh authorizers.)
systemctl restart ceph.target
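For context, ceph.target is the umbrella systemd target that pulls in the per-daemon targets, which is why restarting it works when ceph-osd.target alone does not. A minimal sketch using the standard Ceph unit names:

systemctl restart ceph-osd.target   # restarts only the OSD daemons on this node (did not help here)
systemctl restart ceph.target       # umbrella target: also restarts the mon and mgr daemons that issue cephx tickets
systemctl status ceph.target        # verify the daemons came back up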
1.3: Problem solved!
2: The OSD on one Ceph node keeps failing to come up
2.1: Error details
- While checking the Ceph cluster's health status, I found that the OSD service on one node was down; the ceph osd status command showed it was node c1's service that had not come up:
[root@ct ~(keystone_admin)]# ceph osd status
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |     state      |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| 0  |  ct  | 14.4G | 1009G |    0   |    0    |    0   |    6    |   exists,up    |
| 1  |      |    0  |    0  |    0   |    0    |    0   |    0    | autoout,exists |
| 2  |  c2  | 14.4G | 1009G |    0   |    0    |    1   |    48   |   exists,up    |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
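Before the root cause turned up, the usual next step is to look at the failed daemon on the affected node itself. A hedged sketch, assuming OSD id 1 is the one hosted on c1 as the table above suggests:

[root@c1 ~]# systemctl status ceph-osd@1.service       # is the OSD daemon actually running?
[root@c1 ~]# journalctl -u ceph-osd@1.service -n 50    # recent daemon logs for clues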
2.2: Solution
- Checking the health status again finally revealed the problem: time synchronization on node c1 was broken:
[root@ct ~(keystone_admin)]# ceph -s
  cluster:
    id:     8c9d2d27-492b-48a4-beb6-7de453cf45d6
    health: HEALTH_WARN
            Degraded data redundancy: 2127/6381 objects degraded (33.333%), 133 pgs degraded, 192 pgs undersized
            clock skew detected on mon.c1        '//shows that node c1's clock is off'
  services:
    mon: 3 daemons, quorum ct,c1,c2
    mgr: ct(active), standbys: c2, c1
    osd: 3 osds: 2 up, 2 in
  data:
    pools:   3 pools, 192 pgs
    objects: 2.13 k objects, 13 GiB
    usage:   29 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     2127/6381 objects degraded (33.333%)
             133 active+undersized+degraded
             59  active+undersized
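The monitors can also report the skew directly, which is a quick way to confirm which node drifted (a standard ceph CLI subcommand):

[root@ct ~(keystone_admin)]# ceph time-sync-status    # per-mon clock skew as seen by the quorum leader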
- Re-sync the time on node c1 and restart the related services:
[root@c1 ~]# ntpdate ct        '//sync time from the ct node'
12 Mar 18:23:27 ntpdate[37287]: step time server 192.168.11.100 offset -28799.645303 sec
[root@c1 ~]# date              '//check that the time now matches'
Thu Mar 12 18:23:33 CST 2020
[root@c1 ~]# systemctl restart ceph-osd.target        '//restart the osd service'
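Note that ntpdate is a one-shot correction, so the skew will return unless something keeps c1 synced. A minimal sketch of making it persistent, assuming ntpd is installed and ct remains the time source as above (chronyd works similarly):

[root@c1 ~]# echo "server ct iburst" >> /etc/ntp.conf    # point ntpd at the ct node
[root@c1 ~]# systemctl enable --now ntpd                 # keep time synced across reboots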
- Checking the health status again shows the problem has been resolved:
[root@ct ~(keystone_admin)]# ceph -s
  cluster:
    id:     8c9d2d27-492b-48a4-beb6-7de453cf45d6
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ct,c1,c2
    mgr: ct(active), standbys: c2
    osd: 3 osds: 3 up, 3 in
  data:
    pools:   3 pools, 192 pgs
    objects: 2.13 k objects, 13 GiB
    usage:   43 GiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     192 active+clean
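As a final double-check, these standard subcommands confirm that every OSD is back up and the cluster is clean:

[root@ct ~(keystone_admin)]# ceph health detail    # should report HEALTH_OK with no warnings
[root@ct ~(keystone_admin)]# ceph osd tree         # all three OSDs should show status "up"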