ceph-disk & ceph-osd startup flow (by quqi99)


Copyright: This article may be freely reproduced. When reproducing it, please clearly indicate the original source and author information via hyperlink, together with this copyright notice. (Author: Zhang Hua, published: 2018-07-25)

Problem

After rebooting a physical machine, a customer found that some OSDs had failed to start, even though running the ceph-disk command manually (ceph-disk -v activate --mark-init systemd --mount /var/lib/ceph/osd/ceph-1) succeeded. The useful lines below were extracted from a mass of unrelated error logs; evidently activation did not finish within the 120-second timeout:

May 22 06:05:19 cephosd06 systemd[1]: Starting Ceph disk activation: /dev/sdh1...
May 22 06:05:30 cephosd06 sh[3926]: main_trigger: main_activate: path = /dev/sdh1
May 22 06:05:30 cephosd06 sh[3926]: get_dm_uuid: get_dm_uuid /dev/sdh1 uuid path is /sys/dev/block/8:113/dm/uuid
May 22 06:05:30 cephosd06 sh[3926]: command: Running command: /sbin/blkid -o udev -p /dev/sdh1
May 22 06:05:30 cephosd06 sh[3926]: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/sdh1
May 22 06:05:30 cephosd06 sh[3926]: mount: Mounting /dev/sdh1 on /var/lib/ceph/tmp/mnt.0xG2_W with options noatime,inode64
May 22 06:05:30 cephosd06 sh[3926]: command_check_call: Running command: /bin/mount -t xfs -o noatime,inode64 -- /dev/sdh1 /var/lib/ceph/tmp/mnt.0xG2_W
May 22 06:05:30 cephosd06 sh[3926]: command_check_call: Running command: /bin/mount -o noatime,inode64 -- /dev/sdh1 /var/lib/ceph/osd/ceph-45
May 22 06:05:30 cephosd06 ceph-osd[8052]: starting osd.45 at :/0 osd_data /var/lib/ceph/osd/ceph-45 /var/lib/ceph/osd/ceph-45/journal
May 22 06:05:31 cephosd06 sh[6944]: main_trigger: main_trigger: Namespace(cluster='ceph', dev='/dev/sdh1', dmcrypt=None, dmcrypt_key_dir='/etc/ceph/dmcrypt-keys', func=<function main_trigger at 0x7f574b3fc7d0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', setgroup=None, setuser=None, statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
May 22 06:05:31 cephosd06 sh[6944]: command_check_call: Running command: /bin/chown ceph:ceph /dev/sdh1
May 22 06:05:31 cephosd06 sh[6944]: command: Running command: /sbin/blkid -o udev -p /dev/sdh1
May 22 06:05:31 cephosd06 sh[6944]: command: Running command: /sbin/blkid -o udev -p /dev/sdh1
May 22 06:05:31 cephosd06 sh[6944]: main_trigger: trigger /dev/sdh1 parttype 4fbd7e29-9d25-41b8-afd0-062c0ceff05d uuid ff8f7341-1c1e-4912-b680-41fd6999fcc8
May 22 06:05:31 cephosd06 sh[6944]: command: Running command: /usr/sbin/ceph-disk --verbose activate /dev/sdh1
May 22 06:07:20 cephosd06 systemd[1]: ceph-disk@dev-sdh1.service: Main process exited, code=exited, status=124/n/a
May 22 06:07:20 cephosd06 systemd[1]: Failed to start Ceph disk activation: /dev/sdh1.
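
Logs like these can be isolated from the boot-time noise by querying the journal per unit; a quick aid (assuming the failing boot is still retained by systemd-journald):

# show only the ceph-disk/ceph-osd unit logs from the current boot
journalctl -b -u 'ceph-disk@*' -u 'ceph-osd@*' --no-pager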

Cause

Two factors caused or aggravated this problem (see the upstream bug report [1]). First, ceph-disk@.service wraps each activation in 'timeout 120'; although the flock in the unit file is per device, ceph-disk itself serializes concurrent activations on a single internal lock (/var/lib/ceph/tmp/ceph-disk.activate.lock), so on a host with many OSD disks the later activations can exceed 120 seconds and are killed by systemd (the status=124 above is the exit code timeout(1) returns when it kills its command). Second, the restart limits in ceph-osd@.service mean a repeatedly failing ceph-osd unit is eventually left in the failed state:

vi systemd/ceph-disk@.service
ExecStart=/bin/sh -c 'timeout 120 flock /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'

vi systemd/ceph-osd@.service
ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
Restart=on-failure
StartLimitInterval=30min
StartLimitBurst=30
RestartSec=20s
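
Since the 120-second limit is hard-coded in ExecStart, one mitigation is to override it with a systemd drop-in. The override below is only a sketch; upstream later made the limit configurable via an Environment=CEPH_DISK_TIMEOUT variable (defaulting to a much larger value) rather than the hard-coded 120:

# /etc/systemd/system/ceph-disk@.service.d/override.conf (hypothetical drop-in)
[Service]
# clear the original ExecStart first; otherwise the drop-in would add a
# second command to this oneshot unit instead of replacing the first one
ExecStart=
ExecStart=/bin/sh -c 'timeout 10000 flock /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'

systemctl daemon-reload   # make systemd pick up the drop-in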

How ceph-osd triggers ceph-disk

1, First, ceph creates the journal partition with the typecode 45b0969e-9b03-4f30-b4c6-b4b80ceff106:

ceph-osd --cluster=ceph --show-config-value=osd_journal_size
uuid=$(uuidgen)
num=2
sgdisk --new=${num}:0:+128M --change-name=${num}:"ceph journal" --partition-guid=${num}:${uuid} --typecode=${num}:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/loop0
udevadm settle --timeout=600
flock -s /dev/loop0 partprobe /dev/loop0
udevadm settle --timeout=600

2, After sgdisk creates the partition, partx/partprobe is run to make the kernel re-read the partition table, and partprobe thereby causes a udev event to be sent to the udev daemon, as can be observed below.
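
A quick way to watch these events on a test host (using the loop device from step 1):

# print block-layer udev events, with their properties, while re-reading the table
udevadm monitor --subsystem-match=block --property &
flock -s /dev/loop0 partprobe /dev/loop0
# the event for the journal partition carries ID_PART_ENTRY_TYPE=45b0969e-...
kill %1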

3, After receiving the udev event generated by partprobe, the udev daemon runs '/usr/sbin/ceph-disk --log-stdout -v trigger /dev/$name' according to the following udev rules:

./udev/95-ceph-osd.rules
# JOURNAL_UUID
ACTION=="add", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="partition", \
  ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
  OWNER:="ceph", GROUP:="ceph", MODE:="660", \
  RUN+="/usr/sbin/ceph-disk --log-stdout -v trigger /dev/$name"
ACTION=="change", SUBSYSTEM=="block", \
  ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
  OWNER="ceph", GROUP="ceph", MODE="660"

./src/ceph-disk-udev
45b0969e-9b03-4f30-b4c6-b4b80ceff106)
    # JOURNAL_UUID
    # activate ceph-tagged journal partitions.
    /usr/sbin/ceph-disk -v activate-journal /dev/${NAME}
    ;;
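
Whether these rules would actually fire for a given partition can be dry-run with udevadm test (a sketch; the sysfs path below assumes the loop-device journal partition from step 1):

# simulate an 'add' event and list the matching rules, including the ceph-disk RUN line
udevadm test --action=add /sys/block/loop0/loop0p2 2>&1 | grep -i ceph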

4, As a result, the device node /dev/disk/by-partuuid/9195fa44-68ba-49f3-99f7-80d9bcb50430 is created.

5, The uuid of the journal partition is then written into the file /var/lib/ceph/osd/ceph-1/journal_uuid, and /var/lib/ceph/osd/ceph-1/journal becomes a symlink pointing at the journal partition:

root@juju-332891-mitaka-ceph-0:~# ll /var/lib/ceph/osd/ceph-1/journal
lrwxrwxrwx 1 ceph ceph 58 Jun 1 02:46 /var/lib/ceph/osd/ceph-1/journal -> /dev/disk/by-partuuid/9195fa44-68ba-49f3-99f7-80d9bcb50430
root@juju-332891-mitaka-ceph-0:~# cat /var/lib/ceph/osd/ceph-1/journal_uuid
9195fa44-68ba-49f3-99f7-80d9bcb50430

The ceph-disk execution flow, described in pseudo shell code

1, Prepare test disk
dd if=/dev/zero of=test.img bs=1M count=8096 oflag=direct
#sudo losetup -d /dev/loop0
sudo losetup --show -f test.img
sudo ceph-disk -v prepare --zap-disk --cluster ceph --fs-type xfs -- /dev/loop0

2, Clear the partition
parted --machine -- /dev/loop0 print
sgdisk --zap-all -- /dev/loop0
sgdisk --clear --mbrtogpt -- /dev/loop0
udevadm settle --timeout=600
flock -s /dev/loop0 partprobe /dev/loop0
udevadm settle --timeout=600

3, Create journal partition
ceph-osd --cluster=ceph --show-config-value=osd_journal_size
uuid=$(uuidgen)
num=2
sgdisk --new=${num}:0:+128M --change-name=${num}:"ceph journal" --partition-guid=${num}:${uuid} --typecode=${num}:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/loop0
udevadm settle --timeout=600
flock -s /dev/loop0 partprobe /dev/loop0
udevadm settle --timeout=600

4, Create data partition
uuid=$(uuidgen)
sgdisk --largest-new=1 --change-name=1:"ceph data" --partition-guid=1:${uuid} --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/loop0
udevadm settle --timeout=600
flock -s /dev/loop0 partprobe /dev/loop0
udevadm settle --timeout=600

5, Format data partition
parted --machine -- /dev/loop0 print
mkfs -t xfs -f -i size=2048 -- /dev/loop0p1

6, All default mkfs and mount options should be empty
ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs

7, Mount tmp directory - https://github.com/ceph/ceph/blob/jewel/src/ceph-disk/ceph_disk/main.py#L3169
mkdir /var/lib/ceph/tmp/mnt.uCrLyH
mount -t xfs -o noatime,inode64 -- /dev/loop0p1 /var/lib/ceph/tmp/mnt.uCrLyH
restorecon /var/lib/ceph/tmp/mnt.uCrLyH
cat /proc/mounts

8, Activate - https://github.com/ceph/ceph/blob/jewel/src/ceph-disk/ceph_disk/main.py#L3192
#Get fsid and write it to the tmp file ceph_fsid (done by the activate function)
fsid=$(ceph-osd --cluster=ceph --show-config-value=fsid)
cat << EOF > /var/lib/ceph/tmp/mnt.uCrLyH/ceph_fsid
$fsid
EOF
restorecon -R /var/lib/ceph/tmp/mnt.uCrLyH/ceph_fsid
chown -R ceph:ceph /var/lib/ceph/tmp/mnt.uCrLyH/ceph_fsid

#Get osd_uuid and write it to the tmp file
osd_uuid=$(uuidgen)
cat << EOF > /var/lib/ceph/tmp/mnt.uCrLyH/fsid
$osd_uuid
EOF
restorecon -R /var/lib/ceph/tmp/mnt.uCrLyH/fsid
chown -R ceph:ceph /var/lib/ceph/tmp/mnt.uCrLyH/fsid

#Write magic to the tmp file
cat << EOF > /var/lib/ceph/tmp/mnt.uCrLyH/magic
ceph osd volume v026
EOF
restorecon -R /var/lib/ceph/tmp/mnt.uCrLyH/magic
chown -R ceph:ceph /var/lib/ceph/tmp/mnt.uCrLyH/magic

#Get journal_uuid and write it to the tmp file
journal_uuid=$(blkid -o value -s PARTUUID /dev/loop0p2)  # or look it up via: ls -l /dev/disk/by-partuuid/ | grep loop0p2
cat << EOF > /var/lib/ceph/tmp/mnt.uCrLyH/journal_uuid
$journal_uuid
EOF
restorecon -R /var/lib/ceph/tmp/mnt.uCrLyH/journal_uuid
chown -R ceph:ceph /var/lib/ceph/tmp/mnt.uCrLyH/journal_uuid

#Create journal link
ln -s /dev/disk/by-partuuid/${journal_uuid} /var/lib/ceph/tmp/mnt.uCrLyH/journal

#Restore file security context for the tmp directory
restorecon -R /var/lib/ceph/tmp/mnt.uCrLyH
chown -R ceph:ceph /var/lib/ceph/tmp/mnt.uCrLyH

#Umount tmp directory
umount -- /var/lib/ceph/tmp/mnt.uCrLyH
rm -rf /var/lib/ceph/tmp/mnt.uCrLyH

#Modify the typecode of OSD to 4fbd7e29-9d25-41b8-afd0-062c0ceff05d, which means READY
sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/loop0
udevadm settle --timeout=600
flock -s /dev/loop0 partprobe /dev/loop0
udevadm settle --timeout=600
udevadm trigger --action=add --sysname-match loop0
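
The effect of the typecode change can be confirmed before moving on; once partition 1 reports the READY GUID, the udev rules shown earlier will match the 'add' event just triggered:

# 'Partition GUID code' should now read 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (ceph data)
sgdisk --info=1 /dev/loop0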

9, Start OSD daemon - https://github.com/ceph/ceph/blob/jewel/src/ceph-disk/ceph_disk/main.py#L3471
#ceph-disk -v activate --mark-init systemd --mount /dev/loop0
blkid -p -s TYPE -o value -- /dev/loop0
ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
mkdir /var/lib/ceph/tmp/mnt.GoeBOu
mount -t xfs -o noatime,inode64 -- /dev/loop0 /var/lib/ceph/tmp/mnt.GoeBOu
restorecon /var/lib/ceph/tmp/mnt.GoeBOu
umount -- /var/lib/ceph/tmp/mnt.GoeBOu
rm -rf /var/lib/ceph/tmp/mnt.GoeBOu
systemctl disable ceph-osd@3
systemctl enable --runtime ceph-osd@3
systemctl start ceph-osd@3
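
Finally, it can be verified that the daemon is up and the OSD has joined the cluster (assuming osd id 3, as above):

systemctl status ceph-osd@3              # unit should be active (running)
ceph osd tree | grep 'osd.3'             # the OSD should be reported as 'up'
ls -l /var/lib/ceph/osd/ceph-3/journal   # journal symlink resolves to the by-partuuid device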

Reference

[1] https://bugs.launchpad.net/charm-ceph-osd/+bug/1783113
