1. Identify the failed disk
# lsvg -l rootvg
rootvg:
LV NAME     TYPE      LPs   PPs   PVs   LV STATE       MOUNT POINT
hd5         boot      1     2     2     closed/stale   N/A
hd6         paging    64    128   2     open/syncd     N/A
hd8         jfs2log   1     2     2     open/stale     N/A
hd4         jfs2      8     16    2     open/stale     /
hd2         jfs2      23    46    2     open/stale     /usr
hd9var      jfs2      16    32    2     open/stale     /var
hd3         jfs2      16    32    2     open/stale     /tmp
hd1         jfs2      8     16    2     open/stale     /home
hd10opt     jfs2      8     16    2     open/stale     /opt
hd11admin   jfs2      1     2     2     open/syncd     /admin
lg_dumplv   sysdump   12    12    1     open/syncd     N/A
livedump    jfs2      1     2     2     open/syncd     /var/adm/ras/livedump
lvu01       jfs2      240   480   2     open/stale     /u01
# lsvg -p rootvg
rootvg:
PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk0    active     546         147        79..00..00..00..68
hdisk1    missing    546         159        91..00..00..00..68
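The two listings above carry the key symptoms: logical volumes in a stale state and a physical volume marked missing. A minimal sketch of how those could be pulled out automatically is shown here. The function names stale_lvs and missing_pvs are hypothetical helpers, not AIX commands, and the parsing assumes the column layout shown in the output above.

```shell
# Hypothetical helpers; they only parse text read from stdin.

# Print the name of every LV whose LV STATE column (field 6) ends in "/stale".
stale_lvs() {
    awk 'NF >= 7 && $6 ~ /\/stale$/ { print $1 }'
}

# Print the name of every PV whose PV STATE column (field 2) is "missing".
missing_pvs() {
    awk 'NF >= 2 && $2 == "missing" { print $1 }'
}

# Intended use on the AIX host (not executed here):
#   lsvg -l rootvg | stale_lvs
#   lsvg -p rootvg | missing_pvs
```

An empty result from both helpers would suggest the mirror is fully synchronized and all disks are present.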
# errpt -dH | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
12296806   0402164821 T H sissas0        SAS ERROR
12296806   0402164821 T H sissas0        SAS ERROR
12296806   0402164721 T H sissas0        SAS ERROR
12296806   0402164721 T H sissas0        SAS ERROR
12296806   0402164621 T H sissas0        SAS ERROR
12296806   0402164621 T H sissas0        SAS ERROR
12296806   0402164521 T H sissas0        SAS ERROR
12296806   0402164521 T H sissas0        SAS ERROR
12296806   0402164421 T H sissas0        SAS ERROR
12296806   0402164421 T H sissas0        SAS ERROR
C62E1EB7   0402164321 P H hdisk1         DISK OPERATION ERROR
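The errpt timestamp column uses the MMDDhhmmYY format, so 0402164321 is 2 April 2021, 16:43. When the log is long, counting entries per resource makes the pattern easier to see. The following is a rough sketch only: errpt_summary is a hypothetical helper name, and it assumes the six-column summary layout shown above.

```shell
# Hypothetical helper: count errpt summary lines per resource and description.
# Reads errpt summary output on stdin; prints "<count> <resource> | <description>",
# highest count first.
errpt_summary() {
    awk '
        $1 == "IDENTIFIER" { next }      # skip the header line
        NF < 6             { next }      # skip anything that is not a full entry
        {
            desc = $6
            for (i = 7; i <= NF; i++) desc = desc " " $i
            count[$5 " | " desc]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}

# Intended use on the AIX host (not executed here):
#   errpt -d H | errpt_summary
```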
2. Look up the FRU part number of the failed disk
# lscfg -vpl hdisk1
hdisk1 U78A0.001.DNWHCRL-P2-D3 SAS Disk Drive (146800 MB)
Manufacturer................IBM
Machine Type and Model......ST3146356SS
FRU Number..................10N7204
ROS Level and ID............45363044
Serial Number...............3QN1SPCR
EC Level....................D76038
Part Number.................10N7203
Device Specific.(Z0)........000005329F001002
Device Specific.(Z1)........0709E60D
Device Specific.(Z2)........0021
Device Specific.(Z3)........09190
Device Specific.(Z4)........
Device Specific.(Z5)........22
Device Specific.(Z6)........D76038
Hardware Location Code......U78A0.001.DNWHCRL-P2-D3
PLATFORM SPECIFIC
Name: disk
Node: disk
Device Type: block
#
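The FRU number and the hardware location code are the two values worth recording from this output, since both are needed again when the replacement disk is ordered and fitted. A small sketch for extracting a single field follows. The helper name lscfg_field is hypothetical, and it assumes the "label, run of dots, value" layout shown above.

```shell
# Hypothetical helper: print the value of one labelled field from lscfg -vpl output.
# $1 is the label exactly as printed, e.g. "FRU Number".
lscfg_field() {
    awk -v label="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)               # tolerate leading indentation
            if (index(line, label ".") == 1) {     # label followed by the dot leader
                value = substr(line, length(label) + 1)
                sub(/^\.+/, "", value)             # drop the dot leader
                print value
                exit
            }
        }'
}

# Intended use on the AIX host (not executed here):
#   lscfg -vpl hdisk1 | lscfg_field "FRU Number"
#   lscfg -vpl hdisk1 | lscfg_field "Hardware Location Code"
```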
3. Remove the mirror copies on hdisk1 from rootvg with unmirrorvg
# unmirrorvg rootvg hdisk1
4. Clear the boot record on hdisk1
# chpv -c hdisk1
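5. Remove hdisk1 from rootvg
# reducevg rootvg hdisk1
If reducevg prints a warning with error code 0516-016 or 0516-884 while removing the disk from rootvg, the cause is lg_dumplv: by default it is not mirrored within rootvg and exists on one disk only (hdisk0 on this system).
In that case, set the primary dump device to /dev/sysdumpnull (the null dump device) before retrying.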
# lslv -l lg_dumplv
lg_dumplv:N/A
PV COPIES IN BAND DISTRIBUTION
hdisk0 012:000:000 100% 000:012:000:000:000
# sysdumpdev -P -p /dev/sysdumpnull
The dump device change was not needed for this hdisk1 replacement and was not performed.
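Whether the dump device has to be changed depends on where lg_dumplv actually resides, which is what the lslv -l output above shows. A minimal sketch of that check is given here. The helper name lv_uses_pv is hypothetical, and it assumes the disk name appears in the first column of the lslv -l listing.

```shell
# Hypothetical helper: succeed (exit 0) if the LV described on stdin
# has partitions on the given disk, fail (exit 1) otherwise.
# $1 is the disk name, stdin is the output of `lslv -l <lvname>`.
lv_uses_pv() {
    awk -v pv="$1" '
        $1 == pv { found = 1 }
        END      { exit (found ? 0 : 1) }
    '
}

# Intended use on the AIX host (not executed here):
#   if lslv -l lg_dumplv | lv_uses_pv hdisk1; then
#       sysdumpdev -P -p /dev/sysdumpnull
#   fi
```

With the output shown above, lg_dumplv sits on hdisk0 only, so the check for hdisk1 fails and the dump device is left alone.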
6. Confirm that hdisk1 has been removed from rootvg
# lsvg -l rootvg
7. Delete the hdisk1 device definition
# rmdev -Rdl hdisk1
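8. Pull the failed hdisk1 and insert the new disk. Caution: verify the machine serial number and the disk slot carefully so that you do not work on the wrong server or pull the wrong disk.
The machine serial number can be confirmed with:
# prtconf | more
The disk location, P2-D3, comes from the lscfg -vpl hdisk1 output recorded in step 2 (run the command before the rmdev in step 7, because the device definition no longer exists afterwards):
hdisk1 U78A0.001.DNWHCRL-P2-D3 SAS Disk Drive (146800 MB)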
9. Scan for the new disk
# cfgmgr -v
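10. Check the newly detected disk
# lspv
If the disk has no PVID, assign one:
# chdev -l hdisk1 -a pv=yes
Then confirm that the FRU number matches the one recorded in step 2:
# lscfg -vpl hdisk1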
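lspv prints "none" in the PVID column for a disk that has no PVID yet, so the check in this step can be scripted. The sketch below is illustrative only: pv_has_pvid is a hypothetical helper name, and it assumes the usual lspv layout with the disk name in column 1 and the PVID in column 2 (no lspv output is shown in these notes).

```shell
# Hypothetical helper: succeed (exit 0) if the given disk is listed with a PVID,
# fail (exit 1) if it is listed with "none" or not listed at all.
# $1 is the disk name, stdin is the output of `lspv`.
pv_has_pvid() {
    awk -v pv="$1" '
        $1 == pv { seen = 1; if ($2 != "none") ok = 1 }
        END      { exit ((seen && ok) ? 0 : 1) }
    '
}

# Intended use on the AIX host (not executed here):
#   lspv | pv_has_pvid hdisk1 || chdev -l hdisk1 -a pv=yes
```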
11. Add the new hdisk1 to rootvg
# extendvg rootvg hdisk1
12. Mirror rootvg onto the new disk
# mirrorvg rootvg hdisk1
13. Create the boot image on hdisk1 and add the disk to the boot list
# bosboot -ad hdisk1
# bootlist -m normal -o
# bootlist -m normal hdisk0 hdisk1 cd0
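The command sequence from steps 3 to 13 can be collected into a dry-run wrapper for review before anything is executed. This is a sketch under stated assumptions, not a tested procedure: the run, remove_failed_disk and add_new_disk names and the DRY_RUN, VG and DISK variables are all hypothetical, the physical swap in step 8 still happens by hand between the two functions, and the optional dump-device and PVID steps are left out.

```shell
# Hypothetical wrapper around the commands used in these notes.
# With DRY_RUN=1 (the default) every command is only printed, never executed.
VG=${VG:-rootvg}
DISK=${DISK:-hdisk1}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 3, 4, 5 and 7: take the failed disk out of the volume group.
remove_failed_disk() {
    run unmirrorvg "$VG" "$DISK"
    run chpv -c "$DISK"
    run reducevg "$VG" "$DISK"
    run rmdev -Rdl "$DISK"
}

# Steps 9, 11, 12 and 13: bring the replacement disk into the mirror.
# The boot list order (hdisk0, new disk, cd0) is taken from step 13.
add_new_disk() {
    run cfgmgr -v
    run extendvg "$VG" "$DISK"
    run mirrorvg "$VG" "$DISK"
    run bosboot -ad "$DISK"
    run bootlist -m normal hdisk0 "$DISK" cd0
}
```

Calling remove_failed_disk with the defaults prints the four removal commands prefixed with "+ ", which gives a checklist to compare against the steps above.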