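This morning the server hosting one node of a RAC cluster went down.
After the server came back up, I logged in to check how the cluster and database startup was going.
Check whether CRS is starting: if these processes are present, it is either starting or already up.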
# ps -ef|grep d.bin
root 7528 1 2 09:35 ? 00:00:01 /grid/app/11.2.0/grid/bin/ohasd.bin reboot
grid 7916 1 0 09:35 ? 00:00:00 /grid/app/11.2.0/grid/bin/oraagent.bin
grid 7928 1 0 09:35 ? 00:00:00 /grid/app/11.2.0/grid/bin/mdnsd.bin
grid 7940 1 0 09:35 ? 00:00:00 /grid/app/11.2.0/grid/bin/gpnpd.bin
grid 7954 1 0 09:35 ? 00:00:00 /grid/app/11.2.0/grid/bin/gipcd.bin
root 7955 1 0 09:35 ? 00:00:00 /grid/app/11.2.0/grid/bin/orarootagent.bin
root 7971 1 5 09:35 ? 00:00:02 /grid/app/11.2.0/grid/bin/osysmond.bin
root 7992 1 0 09:35 ? 00:00:00 /grid/app/11.2.0/grid/bin/cssdmonitor
root 8011 1 0 09:35 ? 00:00:00 /grid/app/11.2.0/grid/bin/cssdagent
grid 8022 1 0 09:35 ? 00:00:00 /grid/app/11.2.0/grid/bin/ocssd.bin
root 8630 7235 0 09:36 pts/0 00:00:00 grep --color=auto d.bin
Check which stage the cluster startup has reached
# /grid/app/11.2.0/grid/bin/crsctl stat res -t -init
-------------------------------------------------------------
NAME           TARGET  STATE        SERVER       STATE_DETAILS
-------------------------------------------------------------
Cluster Resources
-------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       xsdbd31
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                   STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       xsdbd31
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       xsdbd31
ora.gpnpd
      1        ONLINE  ONLINE       xsdbd31
ora.mdnsd
      1        ONLINE  ONLINE       xsdbd31
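With this many resources it can be easier to look only at the ones that are supposed to be up but are not. The following is a minimal sketch, assuming the output layout shown above (a resource name line followed by an instance line); pending_resources is a hypothetical helper name, not part of crsctl.

```shell
# Sketch: list resources whose TARGET is ONLINE but whose STATE is not.
# Reads the text of `crsctl stat res -t -init` on stdin.
# pending_resources is a hypothetical helper, not an Oracle tool.
pending_resources() {
  awk '
    $1 ~ /^ora\./ { name = $1; next }        # resource name line, e.g. ora.cssd
    name != "" && $1 ~ /^[0-9]+$/ {          # instance line: 1 TARGET STATE ...
      if ($2 == "ONLINE" && $3 != "ONLINE") print name, $3
      name = ""
    }
  '
}
```

It would be used as /grid/app/11.2.0/grid/bin/crsctl stat res -t -init | pending_resources. Resources such as ora.diskmon, whose target is OFFLINE, are deliberately not reported.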
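The cluster startup has reached the CSS stage, and ora.cssd is in the STARTING state.
Normally cluster startup takes some time, and after waiting a few minutes it moves on to the next stage.
But today something went wrong.
After several minutes ora.cssd still had not come up, so check the CSS log (11g: $ORACLE_HOME/log/<hostname>/cssd/ocssd.log, where $ORACLE_HOME is the Grid home)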
$ tail -f ocssd.log
2018-10-17 09:41:38.087: [ CSSD][2186336000]clssscSelect: cookie accept request 0x13bbad0
2018-10-17 09:41:38.087: [ CSSD][2186336000]clssgmAllocProc: (0x7f1c6c085110) allocated
2018-10-17 09:41:38.088: [ CSSD][2186336000]clssgmClientConnectMsg: properties of cmProc 0x7f1c6c085110 - 1,2,3,4,5
2018-10-17 09:41:38.088: [ CSSD][2186336000]clssgmClientConnectMsg: Connect from con(0x38a1) proc(0x7f1c6c085110) pid(7954) version 11:2:1:4, properties: 1,2,3,4,5
2018-10-17 09:41:38.088: [ CSSD][2186336000]clssgmClientConnectMsg: msg flags 0x0000
2018-10-17 09:41:38.320: [ CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:38.320: [ CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:38.320: [ CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(341/0x7f1c6c083900)
2018-10-17 09:41:38.320: [ CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:38.320: [ CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x38c7
2018-10-17 09:41:39.321: [ CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:39.321: [ CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:39.321: [ CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(342/0x7f1c6c083420)
2018-10-17 09:41:39.321: [ CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:39.321: [ CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x38dd
2018-10-17 09:41:39.663: [ CSSD][2186336000]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:39.663: [ CSSD][2186336000]clssgmDeadProc: proc 0x7f1c6c085110
2018-10-17 09:41:39.663: [ CSSD][2186336000]clssgmDestroyProc: cleaning up proc(0x7f1c6c085110) con(0x38a1) skgpid ospid 7954 with 0 clients, refcount 0
2018-10-17 09:41:39.663: [ CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x38a1
2018-10-17 09:41:40.323: [ CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:40.323: [ CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:40.323: [ CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(343/0x7f1c6c083010)
2018-10-17 09:41:40.323: [ CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:40.323: [ CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x3901
2018-10-17 09:41:40.841: [ GPNP][2183792384]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2104] get-profile call to url "ipc://GPNPD_xsdbd31" disco "" [f=0 claimed- host: cname:
2018-10-17 09:41:40.848: [ GPNP][2183792384]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2234] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_xsdbd31"
2018-10-17 09:41:40.848: [ CSSD][2183792384]clssnmReadDiscoveryProfile: voting file discovery string(/dev/raw/raw*)
2018-10-17 09:41:40.848: [ CSSD][2183792384]clssnmvDDiscThread: using discovery string /dev/raw/raw* for initial discovery
2018-10-17 09:41:40.848: [ SKGFD][2183792384]Discovery with str:/dev/raw/raw*:
2018-10-17 09:41:40.848: [ SKGFD][2183792384]UFS discovery with :/dev/raw/raw*:
2018-10-17 09:41:40.848: [ SKGFD][2183792384]Execute glob on the string /dev/raw/raw*
2018-10-17 09:41:40.848: [ SKGFD][2183792384]running stat on disk:/dev/raw/rawctl
2018-10-17 09:41:40.848: [ SKGFD][2183792384]WARNING: Using brute force method to determine the size of /dev/raw/rawctl.
There will be performance issues. Please check configuration to determine the cause for the failure of ioctl
2018-10-17 09:41:40.848: [ SKGFD][2183792384]Fetching UFS disk :/dev/raw/rawctl:
2018-10-17 09:41:40.848: [ SKGFD][2183792384]OSS discovery with :/dev/raw/raw*:
2018-10-17 09:41:40.848: [ CSSD][2183792384]clssnmvDiskVerify: Successful discovery of 0 disks
2018-10-17 09:41:40.848: [ CSSD][2183792384]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2018-10-17 09:41:40.848: [ CSSD][2183792384]clssnmvFindInitialConfigs: No voting files found
2018-10-17 09:41:40.848: [ CSSD][2183792384](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2018-10-17 09:41:41.324: [ CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:41.324: [ CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:41.324: [ CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(344/0x7f1c6c068720)
2018-10-17 09:41:41.324: [ CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:41.325: [ CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x3943
2018-10-17 09:41:42.326: [ CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:42.326: [ CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:42.326: [ CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(345/0x7f1c6c069370)
2018-10-17 09:41:42.326: [ CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:42.326: [ CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x3959
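CSS did not find any voting disks: discovery with /dev/raw/raw* returned 0 disks and is retried every 15 seconds.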
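The relevant lines are easy to miss among the repeated client-connect messages. A small sketch for pulling out only the voting file discovery messages follows; the patterns are taken from the log excerpt above, and voting_file_messages is a hypothetical helper name.

```shell
# Sketch: show only the voting file discovery messages from an ocssd.log.
# Pass log file names as arguments, or pipe log text on stdin.
# The patterns are copied from the messages seen in this incident.
voting_file_messages() {
  grep -E 'voting file discovery string|clssnmvDiskVerify|clssnmvFindInitialConfigs|CSSNM00070' "$@"
}
```

For example, voting_file_messages ocssd.log prints the discovery string in use and whether any disks were found.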
$ ls -lrt /dev/raw/raw*
crw-rw----. 1 grid asmadmin 162, 0 Oct 17 09:35 /dev/raw/rawctl
What a shock: every device that had been bound with raw is gone.
I found the binding rules in rc.local.
rc.local:
/bin/raw /dev/raw/raw113 /dev/mapper/data113
/bin/raw /dev/raw/raw101 /dev/mapper/data101
/bin/raw /dev/raw/raw102 /dev/mapper/data102
/bin/raw /dev/raw/raw121 /dev/mapper/data121
/bin/raw /dev/raw/raw120 /dev/mapper/data120
/bin/raw /dev/raw/raw118 /dev/mapper/data118
/bin/raw /dev/raw/raw119 /dev/mapper/data119
...
/bin/raw /dev/raw/raw2 /dev/mapper/ocr102
/bin/raw /dev/raw/raw1 /dev/mapper/ocr101
/bin/raw /dev/raw/raw12 /dev/mapper/vote102
/bin/raw /dev/raw/raw115 /dev/mapper/data115
/bin/raw /dev/raw/raw11 /dev/mapper/vote101
chown -R grid:asmadmin /dev/raw/*
At this point it is unclear whether rc.local did not run or whether the multipath devices under /dev/mapper/* have a problem.
[grid@xsdbd31 etc]$ ls -lrt /dev/mapper/*
crw-------. 1 root root 10, 236 Oct 17 09:35 /dev/mapper/control
...
lrwxrwxrwx. 1 root root 8 Oct 17 09:35 /dev/mapper/data121 -> ../dm-11
lrwxrwxrwx. 1 root root 8 Oct 17 09:35 /dev/mapper/data122 -> ../dm-10
lrwxrwxrwx. 1 root root 7 Oct 17 09:35 /dev/mapper/data123 -> ../dm-9
lrwxrwxrwx. 1 root root 8 Oct 17 09:35 /dev/mapper/data124 -> ../dm-21
lrwxrwxrwx. 1 root root 8 Oct 17 09:35 /dev/mapper/data103 -> ../dm-15
lrwxrwxrwx. 1 root root 8 Oct 17 09:35 /dev/mapper/data105 -> ../dm-12
The multipath devices are all present, so it is clear that rc.local did not run.
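The same reasoning can be written as a small check that walks the binding lines and reports, for each one, whether the multipath device or only the raw device is missing. This is a sketch under the assumption that every binding line has the form /bin/raw <raw device> <block device> as in the listing above; check_raw_bindings is a hypothetical helper name.

```shell
# Sketch: classify each "/bin/raw RAWDEV BLOCKDEV" line read from stdin.
# A missing BLOCKDEV points at multipath; a missing RAWDEV with BLOCKDEV
# present points at the binding commands not having been run.
check_raw_bindings() {
  while read -r cmd rawdev blockdev rest; do
    [ "$cmd" = "/bin/raw" ] || continue      # skip chown, comments, "..."
    if [ ! -e "$blockdev" ]; then
      printf '%s: source %s is missing\n' "$rawdev" "$blockdev"
    elif [ ! -e "$rawdev" ]; then
      printf '%s: not bound, but %s exists\n' "$rawdev" "$blockdev"
    fi
  done
}
```

Feeding it the rc.local file (check_raw_bindings < rc.local, with the path to rc.local adjusted for the system) would print one "not bound" line per binding in the situation described here.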
Run the commands from rc.local by hand (both the raw bindings and the chown that fixes ownership).
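Rather than re-running the whole of rc.local, the binding and ownership commands can be extracted and reviewed first. A minimal sketch, assuming those lines start with /bin/raw or chown as in the listing above; binding_commands is a hypothetical helper name and the location of rc.local (for example /etc/rc.local) is an assumption to verify on the system.

```shell
# Sketch: print only the raw-binding and chown lines of an rc.local-style
# file so they can be reviewed before being executed.
# $1 is the path to the file; nothing is executed by this function.
binding_commands() {
  grep -E '^[[:space:]]*(/bin/raw|chown)[[:space:]]' "$1"
}
```

After checking the printed lines, piping them to a root shell (binding_commands /etc/rc.local | sh) would apply them.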
Then watch the CSS log again
# tail -f ocssd.log
2018-10-17 09:51:48.742: [ CSSD][4159170304]clssgmRPCDone: rpc 0x7f522daba818 (RPC#61) state 6, flags 0x100
2018-10-17 09:51:48.742: [ CSSD][4159170304]clssgmAddGrockMemCmpl: rpc 0x7f522daba818, ret 0, client 0x7f52180eb910 member 0x7f52182089f0
2018-10-17 09:51:48.742: [ CSSD][4159170304]clssgmAddGrockMemCmpl: sending type 6, size 540 to 0x7f52180eb910
2018-10-17 09:51:48.742: [ CSSD][4159170304]clssgmFreeRPCIndex: freeing rpc 61
2018-10-17 09:51:48.742: [ CSSD][4159170304]clssgmHandleGrockRcfgUpdate: grock(crs_version), updateseq(70788), status(0), sendresp(1)
2018-10-17 09:51:48.972: [ CSSD][4159170304]clssgmTestSetLastGrockUpdate: grock(crs_version), updateseq(70788) msgseq(70789), lastupdt<0x7f51e003fd50>, ignoreseq(0)
2018-10-17 09:51:48.972: [ CSSD][4159170304]clssgmUpdateGrpData: grock(crs_version), private data(84), incarn(15)
2018-10-17 09:51:48.973: [ CSSD][4159170304]clssgmHandleGrockRcfgUpdate: grock(crs_version), updateseq(70789), status(0), sendresp(1)
2018-10-17 09:51:49.797: [ CSSD][4156016384]clssnmSendingThread: sending status msg to all nodes
2018-10-17 09:51:49.797: [ CSSD][4156016384]clssnmSendingThread: sent 5 status msgs to all nodes
Check the cluster startup status
# /grid/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER       STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xsdbd31      Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xsdbd31
ora.crf
      1        ONLINE  ONLINE       xsdbd31
ora.crsd
      1        ONLINE  INTERMEDIATE xsdbd31
ora.cssd
      1        ONLINE  ONLINE       xsdbd31
ora.cssdmonitor
      1        ONLINE  ONLINE       xsdbd31
ora.ctssd
      1        ONLINE  ONLINE       xsdbd31      OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  INTERMEDIATE xsdbd31
ora.gipcd
      1        ONLINE  ONLINE       xsdbd31
ora.gpnpd
      1        ONLINE  ONLINE       xsdbd31
ora.mdnsd
      1        ONLINE  ONLINE       xsdbd31
CSS is up, and CRS is now starting (ora.crsd and ora.evmd are INTERMEDIATE).
Wait a few more minutes
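Instead of re-running the command by hand, the state of a single resource can be read out of the output and polled. This is a sketch that assumes the same output layout as above; resource_state is a hypothetical helper name, and the 10-second interval in the commented loop is an arbitrary choice.

```shell
# Sketch: print the STATE column of one resource.
# $1 is the resource name (e.g. ora.crsd); the text of
# `crsctl stat res -t -init` is read from stdin.
resource_state() {
  awk -v res="$1" '
    $1 == res { found = 1; next }                       # name line of the resource
    found && $1 ~ /^[0-9]+$/ { print $3; found = 0 }    # its instance line
  '
}

# Example polling loop (commented out: it needs a running Grid stack):
# until [ "$(/grid/app/11.2.0/grid/bin/crsctl stat res -t -init | resource_state ora.crsd)" = "ONLINE" ]; do
#   sleep 10
# done
```

The loop has no timeout, so in practice it would be worth adding a retry limit before relying on it.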
# /grid/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER       STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xsdbd31      Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xsdbd31
ora.crf
      1        ONLINE  ONLINE       xsdbd31
ora.crsd
      1        ONLINE  ONLINE       xsdbd31
ora.cssd
      1        ONLINE  ONLINE       xsdbd31
ora.cssdmonitor
      1        ONLINE  ONLINE       xsdbd31
ora.ctssd
      1        ONLINE  ONLINE       xsdbd31      OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       xsdbd31
ora.gipcd
      1        ONLINE  ONLINE       xsdbd31
ora.gpnpd
      1        ONLINE  ONLINE       xsdbd31
ora.mdnsd
      1        ONLINE  ONLINE       xsdbd31
Cluster startup is complete.
Logged in to the database and checked its status: OK.