SC4020/SCv2020 Controller Replacement: Procedure and Problems You May Run Into

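Many customers run into problems when replacing a controller on a DELL SC4020 or DELL SCv2020/2000. Below is a brief look at the problems you may hit and some ways to handle them: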

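One note first: the replacement method differs between generations of DELL SC. This article covers only the SC4020/SCv2000; older or newer models such as the SC3000/5000/7000/8000/9000 are handled differently.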

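First, let's look at the replacement procedure in the official service manual. Many engineers follow this document for the replacement, fail, cannot tell why, and keep insisting that the controller is faulty.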

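The first step is to identify the failed controller. This is fairly easy, but people still get it wrong. DELL SC calls the upper and lower controllers the top controller and the bottom controller. Be careful when pulling the failed controller out: I have seen several cases where, because the handles of the two controllers sit together, the wrong one was pulled, and then disaster struck, with the healthy controller going down as well.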


1. Make sure all the cables are labeled.
2. Disconnect all the cables from the storage controller that was shut down.


3. Remove the battery from the storage controller.

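Remove the battery first, and take care with this step. If the controller is completely dead, it does not matter. But if the controller has not fully failed, pulling it out directly is likely to leave dirty cache, that is, data in memory that has not been flushed to disk, with a risk of data loss.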

4. Push down on the release tab of the storage controller and pull the release lever away from the chassis.

NOTE: Wait until all the storage controller indicators are off before removing the storage controller.


5. Grasp the release lever and pull the storage controller away from the chassis.

6. Locate the battery removed in a previous step and insert it into the replacement storage controller.

   a. Align the battery with the slot on the storage controller.

   b. Slide the battery into the storage controller until the release tab clicks into place.


7. Insert the replacement storage controller into the chassis until it is fully seated.

NOTE: The bottom storage controller is installed upside down.




8. Reconnect the cables to the storage controller.

9. Push the release lever toward the chassis until it clicks into place. The storage controller is powered on.


NOTE: When a storage controller is powered on, there is a one‐minute delay while the storage controller prepares to boot. During this time, the only indication that the storage controller is powered on are the LEDs on the storage controller. After the one‐minute delay, the fans and LEDs turn on as an indication that the storage controller is starting up.


10. In the Storage Center System Manager, make sure that the replacement storage controller is recognized and shown as up and running.


NOTE: If the Storage Center software on the replacement storage controller is older than the software on the existing storage controller, the storage system updates the replacement storage controller with the software version on the existing storage controller. The Storage Center software update on the replacement storage controller could take 15 to 45 minutes to complete.


NOTE: In rare cases, when a storage controller is replaced, it may boot into safe mode and wait to be configured. If so, contact Dell Technical Support Services for the configuration information to enter. In addition, if the storage system is at a later Storage Center OS version than the replacement storage controller, the Storage Center OS on the replacement storage controller must be manually updated using the virtual media update method.

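Another possibility is that the replaced controller boots into safe mode and then stops there. At that point manual intervention is required. There is also the case where the OS version on the replacement controller is higher than the original version: the system will not automatically sync it to the version of the original controller, and it has to be updated manually with an ISO file.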
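The version rule can be written down as a small decision function. This is only a sketch that follows the reading given in the paragraph above (an older replacement is updated automatically, a newer one is not); the names `UpdatePath`, `parse_version` and `update_path` are made up for illustration, and the dotted version strings in the usage note are placeholders, not real release numbers.

```python
from enum import Enum
from typing import Tuple


class UpdatePath(Enum):
    AUTOMATIC = "existing controller pushes its version to the replacement"
    MANUAL_ISO = "update the replacement manually from an ISO image"
    NONE = "versions already match"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '7.3.20' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def update_path(existing: str, replacement: str) -> UpdatePath:
    """Pick the update path for a replacement controller.

    Assumption: mirrors the rule as described in this article, not any
    documented Storage Center behavior beyond that.
    """
    old, new = parse_version(existing), parse_version(replacement)
    if new < old:
        return UpdatePath.AUTOMATIC   # replacement is older: auto-update
    if new > old:
        return UpdatePath.MANUAL_ISO  # replacement is newer: no auto-sync
    return UpdatePath.NONE
```

For example, `update_path("7.3.20", "7.2.10")` returns `UpdatePath.AUTOMATIC`, while swapping the two arguments returns `UpdatePath.MANUAL_ISO`. Checking the spare's version before going on site tells you whether you need to bring the ISO.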


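In practice, things do not go this way at all. This procedure is written for brand-new genuine DELL spare controllers, not for third-party spares pulled from other systems. You should either 1) choose a reliable spare-parts supplier (you can add vx: StorageExpert), or 2) make sure you have a serial cable to monitor the replacement, so you know where things went wrong and can apply the right fix.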
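If you do watch the serial console, it helps to keep a timestamped record of everything the controller prints so the boot can be reviewed afterwards. The sketch below only handles the logging part and assumes you already have the console text as an iterable of lines (for example an open capture file, or a serial port wrapped by a library of your choice); opening the port is not shown, and `record_console` is a made-up helper name.

```python
import io
import time
from typing import Iterable, List, TextIO


def record_console(lines: Iterable[str], log: TextIO) -> List[str]:
    """Copy each console line to `log` with a local timestamp in front.

    `lines` can be any iterable of text, such as an open file or a
    text wrapper around a serial port (an assumption, not shown here).
    """
    kept = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines so the log stays compact
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log.write(f"{stamp} {line}\n")
        log.flush()  # keep the log current if the session is cut off
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Stand-in for real console output; the text below is made up.
    sample = io.StringIO("line one\r\n\r\nline two\r\n")
    out = io.StringIO()
    print(record_console(sample, out))
```

In real use, `log` would be a file opened in append mode, so that a record survives even if the laptop session drops halfway through the boot.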

As a result, the problem in the vast majority of replacements is that the controller goes into safe mode, as shown in the figure below:

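Safe mode means the system has booted into a minimal mode rather than fully starting up. In this state you can run fault diagnosis and analysis. If you do not know how to proceed, add vx: StorageExpert for help.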


Failed controllers , first selection, restart

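This state is a typical controller split-brain: the controller must find the leader controller before it can boot normally, otherwise it keeps searching. To restore service quickly, you have to give up the other controller and let this one boot as a single controller.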



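The fourth case is that the controller boots straight into a mode with no configuration at all, that is, safe mode with an SN of 0, as shown in the figure below: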

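The last case is that the system does not boot at all and cannot find the boot disk. Most likely the boot disk SSD has failed; the fix is to replace this card or reimage from an ISO.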
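The failure cases described above can be collected into a small lookup table for quick triage. This is just a summary of this article in code form; the names `Symptom`, `ACTIONS` and `next_step` are invented for illustration, and where the article names no fix the table says so instead of guessing.

```python
from enum import Enum


class Symptom(Enum):
    SAFE_MODE = "boots into safe mode and stops"
    SEARCHING_FOR_LEADER = "keeps looking for the leader controller"
    SAFE_MODE_SN_ZERO = "safe mode with SN 0 and no configuration"
    NO_BOOT_DISK = "does not boot, boot disk not found"


# Actions as described in this article; None means no fix is given here.
ACTIONS = {
    Symptom.SAFE_MODE: "manual intervention: diagnose from the serial console",
    Symptom.SEARCHING_FOR_LEADER: (
        "for fast recovery, give up the other controller "
        "and boot as a single controller"
    ),
    Symptom.SAFE_MODE_SN_ZERO: None,
    Symptom.NO_BOOT_DISK: "replace the boot SSD card or reimage from an ISO",
}


def next_step(symptom: Symptom) -> str:
    """Return the action this article suggests for a given symptom."""
    action = ACTIONS[symptom]
    return action if action is not None else "no fix described in this article"
```

Calling `next_step(Symptom.NO_BOOT_DISK)` returns the boot-disk advice, and `next_step(Symptom.SAFE_MODE_SN_ZERO)` makes it explicit that this case needs outside help.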


