Replacing a disk controlled by SVM

The following scenario assumes two mirrored disks, each with two state database replicas on slice 7. The high-level steps are as follows:

  1. determine failed disk
  2. detach failed submirrors
  3. clear failed submirror metadevices and database replicas from failed disk
  4. unconfigure the failed disk and replace it
  5. configure the new disk and recreate VTOC
  6. add new database replicas
  7. recreate the submirrors and reattach them to the respective mirrors

This is the current /etc/vfstab:

bash-3.00# cat /etc/vfstab
#device      device        mount    FS      fsck    mount   mount
#to mount    to fsck               point    type    pass    at boot options
fd              -                /dev/fd         fd      -   no  -
/proc           -                /proc           proc    -   no  -
/dev/md/dsk/d0  -                -               swap    -   no  -
/dev/md/dsk/d10 /dev/md/rdsk/d10 /               ufs     1   no  logging
/dev/md/dsk/d30 /dev/md/rdsk/d30 /export/home    ufs     2   yes logging
/devices        -                /devices        devfs   -   no  -
ctfs    -       /system/contract ctfs    -       no      -
objfs   -       /system/object   objfs   -       no      -
swap    -       /tmp             tmpfs   -       yes     -

From here on I will use d0 and its submirrors as the example. d0 consists of d1 and d2, and d2 is on the failed disk, as the metastat output shows:

d0: Mirror
    Submirror 0: d1
      State: Okay  
    Submirror 1: d2
      State: Needs maintenance  
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 45753093 blocks (21 GB)

d1: Submirror of d0
    State: Okay          
    Size: 45753093 blocks (21 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No            Okay   Yes

d2: Submirror of d0
    State: Unavailable
    Size: 45753093 blocks (21 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No               -   Yes
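
With several mirrors on one box, picking the broken submirrors out of metastat output by eye gets tedious. The following is a minimal sketch of a filter, not a definitive tool: failed_submirrors is a hypothetical helper name, and it assumes the stanza layout shown above (a "dN: Submirror of dM" header followed by a State line).

```shell
# Read metastat-style output on stdin and print "submirror mirror state"
# for every submirror whose state is anything other than "Okay".
# failed_submirrors is a hypothetical helper, not part of SVM.
failed_submirrors() {
    awk '
        # Stanza header, e.g. "d2: Submirror of d0"
        $2 == "Submirror" && $3 == "of" {
            name = substr($1, 1, length($1) - 1)   # drop the trailing ":"
            mirror = $4
            pending = 1
            next
        }
        # The first State line after a header belongs to that submirror
        pending && $1 == "State:" {
            state = $2
            for (i = 3; i <= NF; i++) state = state " " $i
            if (state != "Okay") print name, mirror, state
            pending = 0
        }
    '
}

# Demo on the two submirror stanzas from the listing above
failed_submirrors <<'EOF'
d1: Submirror of d0
    State: Okay
d2: Submirror of d0
    State: Unavailable
EOF
# prints: d2 d0 Unavailable
```

On the live system the same function would be fed with "metastat | failed_submirrors".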

First we detach d2. The same has to be repeated for d32 and d12:

bash-3.00# metadetach -f d0 d2
d0: submirror d2 is detached

We need to clear d2. Again, the same is repeated for d32 and d12:

bash-3.00# metaclear d2
d2: Concat/Stripe is cleared

Now we delete the state database replicas from the failed disk. Before removing them, it’s very important to make sure that at least half of all state database replicas are still available; Sun's documentation explains the majority consensus algorithm that Solaris Volume Manager uses. You can determine the number and location of the replicas with the metadb -i command.

bash-3.00# metadb -d c1t1d0s7

Now we can unconfigure the failed disk using cfgadm, replace it, and configure the new disk:

bash-3.00# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t1d0                 disk         connected    configured   unknown
c1                             scsi-bus     connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb0/3                         unknown      empty        unconfigured ok
usb0/4                         unknown      empty        unconfigured ok
usb1/1                         unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok
usb1/3                         unknown      empty        unconfigured ok
usb1/4                         unknown      empty        unconfigured ok
bash-3.00# cfgadm -c unconfigure c1::dsk/c1t1d0
bash-3.00# cfgadm -c configure c1::dsk/c1t1d0

Now we copy the VTOC from the good disk to the new disk:

bash-3.00# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
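
Before putting replicas and submirrors on the new disk, it can be worth confirming that the copied label really matches. The sketch below compares two saved prtvtoc listings while ignoring the header lines, which prtvtoc prefixes with "*" and which contain the device name. The helper names and the /tmp file names are hypothetical.

```shell
# Keep only the partition rows of a prtvtoc listing read from stdin;
# header and geometry lines start with "*" and are dropped.
vtoc_rows() {
    grep -v '^\*'
}

# Succeed if two saved prtvtoc listings are non-empty and describe
# the same slices. Both arguments are file names.
same_vtoc() {
    [ -s "$1" ] && [ -s "$2" ] &&
        [ "$(vtoc_rows < "$1")" = "$(vtoc_rows < "$2")" ]
}

# Intended use on the Solaris host (hypothetical file names):
#   prtvtoc /dev/rdsk/c1t0d0s2 > /tmp/good.vtoc
#   prtvtoc /dev/rdsk/c1t1d0s2 > /tmp/new.vtoc
#   same_vtoc /tmp/good.vtoc /tmp/new.vtoc && echo "VTOC matches"
```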

Add database replicas to the new disk:

bash-3.00# metadb -a -c2 c1t1d0s7

Finally, we can recreate the failed submirrors, attach them to their respective mirrors, and let them sync up. Again, the same applies to d32 and d12:

bash-3.00# metainit d2 1 1 c1t1d0s0
d2: Concat/Stripe is setup
bash-3.00# metattach d0 d2
d0: submirror d2 is attached
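
Since the detach/clear and recreate/attach steps repeat for every mirror with a submirror on the failed disk, it may help to print the whole command list up front and review it before typing anything. This is only a sketch that echoes commands and runs nothing. Only the d0 row comes from this walkthrough; pairing d12 with d10 and d32 with d30, and the slices c1t1d0s1 and c1t1d0s3, are assumptions that have to be replaced with the real layout from metastat.

```shell
# "mirror submirror slice" for everything on the failed disk.
# Only the d0 row is taken from the walkthrough; the d10 and d30 rows
# (pairings and slices) are hypothetical placeholders.
PLAN='d0 d2 c1t1d0s0
d10 d12 c1t1d0s1
d30 d32 c1t1d0s3'

# Print, without running them, the commands for one phase:
#   detach - before the failed disk is unconfigured
#   attach - after the VTOC and the replicas are back on the new disk
print_phase() {
    echo "$PLAN" | while read mirror sub slice; do
        case $1 in
            detach)
                echo "metadetach -f $mirror $sub"
                echo "metaclear $sub"
                ;;
            attach)
                echo "metainit $sub 1 1 $slice"
                echo "metattach $mirror $sub"
                ;;
        esac
    done
}

print_phase detach
print_phase attach
```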

A few notes: this setup contains a total of 4 state database replicas, so a disk failure takes out half of them. The system keeps running with exactly half of the replicas available, but if the server gets rebooted for whatever reason, it will not come up in multiuser mode, because that requires a majority (half plus one). If fewer than half of the replicas are available, the system will panic. For more information, see the Solaris Volume Manager Administration Guide.
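
The replica rules above reduce to comparing the number of available replicas with the total number configured. The sketch below does only that arithmetic; replica_state is a hypothetical helper, and the counts have to be read off metadb -i by hand.

```shell
# replica_state TOTAL AVAILABLE
# Classify the state database according to the rules described above:
#   majority   - more than half available: normal operation, clean reboot
#   half       - exactly half: keeps running, will not boot to multiuser
#   below-half - fewer than half: the system panics
# replica_state is a hypothetical helper, not part of SVM.
replica_state() {
    rs_total=$1
    rs_avail=$2
    if [ $(( rs_avail * 2 )) -gt "$rs_total" ]; then
        echo majority
    elif [ $(( rs_avail * 2 )) -eq "$rs_total" ]; then
        echo half
    else
        echo below-half
    fi
}

replica_state 4 2   # this setup right after the disk failure: prints "half"
replica_state 2 2   # after metadb -d removed the two dead replicas: prints "majority"
```

The second call illustrates why deleting the replicas on the failed disk matters: with only the two good replicas left in the configuration, all of them are available again.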

When using cfgadm to unconfigure a disk, nothing may be using that disk; otherwise the unconfigure will fail. Quite possibly the swap metadevice is set up as the dedicated dump device. To view or change the dedicated dump device settings, use the dumpadm command.