Voting Disk Redundancy in ASM

January 6, 2012

A recent discussion thread on the OTN Exadata forum prompted me to test a feature of 11.2 RAC - voting disk redundancy.  One section of the Clusterware Administration and Deployment Guide (http://goo.gl/eMrQM) in particular made me want to try it out and see how well it worked:

If voting disks are stored on Oracle ASM with normal or high redundancy, and the storage hardware in one failure group suffers a failure, then if there is another disk available in a disk group in an unaffected failure group, Oracle ASM recovers the voting disk in the unaffected failure group.

How resilient is it?  How quickly will the cluster recover the voting disk?  Do we have to wait for the ASM disk repair timer to tick down to zero before the voting disk gets recreated?

First, what is a voting disk?  The voting disks are used to determine which nodes are members of the cluster.  In a RAC configuration, there are either 1, 3, or 5 voting disks for the cluster.

Next, when voting disks are placed in an ASM diskgroup, which is the default in 11.2, the number of voting disks created depends on the redundancy level of the diskgroup.  With external redundancy, 1 voting disk is created.  With ASM redundancy, the voting disks require more failgroups than a regular diskgroup would: normal redundancy creates 3 voting disks, and high redundancy creates 5.  Oracle recommends at least 300MB per voting disk file, and 300MB for each copy of the OCR.  On standard (non-Exadata) RAC builds, I prefer to place the OCR and voting disks into their own diskgroup, named GRID or SYSTEM.
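
For reference, creating a dedicated diskgroup for the clusterware files and moving the voting disks and OCR into it looks something like the following. This is only a rough sketch - the disk paths, failgroup names, and the old OCR location are placeholders, and compatible.asm must be 11.2 or higher before voting files can be stored in the diskgroup:

-- on the ASM instance (disk paths and failgroup names are hypothetical)
create diskgroup GRID normal redundancy
  failgroup fg1 disk '/dev/mapper/grid01'
  failgroup fg2 disk '/dev/mapper/grid02'
  failgroup fg3 disk '/dev/mapper/grid03'
  attribute 'compatible.asm' = '11.2';

# then, as root, point the voting files and OCR at the new diskgroup
/u01/app/11.2.0.2/grid/bin/crsctl replace votedisk +GRID
/u01/app/11.2.0.2/grid/bin/ocrconfig -add +GRID
/u01/app/11.2.0.2/grid/bin/ocrconfig -delete <old OCR location>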

Anyway, back to the fun stuff.  I'm testing this on Exadata, but the results should carry over to any other system running 11.2 RAC.  For starters, we have a diskgroup named DBFS_DG that is used to hold the OCR and voting disks.  We can see that by running the lsdg command in asmcmd. I've abbreviated the output for clarity.

[enkdb03:oracle:+ASM1] /u01/app/11.2.0.2/grid
> asmcmd
ASMCMD> lsdg
State    Type    Rebal  Total_MB   Free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N      65261568  49688724        21877927              0             N  DATA/
MOUNTED  NORMAL  N       1192320    978292          434950              0             Y  DBFS_DG/
MOUNTED  HIGH    N      22935248  20570800         5466918              0             N  RECO/
ASMCMD>

The DBFS_DG diskgroup has 4 failgroups - ENKCEL04, ENKCEL05, ENKCEL06, and ENKCEL07:

SYS:+ASM1> select distinct g.name "Diskgroup",
  2    d.failgroup "Failgroup"
  3  from v$asm_diskgroup g,
  4    v$asm_disk d
  5  where g.group_number = d.group_number
  6  and g.NAME = 'DBFS_DG'
  7  /

Diskgroup		       Failgroup
------------------------------ ------------------------------
DBFS_DG 		       ENKCEL04
DBFS_DG 		       ENKCEL05
DBFS_DG 		       ENKCEL06
DBFS_DG 		       ENKCEL07

SYS:+ASM1>

Finally, our current voting disks reside in failgroups ENKCEL05, ENKCEL06, and ENKCEL07:

[enkdb03:oracle:+ASM1] /home/oracle
> sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   95eb38e5dfc34f3ebfd72127e7fe9c12 (o/192.168.12.9/DBFS_DG_CD_02_enkcel05) [DBFS_DG]
Located 3 voting disk(s).

Going back to the original questions, what does it take for CRS to notice that a voting disk is gone, and how quickly will it be replaced? Are voting disks handled the same way as normal disks, or differently? When an ASM disk goes offline, ASM waits the amount of time specified in the diskgroup's disk_repair_time attribute (the default is 3.6 hours) before dropping the disk and performing a rebalance. Let's offline one of the failgroups that holds a voting disk and find out.
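
Before running the test, it's worth knowing how to see and change that timer. Here is a minimal sketch, using only the standard ASM views - the 8h value is just an example, not a recommendation:

-- check the current disk_repair_time for each diskgroup
select g.name diskgroup, a.value disk_repair_time
from v$asm_attribute a, v$asm_diskgroup g
where a.group_number = g.group_number
and a.name = 'disk_repair_time';

-- change it for a specific diskgroup
alter diskgroup dbfs_dg set attribute 'disk_repair_time' = '8h';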

SYS:+ASM1> !date
Fri Jan  6 12:57:18 CST 2012

SYS:+ASM1> alter diskgroup dbfs_dg offline disks in failgroup ENKCEL05;

Diskgroup altered.

SYS:+ASM1> !date
Fri Jan  6 12:57:46 CST 2012

SYS:+ASM1> !sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
[sudo] password for oracle:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   ab06b75d54764f79bfd4ba032b317460 (o/192.168.12.8/DBFS_DG_CD_02_enkcel04) [DBFS_DG]
Located 3 voting disk(s).

SYS:+ASM1> !date
Fri Jan  6 12:58:03 CST 2012

That was quick! From this exercise, we can see that ASM doesn't waste any time recreating the voting disk. With files this critical to the stability of the cluster, it's not hard to see why. If we dig further, we can see that ocssd noticed that the voting disk on o/192.168.12.9/DBFS_DG_CD_02_enkcel05 was offline, and it proceeded to create a new file on disk o/192.168.12.8/DBFS_DG_CD_02_enkcel04:

2012-01-06 12:57:37.075
[cssd(10880)]CRS-1605:CSSD voting file is online: o/192.168.12.8/DBFS_DG_CD_02_enkcel04; details in /u01/app/11.2.0.2/grid/log/enkdb03/cssd/ocssd.log.
2012-01-06 12:57:37.075
[cssd(10880)]CRS-1604:CSSD voting file is offline: o/192.168.12.9/DBFS_DG_CD_02_enkcel05; details at (:CSSNM00069:) in /u01/app/11.2.0.2/grid/log/enkdb03/cssd/ocssd.log.

Finally, what happens when the failgroup comes back online? There are no changes made to the voting disks. The voting disk from the failgroup that was previously offline is still there, but will be overwritten if that space is needed:

SYS:+ASM1> !date
Fri Jan  6 13:05:09 CST 2012

SYS:+ASM1> alter diskgroup dbfs_dg online disks in failgroup ENKCEL05;

Diskgroup altered.

SYS:+ASM1> !date
Fri Jan  6 13:06:31 CST 2012

SYS:+ASM1> !sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
[sudo] password for oracle:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   ab06b75d54764f79bfd4ba032b317460 (o/192.168.12.8/DBFS_DG_CD_02_enkcel04) [DBFS_DG]
Located 3 voting disk(s).

In previous versions, this automation did not exist. If a voting disk went offline, the DBA had to create a new voting disk manually. Keep in mind that this feature only takes effect if there is another failgroup available to place the voting disk in. If you only have 3 failgroups for your normal redundancy (or 5 for your high redundancy) OCR/voting diskgroup, this automatic recreation will not occur.
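
A quick sanity check along those lines - my own query, not part of the original test - is to compare the number of failgroups in the diskgroup holding the voting files against the number of voting disks (3 for normal redundancy, 5 for high). If the counts are equal, there is no spare failgroup left for an automatic recreation:

select g.name diskgroup, g.type redundancy, count(distinct d.failgroup) failgroups
from v$asm_diskgroup g, v$asm_disk d
where g.group_number = d.group_number
and g.voting_files = 'Y'
group by g.name, g.type;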

8 thoughts on “Voting Disk Redundancy in ASM”

  1. Fairlie Rego

    Gr8 post.. As an aside, after changing the disk_repair_time I can’t seem to locate the new value in any of the ASM views. Hopefully I am missing something simple.

  2. Fairlie Rego

    Sorry, got it from the md_backup command:

    @diskgroup_set = (
    {
    'ATTRINFO' => {
    '_._DIRVERSION' => '11.2.0.2.0',
    'DISK_REPAIR_TIME' => '0.5H',

  3. Andy Colvin (post author)

    disk_repair_time should be visible in v$asm_attribute. I have a script for that in my scripts section (asm_attributes.sql).

  4. Santhosh

    Hi,

    Great post. I have one clarification though... I am in the process of designing a RAC with two separate storage devices.

    We are trying to eliminate a single storage device failure leading to RAC or database failure.

    If we have five failure groups in the diskgroup (normal redundancy) holding the voting disks, and two of the disks holding voting disks go offline due to a storage failure, will ASM recreate the voting disks, or will this scenario lead to cluster / node reboots?

    Thank you.

  5. Andy Colvin (post author)

    Santhosh,

    I haven’t attempted this, but I believe that if you lost 2 failgroups simultaneously, your diskgroup would go offline. The reason that I expect that is because you have the option to store other files in the diskgroup, so ASM would want to protect the files that could possibly be stored there. I’ll test this and let you know what I find.

  6. jee

    Very good explanation. In the context of Exadata, if there are 2 DB nodes and more than 3 cell nodes... and what will the scenario be with 2 DB nodes and only 3 cell nodes, with high redundancy? I mean, will the voting disks be allocated and survive a disk failure?

  7. jhavimalkishore

    You simulated the unavailability of the disk by offlining it at the ASM level, but that is considered a graceful operation, and in that case the voting file will move to another disk.
    The alert log shows:
    Sun Jun 22 00:14:42 2014
    SQL> alter diskgroup dbfs_dg offline disks in failgroup ENKCEL05;

    But if there is an actual I/O issue, then the offline of the disk at the ASM level will not be graceful; instead it will be a “force” offline done by the ASM server, something like below:

    SQL> alter diskgroup dbfs_dg offline disks in failgroup ENKCEL05 FORCE /* ASM SERVER */

    So in that case the voting file will not move to another available disk / failgroup.

    Please check the link below: http://jhavimalkishore.wordpress.com/normal-redundancy-diskgroup-shift-the-voting-file-to-another-available-disk-if-existing-voting-file-disk-dropped-or-becoming-unavailable/

