As you may have guessed, applying patches on the Oracle Database Appliance can be a little different from your standard Oracle environment. Oracle releases a single software version that covers all aspects of the ODA - firmware, operating system, and the Oracle software stack (Grid Infrastructure, RDBMS). Versions are numbered like this (image courtesy of MOS note #1397680.1):
The ODA was initially released with version 2.1.0.1.0, and has seen several releases over the last year:
|Version|Notes|
|2.1.0.3.0|CPU bugfix, 11.2.0.2.5 GI PSU5|
|2.1.0.3.1|OAK software updates|
|2.2.0.0.0|11.2.0.3 GI/RDBMS, OEL 5.8, UEK kernel|
|2.3.0.0.0|July 2012 PSU for 11.2.0.2/11.2.0.3, firmware upgrades, multiple database home support|
In this post, we'll discuss upgrading an ODA running RAC or RAC One Node to version 2.3.0.0.0. Note that before going to 2.3, users must first upgrade to 2.2, because the 2.3 patch bundle does not include some of the files needed for the OEL 5.8 upgrade, among other things.
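As a rough sketch, the 2.2 step looks something like this from the command line. The patch zip name below is a placeholder, and the oakcli flags shown are what I'd expect from the OAK documentation of that era - always verify against the README shipped with your bundle:

# Stage and unpack the 2.2.0.0.0 bundle (zip name is a placeholder)
oakcli unpack -package /tmp/p<patch_number>_22000_Linux-x86-64.zip

# Patch the infrastructure (firmware, OS, ILOM) first, then GI, then the databases
oakcli update -patch 2.2.0.0.0 --infra
oakcli update -patch 2.2.0.0.0 --gi
oakcli update -patch 2.2.0.0.0 --database

# Confirm the running component versions before repeating the pattern for 2.3
oakcli show version

The 2.3 bundle follows the same unpack/update sequence once the appliance is stable on 2.2.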
A recent discussion thread on the OTN Exadata forum made me want to test a feature of 11.2 RAC - voting disk redundancy. One section of the Clusterware Administration and Deployment Guide (http://goo.gl/eMrQM) in particular made me want to see how well it worked: "If voting disks are stored on Oracle ASM with normal or high redundancy, and the storage hardware in one failure group suffers a failure, then if there is another disk available in a disk group in an unaffected failure group, Oracle ASM recovers the voting disk in the unaffected failure group."
How resilient is it? How quickly will the cluster recover the voting disk? Do we have to wait for the ASM rebalance timer to tick down to zero before the voting disk gets recreated?
First, what is a voting disk? The voting disks are used to determine which nodes are members of the cluster. In a RAC configuration, there are either 1, 3, or 5 voting disks for the cluster.
Next, when voting disks are placed in an ASM diskgroup, which is the default in 11.2, the number of voting disks created depends on the redundancy level of the diskgroup. External redundancy creates 1 voting disk, normal redundancy creates 3, and high redundancy creates 5, so diskgroups using ASM redundancy need enough failgroups to hold them. Oracle recommends at least 300MB per voting disk file, and 300MB for each copy of the OCR. On standard (non-Exadata) RAC builds, I prefer to place the OCR and voting disks into their own diskgroup, named GRID or SYSTEM.
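If you want to follow that convention, relocating the files is straightforward once the dedicated diskgroup exists. A minimal sketch - the +GRID name is just my naming preference, and +OLD_DG stands in for whichever diskgroup currently holds the OCR:

# Move the voting files into the dedicated diskgroup (as the GI owner or root)
crsctl replace votedisk +GRID

# Verify where the voting files now live
crsctl query css votedisk

# The OCR is relocated separately (as root)
ocrconfig -add +GRID
ocrconfig -delete +OLD_DG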
Anyway, back to the fun stuff. I'm testing this on Exadata, but the results should carry over to any other system running 11.2 RAC. For starters, we have a diskgroup named DBFS_DG that is used to hold the OCR and voting disks. We can see that by running the lsdg command in asmcmd. I've abbreviated the output for clarity.
[enkdb03:oracle:+ASM1] /u01/app/11.2.0.3/grid
> asmcmd
ASMCMD> lsdg
State    Type    Rebal  Total_MB  Free_MB   Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N      65261568  49688724  21877927        0              N             DATA/
MOUNTED  NORMAL  N      1192320   978292    434950          0              Y             DBFS_DG/
MOUNTED  HIGH    N      22935248  20570800  5466918         0              N             RECO/
ASMCMD>
The DBFS_DG diskgroup has 4 failgroups - ENKCEL04, ENKCEL05, ENKCEL06, and ENKCEL07:
SYS:+ASM1> select distinct g.name "Diskgroup",
  2                d.failgroup "Failgroup"
  3  from   v$asm_diskgroup g,
  4         v$asm_disk d
  5  where  g.group_number = d.group_number
  6  and    g.NAME = 'DBFS_DG'
  7  /

Diskgroup                      Failgroup
------------------------------ ------------------------------
DBFS_DG                        ENKCEL04
DBFS_DG                        ENKCEL05
DBFS_DG                        ENKCEL06
DBFS_DG                        ENKCEL07

SYS:+ASM1>
Finally, our current voting disks reside in failgroups ENKCEL05, ENKCEL06, and ENKCEL07:
[enkdb03:oracle:+ASM1] /home/oracle
> sudo /u01/app/11.2.0.3/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   95eb38e5dfc34f3ebfd72127e7fe9c12 (o/192.168.12.9/DBFS_DG_CD_02_enkcel05) [DBFS_DG]
Located 3 voting disk(s).
Going back to the original questions, what does it take for CRS to notice that a voting disk is gone, and how quickly will it be replaced? Are they handled in the same way as normal disks, or some other way? When an ASM disk goes offline, ASM waits the amount of time listed in the disk_repair_time attribute for the diskgroup (default is 3.6 hours) before dropping the disk and performing a rebalance. Let's offline one of the failgroups that has a voting disk and find out.
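As a quick aside, disk_repair_time is an ordinary diskgroup attribute that you can inspect and change from SQL*Plus (it requires compatible.asm of 11.1 or higher). A minimal sketch, using DBFS_DG and an arbitrary 8-hour window as examples:

-- Current setting for the diskgroup holding the voting files
select g.name diskgroup, a.value repair_time
from   v$asm_diskgroup g, v$asm_attribute a
where  g.group_number = a.group_number
and    a.name = 'disk_repair_time'
and    g.name = 'DBFS_DG';

-- Lengthen (or shorten) the window before ASM drops the offline disks
alter diskgroup dbfs_dg set attribute 'disk_repair_time' = '8.0h';

With that noted, on to the test: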
SYS:+ASM1> !date
Fri Jan 6 12:57:18 CST 2012

SYS:+ASM1> alter diskgroup dbfs_dg offline disks in failgroup ENKCEL05;

Diskgroup altered.

SYS:+ASM1> !date
Fri Jan 6 12:57:46 CST 2012

SYS:+ASM1> !sudo /u01/app/11.2.0.3/grid/bin/crsctl query css votedisk
[sudo] password for oracle:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   ab06b75d54764f79bfd4ba032b317460 (o/192.168.12.8/DBFS_DG_CD_02_enkcel04) [DBFS_DG]
Located 3 voting disk(s).

SYS:+ASM1> !date
Fri Jan 6 12:58:03 CST 2012
That was quick! From this exercise, we can see that the clusterware doesn't waste any time recreating the voting disk. With files this critical to the stability of the cluster, it's not hard to see why. If we dig further, we can see that ocssd noticed that the voting disk on o/192.168.12.9/DBFS_DG_CD_02_enkcel05 was offline, and it proceeded to create a new file on disk o/192.168.12.8/DBFS_DG_CD_02_enkcel04:
2012-01-06 12:57:37.075
[cssd(10880)]CRS-1605:CSSD voting file is online: o/192.168.12.8/DBFS_DG_CD_02_enkcel04; details in /u01/app/11.2.0.3/grid/log/enkdb03/cssd/ocssd.log.
2012-01-06 12:57:37.075
[cssd(10880)]CRS-1604:CSSD voting file is offline: o/192.168.12.9/DBFS_DG_CD_02_enkcel05; details at (:CSSNM00069:) in /u01/app/11.2.0.3/grid/log/enkdb03/cssd/ocssd.log.
Finally, what happens when the failgroup comes back online? No changes are made to the voting disks: the copy in the failgroup that was previously offline is still on disk, but it will be overwritten if that space is needed:
SYS:+ASM1> !date
Fri Jan 6 13:05:09 CST 2012

SYS:+ASM1> alter diskgroup dbfs_dg online disks in failgroup ENKCEL05;

Diskgroup altered.

SYS:+ASM1> !date
Fri Jan 6 13:06:31 CST 2012

SYS:+ASM1> !sudo /u01/app/11.2.0.3/grid/bin/crsctl query css votedisk
[sudo] password for oracle:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   ab06b75d54764f79bfd4ba032b317460 (o/192.168.12.8/DBFS_DG_CD_02_enkcel04) [DBFS_DG]
Located 3 voting disk(s).
In previous versions, this automation did not exist. If a voting disk went offline, the DBA had to manually create a new one. Keep in mind that this feature only kicks in if there is another failgroup available to place the voting disk in. If your OCR/voting diskgroup has only 3 failgroups with normal redundancy (or only 5 with high redundancy), this automatic recreation will not occur.
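A quick way to check whether your own cluster has a spare failgroup to relocate into is to count them for the diskgroup that holds the voting files. A sketch using DBFS_DG - substitute your own diskgroup name:

-- Count failgroups in the voting-file diskgroup; relocation needs more
-- failgroups than voting disks (3 for normal redundancy, 5 for high)
select g.name diskgroup, g.type redundancy,
       count(distinct d.failgroup) failgroups
from   v$asm_diskgroup g, v$asm_disk d
where  g.group_number = d.group_number
and    g.name = 'DBFS_DG'
group by g.name, g.type;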
In part 1 of this series, we took a look inside the ODA to see what the OS was doing. Here, we'll dig a little further into the disk and storage architecture, covering both the hardware and ASM.
There have been a lot of questions about the storage layout of the shared disks. We'll start with the controllers and work our way down to the disks. First, there are 2 dual-ported LSI SAS controllers in each of the system controllers (SCs). Each is connected to a SAS expander located on the system board, and each of these SAS expanders connects to 12 of the hard disks on the front of the ODA. The disks are dual-ported SAS, so each disk is connected to an expander on each of the SCs. Below is a diagram of the SAS connectivity on the ODA (Note: all diagrams are collected from public ODA documentation, as well as various ODA-related support notes available on My Oracle Support).
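You can also see this layout from the OS itself. A couple of hedged examples - the oakcli subcommands are from memory of the 2.x OAK stack and their output columns vary by version, so check oakcli -h on your own box:

# Appliance-level view of the shared disks, controllers, and expanders (as root)
oakcli show disk
oakcli show storage

# Raw SCSI view from the OS: each shared SAS disk as seen through this node's
# controllers and expander
lsscsi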
From the diagram, you can see the relationship between the SAS controllers, SAS expanders, and SAS drives on the front of the chassis. If you look at the columns of disks, the first 2 columns are serviced by one expander, while the third and fourth columns are serviced by the other expander. What the diagram refers to as "Controller-0" and "Controller-1" are actually the independent SCs in the X4370M2. What this shows is that you can lose any of the following components in the diagram and your database will continue to run (assuming RAC is in use):
Having done a handful of Exadata implementations, there's always been one piece of the configuration that's bothered me more than anything else. As part of the ordering process, Oracle sends the customer a "Configuration Worksheet" that asks how the system should be configured. It's standard stuff: hostnames, DNS and NTP servers, UIDs and GIDs for the oracle/dba/oinstall accounts (that's another sore spot), and IP addresses for the various interfaces. The worksheet comes as a nifty PDF that the customer can modify to suit the needs of the Exadata system.
Unfortunately, the PDF does not allow the customer to modify the IP range used for the InfiniBand network. The only option on the form is to use the network 192.168.8.0/22, with the hosts using 192.168.10.1 - 192.168.10.22 (for a full rack). Why the /22, you might ask? Oracle recommends a 255.255.252.0 subnet mask so that multiple Exadata systems can be connected, along with an Exalogic, and whatever other products come down the line that will connect to Exadata over the IB network. It would be nice if Oracle allowed customers to define this network range themselves, instead of sticking everybody in 192.168.8.0/22. Some say it won't be a problem because the interconnect is non-routable, but I disagree. Find out why after the jump.
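For reference, a /22 is bigger than it might look at first glance. A quick sanity check of what that mask covers (ipcalc ships with OEL, though its options differ from the Debian tool of the same name):

# 255.255.252.0 masks the third octet with 252 (binary 11111100), so
# 192.168.8.0/22 spans 192.168.8.0 - 192.168.11.255: 1024 addresses,
# 1022 usable hosts.
ipcalc -np 192.168.8.0 255.255.252.0   # RHEL-style ipcalc reports NETWORK and PREFIX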