Oracle has released a critical patch for storage server versions 11.2.2.3.x through 11.2.2.4.1. Although 11.2.2.4.1 was released only last week, a few one-off patches for 11.2.2.4.0 didn't seem to make it into that release. Oracle has since released 11.2.2.4.2 (patch #13513611, supplemental note #1388400.1). Like 11.2.2.4.1, this release fixes several outstanding issues. Here's the list of bugs fixed, taken from the 11.2.2.4.2 readme:
12764521 INFINIBAND DIAG COMMANDS (LIKE IBDIAGNET AND IBNETDISCOVER) ARE NOT WORKING
13083530 10 GB-E BONDED INTERFACES FAILING- EXADATA
13410353 AFTER UPGRADE TO 11.2.2.4 INFINIBAND CMDS IBDIAGNET, IBNETDISCOVER NOT WORKING
13489032 CHECKHWNFWPROFILE DOES NOT DETECT FAILED FLASH FDOM
13489445 ORA-600 [OSSMISC:OSSMISC_TIMER] WHEN NTPD DETECTED 6 MILLISECOND TIME DIFFERENCE
13512932 FIX INSTALLED WORKAROUND FOR NTP UPDATE BUG 13489445
As you can see, the previously mentioned bugs have been fixed. Another bug fixed in 11.2.2.4.1 could be an issue for anybody running 11.2.2.3.x through 11.2.2.4.0: bug 13454147 can remove the flash cache from a cell that has been up for 6 months or longer. Fortunately, for cases where you can't upgrade to 11.2.2.4.2 right away, Oracle has released a critical patch that includes fixes for these issues. I wouldn't advise running 11.2.2.4.2 until it has been out for at least a couple of weeks; I always advise clients to wait that long so the early adopters can weed out any major issues.
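Because bug 13454147 is tied to uptime, it can help to check how long each cell has been up before deciding how urgently to patch. The sketch below is a hypothetical helper, not part of the patch: it reads /proc/uptime and flags a host at or beyond an assumed threshold of 182 days, which is only an approximation of "six months" chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch): flag a host whose uptime has
# reached roughly six months, the window described for bug 13454147.
# The 182-day default is an approximation chosen for this sketch.

# Whole days since boot, from the first field of /proc/uptime (seconds).
uptime_days() {
  awk '{ printf "%d\n", $1 / 86400 }' /proc/uptime
}

# Print "AT RISK" or "ok" for a day count compared against a threshold in days.
flash_risk() {
  local days=$1 threshold=${2:-182}
  if [ "$days" -ge "$threshold" ]; then
    echo "AT RISK (${days} days up)"
  else
    echo "ok (${days} days up)"
  fi
}
```

On a single cell, flash_risk "$(uptime_days)" prints the result. To see every storage server at once, plain dcli -l root -g cell_group uptime works too, assuming a cell_group file listing only the storage servers sits next to all_group in root's home directory.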
Applying the critical patch takes only a minute and doesn't take the storage servers or database instances offline. Afterward, a cellsrv restart needs to be scheduled, but it can be done in a rolling fashion. Read on for an example of applying this patch. As always, do not apply any patch to a production system before testing it appropriately on a non-production system!
According to the documentation for patch 13517481, the following bugs are fixed:
Bug       Description
--------  -------------------------------------------------------------------
13454147  Flash cards go offline after 6 months of uptime
12886507  IDT switch in PCI riser resets causing missing flash cards
12626126  Temporary IO stall caused by drive medium errors causes cell reboot
13489445  CELLSRV crash if NTPD interrupted and time drifts back too far
13083530  10GbE network interfaces shutdown
The installation is quick and easy. Copy the patch zip file to a staging directory on the first database server (I used /u01/stage/patches/11.2.2.4.1_supplemental), unzip it, and cd into the 13517481 directory it creates.
[enkdb01:root] /root
> cd /u01/stage/patches/11.2.2.4.1_supplemental/
[enkdb01:root] /u01/stage/patches/11.2.2.4.1_supplemental
> ls
p13517481_112100_Linux-x86-64.zip
[enkdb01:root] /u01/stage/patches/11.2.2.4.1_supplemental
> unzip p13517481_112100_Linux-x86-64.zip
Archive: p13517481_112100_Linux-x86-64.zip
creating: 13517481/
inflating: 13517481/fixpciidt_12886507
inflating: 13517481/10gig_rxusecs0
inflating: 13517481/README.txt
inflating: 13517481/install.sh
inflating: 13517481/fix_flash_links.sh
[enkdb01:root] /u01/stage/patches/11.2.2.4.1_supplemental
> cd 13517481/
[enkdb01:root] /u01/stage/patches/11.2.2.4.1_supplemental/13517481
> ls
10gig_rxusecs0 fix_flash_links.sh fixpciidt_12886507 install.sh README.txt
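Before going further, it's worth confirming that the unzip produced every file listed above. This is a small hypothetical check of my own, not something shipped with the patch; the file list is simply copied from the unzip output.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm the unpacked patch directory contains every file
# that the unzip output above listed. Prints each missing file and returns 1
# if anything is absent, 0 otherwise.
verify_patch_dir() {
  local dir=$1
  local missing=0
  local f
  for f in README.txt install.sh fix_flash_links.sh fixpciidt_12886507 10gig_rxusecs0; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, verify_patch_dir /u01/stage/patches/11.2.2.4.1_supplemental/13517481 prints nothing and returns 0 when the unzip is complete.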
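Next, copy the all_group file (the dcli group file listing every database and storage server) from root's home directory into the patch directory, and verify that SSH equivalence works by running hostname on every server through dcli. If every server responds, run the patch check: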
[enkdb01:root] /u01/stage/patches/11.2.2.4.1_supplemental/13517481
> cp ~/all_group .
[enkdb01:root] /u01/stage/patches/11.2.2.4.1_supplemental/13517481
> dcli -l root -g all_group hostname
enkdb01: enkdb01.enkitec.com
enkdb02: enkdb02.enkitec.com
enkcel01: enkcel01.enkitec.com
enkcel02: enkcel02.enkitec.com
enkcel03: enkcel03.enkitec.com
[enkdb01:root] /u01/stage/patches/11.2.2.4.1_supplemental/13517481
> ./install.sh -g all_group check
Additional details in patch13517481.log
Perform check using dcli on all systems in all_group: enkdb01 enkdb02 enkcel01 enkcel02 enkcel03
Completed check on all systems.
Screen output captured from all systems and placed in patch13517481.log
Log files from all systems collected and placed in the current directory /u01/stage/patches/11.2.2.4.1_supplemental/13517481
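Eyeballing the dcli hostname output is easy with five servers, but on a larger rack a missing host is easy to overlook. This hypothetical function compares the group file against captured dcli output and prints any server that did not answer with its own name. It assumes the group file holds short host names, one per line, and that dcli prefixes each output line with "<host>: ", as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list hosts from a dcli group file that did NOT answer
# the "dcli -l root -g all_group hostname" test with their own name.
# Assumes short host names in the group file and "<host>: " output prefixes.
missing_hosts() {
  local group_file=$1 output=$2
  local host
  while read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines.
    case $host in ''|'#'*) continue ;; esac
    # Expect a line like "enkcel01: enkcel01.enkitec.com" (or the bare short name).
    if ! printf '%s\n' "$output" | grep -Eq "^${host}: ${host}(\\.|\$)"; then
      echo "$host"
    fi
  done < "$group_file"
}
```

For example, capture the output with out=$(dcli -l root -g all_group hostname) and then run missing_hosts all_group "$out"; no output means every server answered.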
Check the logs that were created (patch13517481.log holds the summary, and there is a separate log for each server). If everything passes, apply the patch. Note that the patch applies only the fixes each server needs, based on its type and software version. On the V2 Exadata shown here, it does not apply the 10GbE fix, because V2 systems have no 10GbE interfaces.
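With more than a handful of servers, scanning each per-server log by hand gets tedious. The sketch below loops over the per-server logs (named <host>_patch13517481.log, the pattern visible in the summary log) and reports any that contain a failure marker. The default marker "FAIL" is an ASSUMPTION on my part, not a string confirmed from the check output, so compare it against README.txt and an actual check log and adjust it as needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report per-server patch logs containing a failure marker.
# Log names follow the <host>_patch13517481.log pattern from the summary log.
# ASSUMPTION: failures contain the string "FAIL"; pass another pattern if not.
scan_patch_logs() {
  local dir=${1:-.}
  local pattern=${2:-FAIL}
  local log status=0
  for log in "$dir"/*_patch13517481.log; do
    # An unmatched glob stays literal; skip it.
    [ -e "$log" ] || continue
    if grep -q -- "$pattern" "$log"; then
      echo "$(basename "$log" _patch13517481.log): review $log"
      status=1
    fi
  done
  return "$status"
}
```

Run from the patch directory, scan_patch_logs prints nothing and returns 0 when no log matches the marker.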
[enkdb01:root] /u01/stage/patches/11.2.2.4.1_supplemental/13517481
> ./install.sh -g all_group apply
Additional details in patch13517481.log
Perform apply using dcli on all systems in all_group: enkdb01 enkdb02 enkcel01 enkcel02 enkcel03
Completed apply on all systems.
Screen output captured from all systems and placed in patch13517481.log
Log files from all systems collected and placed in the current directory /u01/stage/patches/11.2.2.4.1_supplemental/13517481
The apply output is appended to the same log, so you can confirm that the fixes were applied successfully. Here are the relevant contents of our patch13517481.log file:
2011-12-29 07:13:22 CST main: ====================================================================
2011-12-29 07:20:49 CST main: Running ./install.sh with options ACTION=apply, BUGFIX=ALL, GROUPFILE=all_group
2011-12-29 07:20:49 CST main: Perform apply using dcli on all systems in all_group: enkdb01 enkdb02 enkcel01 enkcel02 enkcel03
2011-12-29 07:20:49 CST main: Verifying SSH setup and free space for all systems
2011-12-29 07:20:49 CST main: SSH validation for enkdb01 passed
2011-12-29 07:20:49 CST main: Free space validation for enkdb01 passed
2011-12-29 07:20:49 CST main: SSH validation for enkdb02 passed
2011-12-29 07:20:49 CST main: Free space validation for enkdb02 passed
2011-12-29 07:20:50 CST main: SSH validation for enkcel01 passed
2011-12-29 07:20:50 CST main: Free space validation for enkcel01 passed
2011-12-29 07:20:50 CST main: SSH validation for enkcel02 passed
2011-12-29 07:20:50 CST main: Free space validation for enkcel02 passed
2011-12-29 07:20:50 CST main: SSH validation for enkcel03 passed
2011-12-29 07:20:50 CST main: Free space validation for enkcel03 passed
2011-12-29 07:20:50 CST main: Create working directory /tmp/patch13517481_122911072049 on all systems
2011-12-29 07:20:50 CST main: Distribute patch files to all systems
2011-12-29 07:20:51 CST main: Execute './install.sh -b ALL apply' on all systems
2011-12-29 07:20:54 CST main: Completed apply on all systems.
2011-12-29 07:20:54 CST main: Screen output captured from all systems and placed in patch13517481.log
2011-12-29 07:20:54 CST main: enkdb01:
2011-12-29 07:20:54 CST main: enkdb01: Additional details in /tmp/patch13517481_122911072049/patch13517481.log
2011-12-29 07:20:54 CST main: enkdb01:
2011-12-29 07:20:54 CST main: enkdb01: Fix for bug 13454147 (NoFlash six months) - apply
2011-12-29 07:20:54 CST main: enkdb01: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb01:
2011-12-29 07:20:54 CST main: enkdb01: Fix for bug 12886507 (IDT switch reset) - apply
2011-12-29 07:20:54 CST main: enkdb01: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb01:
2011-12-29 07:20:54 CST main: enkdb01: Fix for bug 13489445 (NTPD CELLSRV crash) - apply
2011-12-29 07:20:54 CST main: enkdb01: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb01:
2011-12-29 07:20:54 CST main: enkdb01: Fix for bug 13083530 (10GbE shutdown) - apply
2011-12-29 07:20:54 CST main: enkdb01: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb01:
2011-12-29 07:20:54 CST main: enkdb01: Fix for bug 12626126 (IO stall cell reboot) - apply
2011-12-29 07:20:54 CST main: enkdb01: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb01:
2011-12-29 07:20:54 CST main: enkdb02:
2011-12-29 07:20:54 CST main: enkdb02: Additional details in /tmp/patch13517481_122911072049/patch13517481.log
2011-12-29 07:20:54 CST main: enkdb02:
2011-12-29 07:20:54 CST main: enkdb02: Fix for bug 13454147 (NoFlash six months) - apply
2011-12-29 07:20:54 CST main: enkdb02: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb02:
2011-12-29 07:20:54 CST main: enkdb02: Fix for bug 12886507 (IDT switch reset) - apply
2011-12-29 07:20:54 CST main: enkdb02: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb02:
2011-12-29 07:20:54 CST main: enkdb02: Fix for bug 13489445 (NTPD CELLSRV crash) - apply
2011-12-29 07:20:54 CST main: enkdb02: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb02:
2011-12-29 07:20:54 CST main: enkdb02: Fix for bug 13083530 (10GbE shutdown) - apply
2011-12-29 07:20:54 CST main: enkdb02: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb02:
2011-12-29 07:20:54 CST main: enkdb02: Fix for bug 12626126 (IO stall cell reboot) - apply
2011-12-29 07:20:54 CST main: enkdb02: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkdb02:
2011-12-29 07:20:54 CST main: enkcel01:
2011-12-29 07:20:54 CST main: enkcel01: Additional details in /tmp/patch13517481_122911072049/patch13517481.log
2011-12-29 07:20:54 CST main: enkcel01:
2011-12-29 07:20:54 CST main: enkcel01: Fix for bug 13454147 (NoFlash six months) - apply
2011-12-29 07:20:54 CST main: enkcel01: Fix for bug 13454147 - apply SUCCESS
2011-12-29 07:20:54 CST main: enkcel01:
2011-12-29 07:20:54 CST main: enkcel01: Fix for bug 12886507 (IDT switch reset) - apply
2011-12-29 07:20:54 CST main: enkcel01: Fix not needed for Exadata version 11.2.2.4.0.110929 - no action taken
2011-12-29 07:20:54 CST main: enkcel01:
2011-12-29 07:20:54 CST main: enkcel01: Fix for bug 13489445 (NTPD CELLSRV crash) - apply
2011-12-29 07:20:54 CST main: enkcel01: ACTION REQUIRED - Fix for bug 13489445 applied and will become active at next CELLSRV restart
2011-12-29 07:20:54 CST main: enkcel01: Fix for bug 13489445 - apply SUCCESS
2011-12-29 07:20:54 CST main: enkcel01:
2011-12-29 07:20:54 CST main: enkcel01: Fix for bug 13083530 (10GbE shutdown) - apply
2011-12-29 07:20:54 CST main: enkcel01: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkcel01:
2011-12-29 07:20:54 CST main: enkcel01: Fix for bug 12626126 (IO stall cell reboot) - apply
2011-12-29 07:20:54 CST main: enkcel01: Fix not needed for Exadata version 11.2.2.4.0.110929
2011-12-29 07:20:54 CST main: enkcel01:
2011-12-29 07:20:54 CST main: enkcel02:
2011-12-29 07:20:54 CST main: enkcel02: Additional details in /tmp/patch13517481_122911072049/patch13517481.log
2011-12-29 07:20:54 CST main: enkcel02:
2011-12-29 07:20:54 CST main: enkcel02: Fix for bug 13454147 (NoFlash six months) - apply
2011-12-29 07:20:54 CST main: enkcel02: Fix for bug 13454147 - apply SUCCESS
2011-12-29 07:20:54 CST main: enkcel02:
2011-12-29 07:20:54 CST main: enkcel02: Fix for bug 12886507 (IDT switch reset) - apply
2011-12-29 07:20:54 CST main: enkcel02: Fix not needed for Exadata version 11.2.2.4.0.110929 - no action taken
2011-12-29 07:20:54 CST main: enkcel02:
2011-12-29 07:20:54 CST main: enkcel02: Fix for bug 13489445 (NTPD CELLSRV crash) - apply
2011-12-29 07:20:54 CST main: enkcel02: ACTION REQUIRED - Fix for bug 13489445 applied and will become active at next CELLSRV restart
2011-12-29 07:20:54 CST main: enkcel02: Fix for bug 13489445 - apply SUCCESS
2011-12-29 07:20:54 CST main: enkcel02:
2011-12-29 07:20:54 CST main: enkcel02: Fix for bug 13083530 (10GbE shutdown) - apply
2011-12-29 07:20:54 CST main: enkcel02: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkcel02:
2011-12-29 07:20:54 CST main: enkcel02: Fix for bug 12626126 (IO stall cell reboot) - apply
2011-12-29 07:20:54 CST main: enkcel02: Fix not needed for Exadata version 11.2.2.4.0.110929
2011-12-29 07:20:54 CST main: enkcel02:
2011-12-29 07:20:54 CST main: enkcel03:
2011-12-29 07:20:54 CST main: enkcel03: Additional details in /tmp/patch13517481_122911072049/patch13517481.log
2011-12-29 07:20:54 CST main: enkcel03:
2011-12-29 07:20:54 CST main: enkcel03: Fix for bug 13454147 (NoFlash six months) - apply
2011-12-29 07:20:54 CST main: enkcel03: Fix for bug 13454147 - apply SUCCESS
2011-12-29 07:20:54 CST main: enkcel03:
2011-12-29 07:20:54 CST main: enkcel03: Fix for bug 12886507 (IDT switch reset) - apply
2011-12-29 07:20:54 CST main: enkcel03: Fix not needed for Exadata version 11.2.2.4.0.110929 - no action taken
2011-12-29 07:20:54 CST main: enkcel03:
2011-12-29 07:20:54 CST main: enkcel03: Fix for bug 13489445 (NTPD CELLSRV crash) - apply
2011-12-29 07:20:54 CST main: enkcel03: ACTION REQUIRED - Fix for bug 13489445 applied and will become active at next CELLSRV restart
2011-12-29 07:20:54 CST main: enkcel03: Fix for bug 13489445 - apply SUCCESS
2011-12-29 07:20:54 CST main: enkcel03:
2011-12-29 07:20:54 CST main: enkcel03: Fix for bug 13083530 (10GbE shutdown) - apply
2011-12-29 07:20:54 CST main: enkcel03: Fix not applicable to this system
2011-12-29 07:20:54 CST main: enkcel03:
2011-12-29 07:20:54 CST main: enkcel03: Fix for bug 12626126 (IO stall cell reboot) - apply
2011-12-29 07:20:54 CST main: enkcel03: Fix not needed for Exadata version 11.2.2.4.0.110929
2011-12-29 07:20:54 CST main: enkcel03:
2011-12-29 07:20:54 CST main: Log files from all systems collected and placed in the current directory /u01/stage/patches/11.2.2.4.1_supplemental/13517481
2011-12-29 07:20:54 CST main: Log file enkdb01_patch13517481.log from enkdb01
2011-12-29 07:20:54 CST main: Log file enkdb02_patch13517481.log from enkdb02
2011-12-29 07:20:54 CST main: Log file enkcel01_patch13517481.log from enkcel01
2011-12-29 07:20:54 CST main: Log file enkcel02_patch13517481.log from enkcel02
2011-12-29 07:20:54 CST main: Log file enkcel03_patch13517481.log from enkcel03
2011-12-29 07:20:54 CST main: Remove working directory /tmp/patch13517481_122911072049 on all systems
2011-12-29 07:20:54 CST main: Exiting
2011-12-29 07:20:54 CST main: ====================================================================
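With five servers this log is easy to read, but on a full rack it helps to pull out just the hosts that still need a cellsrv restart and the fixes that reported success. The sketch below relies only on the line format visible in the log above (date, time, time zone, "main:", host, message); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that parse the summary log written by install.sh.
# Line format seen above: <date> <time> <tz> main: <host>: <message>

# Unique hosts with an "ACTION REQUIRED" line (pending cellsrv restart).
hosts_needing_restart() {
  local log=${1:-patch13517481.log}
  grep 'ACTION REQUIRED' "$log" | awk '{ sub(":", "", $5); print $5 }' | sort -u
}

# "<host> <bug>" for every fix that reported "apply SUCCESS", in log order.
applied_fixes() {
  local log=${1:-patch13517481.log}
  grep -E 'Fix for bug [0-9]+ - apply SUCCESS' "$log" |
    awk '{ sub(":", "", $5); print $5, $9 }'
}
```

Against the log above, hosts_needing_restart would list enkcel01, enkcel02, and enkcel03, matching the three ACTION REQUIRED lines.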
Note that the NTP fix (bug 13489445) will not take effect until the next cellsrv restart. This is another critical bug, so don't forget to bounce cellsrv at some point after applying the patch. Cellsrv bounces can be done in a rolling fashion, but they should still be scheduled for a window when system activity is low.
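Here is a hedged sketch of what a rolling cellsrv bounce could look like, one storage server at a time. The pattern (confirm that ASM can tolerate the cell's grid disks going offline, restart cellsrv, then wait for the grid disks to report ONLINE before moving on) is the usual approach for cycling cells without affecting ASM, but the function names, the cell_group file, and the polling interval and timeout are my own assumptions. It also assumes the CellCLI griddisk attributes asmdeactivationoutcome and asmmodestatus are available on your cell software version. Test it on a non-production system first.

```shell
#!/usr/bin/env bash
# Sketch of a rolling cellsrv restart, one storage server at a time.
# ASSUMPTIONS: run as root from a database server with SSH equivalence to the
# cells; "cell_group" is a hypothetical file listing only the storage servers,
# one per line; the CellCLI griddisk attributes asmdeactivationoutcome and
# asmmodestatus are available on your cell software version.

# Succeeds only if stdin has at least one line and every line's 2nd column is "Yes".
all_deactivation_ok() {
  awk 'NF { n++; if ($2 != "Yes") bad = 1 } END { if (bad || n == 0) exit 1; exit 0 }'
}

# Succeeds only if stdin has at least one line and every line's 2nd column is "ONLINE".
all_online() {
  awk 'NF { n++; if ($2 != "ONLINE") bad = 1 } END { if (bad || n == 0) exit 1; exit 0 }'
}

# Run one CellCLI command on a cell; -n keeps ssh from swallowing the loop's stdin.
cell_cli() {
  ssh -n -o BatchMode=yes "root@$1" "cellcli -e '$2'"
}

restart_cell() {
  local cell=$1 tries=0
  # Only restart if ASM can tolerate this cell's grid disks going offline.
  if ! cell_cli "$cell" "list griddisk attributes name,asmdeactivationoutcome" | all_deactivation_ok; then
    echo "$cell: asmdeactivationoutcome is not Yes for every grid disk; stopping"
    return 1
  fi
  cell_cli "$cell" "alter cell restart services cellsrv" || return 1
  # Poll every 30 seconds (up to ~1 hour) until ASM reports every grid disk ONLINE.
  while :; do
    sleep 30
    if cell_cli "$cell" "list griddisk attributes name,asmmodestatus" | all_online; then
      echo "$cell: cellsrv restarted and grid disks ONLINE"
      return 0
    fi
    tries=$((tries + 1))
    if [ "$tries" -ge 120 ]; then
      echo "$cell: grid disks not ONLINE after ~1 hour; stopping"
      return 1
    fi
  done
}

# Restart cells in group-file order, stopping at the first problem.
rolling_restart() {
  local group_file=${1:-cell_group} cell
  while read -r cell || [ -n "$cell" ]; do
    [ -z "$cell" ] && continue
    restart_cell "$cell" || return 1
  done < "$group_file"
}
```

The script stops at the first cell that fails a check rather than continuing, on the reasoning that a half-finished rolling restart is easier to reason about than one that pushed past a warning.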