Oracle-Ninja.com Andy Colvin's Oracle Blog

7Mar/139

Exadata 11.2.3.2.1 NFS Issues – Ksplice Support for Exadata?

When the 11.2.3.2.1 release of the Exadata Storage Server software came out, I was a little excited.  There were numerous one-off patches for the previous release, 11.2.3.2.0, which was the first version to support the Exadata X3 and writeback flashcache, run UEK on the X#-2 systems, etc.  With that many large changes introduced in one version, some bugs in the .0 release were to be expected.  Fortunately, Oracle was quick to fix many of those issues, but it resulted in several separate patches to update the cellsrv software.

I was working with a colleague last week, and we were ready to apply this patch to a customer's Exadata system.  Everything went off without a hitch - upgrading from 11.2.2.4.2 straight to 11.2.3.2.1.  We even applied the patch to the customer's quarter rack in rolling mode, which took under 6 hours to complete.  After everything was back up and running, we took an archive log backup using RMAN.  For this customer, we back everything up to NFS because it won't fit within the FRA, and they don't want to leave backups inside the production system.  We were greeted with a strange error when we tried to kick off the backup job in RMAN:

RMAN> run {
2>   ALLOCATE CHANNEL DISK1 DEVICE TYPE DISK;
3>   BACKUP DATABASE FORMAT '/mnt/nfs/actest_%U';
4>   RELEASE CHANNEL DISK1;
5> }
 
using target database control file instead of recovery catalog
allocated channel: DISK1
channel DISK1: SID=397 instance=ACTEST1 device type=DISK
 
Starting backup at 13-02-28 21:38
channel DISK1: starting full datafile backup set
channel DISK1: specifying datafile(s) in backup set
input datafile file number=00007 name=+DATA/actest/datafile/tanel_bigfile.325.808412931
input datafile file number=00006 name=+DATA/actest/datafile/ts_data.380.779860027
input datafile file number=00001 name=+DATA/actest/datafile/system.367.779029515
input datafile file number=00002 name=+DATA/actest/datafile/sysaux.368.779029555
input datafile file number=00003 name=+DATA/actest/datafile/undotbs1.369.779029595
input datafile file number=00004 name=+DATA/actest/datafile/undotbs2.371.779029649
input datafile file number=00005 name=+DATA/actest/datafile/users.372.779029687
channel DISK1: starting piece 1 at 13-02-28 21:38
released channel: DISK1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on DISK1 channel at 02/28/2013 21:38:37
ORA-19504: failed to create file "/mnt/nfs/actest_1jo34pas_1_1"
ORA-27044: unable to write the header block of file
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 3

It didn't matter what we were trying to back up, just that it was going to NFS.  This backup job had worked fine prior to the patch (we took a backup immediately preceding the maintenance window), but we had applied both a database bundle patch (this database was 11.2.0.2) and the latest storage server patch (11.2.3.2.1), which updates the Linux OS to OEL 5.8, as well as introduces the Oracle Unbreakable Enterprise Kernel into the mix.

We checked the mount options to make sure that everything was ok, and saw that it was:

[enkdb01:oracle:ACTEST1] /u01/app/oracle/product/11.2.0.3/dbhome_2/rdbms/lib 
> mount | grep "/mnt/nfs"
192.168.12.22:/export/nfs on /mnt/nfs type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,nfsvers=3,timeo=600,actimeo=0,addr=192.168.12.22)
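
A check like this is easy to script if you have several mounts to verify. Here is a minimal Python sketch (not something we ran during the maintenance window) that parses one line of `mount` output and reports which expected options are missing. The expected set is simply copied from the mount shown above, and the function names are hypothetical.

```python
import re

# Options we expected to see. This set is copied from the mount output above;
# it is an assumption for illustration, not an official recommendation.
EXPECTED = {"rw", "bg", "hard", "nointr", "rsize=32768", "wsize=32768",
            "tcp", "nfsvers=3", "timeo=600", "actimeo=0"}

# Matches lines such as: "host:/export on /mnt/point type nfs (opt1,opt2,...)"
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def parse_mount_line(line):
    """Return (device, mount_point, fs_type, set_of_options) for one mount line."""
    match = MOUNT_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized mount line: %r" % line)
    device, mount_point, fs_type, options = match.groups()
    return device, mount_point, fs_type, set(options.split(","))


def missing_options(line, expected=EXPECTED):
    """Return the expected options that are absent from the mount line, sorted."""
    _, _, _, options = parse_mount_line(line)
    return sorted(expected - options)


if __name__ == "__main__":
    sample = ("192.168.12.22:/export/nfs on /mnt/nfs type nfs "
              "(rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,nfsvers=3,"
              "timeo=600,actimeo=0,addr=192.168.12.22)")
    # An empty list means every expected option is present.
    print(missing_options(sample))
```

Extra options that the kernel adds on its own, such as `addr=`, are ignored because only the expected set is checked.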

After poking around a bit, we opened a service request, which was answered pretty quickly by Oracle support.  It turns out that there is a known bug with the NFS driver included in the version of the UEK packaged with 11.2.3.2.1. Oracle provided 3 possible fixes, which I'll detail below. The fixes were:

1Feb/130

2013 Presentation Schedule

Well, it's been a really busy past few months, and I hate to admit it, but I've been neglecting this space more than anything. Despite having many different posts in the works, nothing is quite finished yet. I do have a little time to mention a few of my upcoming speaking events, though.

First, I'll be in Denver for the Rocky Mountain Oracle Users Group Training Days 2013, February 11-13.  Enkitec has more than a few sessions during the conference, ranging from Exadata to Big Data to APEX.  Check out the agenda for a full list.  Also, we'll have an Enkitec booth there, which is the place that I'll be hanging out when I'm not attending the numerous interesting sessions.  I have 2 presentations during the week:

I'm really looking forward to the RMAN session, as I'll finally be able to talk about some of the new stuff that's coming up in 12c.

I'm also going to be presenting a couple of Expert Seminars for Oracle University in the coming months.  I'll be presenting a "Getting Ready for Oracle Exadata" seminar over 2 half days in March (March 19-20) and May (May 28-29).  I'll be talking about topics like how to functionally manage your Exadata environment (backups, monitoring, patching, etc).  The sessions will be entirely online, and will have plenty of time for Q&A to ask any nagging questions.

Finally, I would like to mention that the call for papers is open for Enkitec's Engineered Systems conference, E4.  Once again, it will be held at the Four Seasons Las Colinas in August.  I know it's not the best time to come to the Dallas area, but they have great air conditioning, and it's nearly impossible to get this kind of access to some of the best Exadata professionals in the world.  The list of speakers from last year was incredible, and I'm sure that this year will be just as good.  Also, the attendees were great to talk to, since many of them were there to share their experiences running Exadata.  The call for papers is open, so if you have something you'd like to share, please submit your abstract!  We'll be covering more than just Exadata, with plans for several talks about Big Data, and possibly even an Exalogic or Exalytics session or two.

1Oct/120

Oracle Announces Exadata X3-2 and X3-8

Well, it's finally public, so we're able to openly talk about the new Exadata X3 systems.  Looking back on my pre-OpenWorld predictions, I was pretty close on a few things.  I was correct on the database servers, which will have Xeon E5-2690 CPUs (8 core, 2.9GHz) with 128GB RAM upgradeable to 256GB.  It looks like we won't get active/active Infiniband for a while, since the cards in there are staying the same.  On the X3-8, the compute nodes are staying the same, for reasons detailed by Kevin Closson a few weeks ago.  I also previously blogged about the X3-2 eighth rack.  I think this will become one of the more popular options for customers, based on the quarter racks that we're seeing purchased.  I'm definitely interested to get my hands on one and see how half of the components have been disabled.  It's very cool that Oracle was able to still give the redundancy of a true Exadata in a smaller footprint.

One of the bigger improvements on the X3 series comes down at the storage level.  I was a little bit off on the CPUs, which will be E5-2630L (6 core, 2.0GHz) with an upgrade from 24GB to 64GB of RAM.  The biggest differences on the storage servers will come via the F40 flash cards, which increase storage 4x (400GB per card), meaning that you'll get 1.6TB of flash per cell.  Also, the version of the Exadata storage server software shipping with the X3 systems will be 11.2.3.2.0, which contains the famous "flash for all writes" cache.  Disk drives will stay the same (600GB or 3TB).

The new storage server software (11.2.3.2.0) should be released to the public some time this week, and it will include the flash write cache for previous systems.  I'm very interested to see what the performance of this feature will look like on the older X2 and V2 systems, where the flash cards are a little bit slower at writes than the new F40 cards.  It is worth noting that the write cache feature will be something that users can enable or disable, so if the performance is not what's expected, it can be disabled.  Rest assured that once the patch is released, it'll find its way onto one of Enkitec's Exadata shortly thereafter.

Also, this new storage server software release will introduce Oracle's Unbreakable Enterprise Kernel to the 2-socket Exadata crowd.  The UEK has been available for the X2-8 systems since their release, but Oracle had yet to run it on X2-2 systems.  This will change with the release of 11.2.3.2.0.  It is worth noting that it is still possible to go back to the Red Hat compatible kernel if there is adverse performance on the UEK.

That's it for now, and as new things come up during the week, I'll try to post on here.

7Sep/124

Exadata X3-2 1/8th Rack

There have been a couple of posts we've seen lately about expectations of an Exadata X3-2 and X3-8 release at Oracle Open World 2012.  In my previous post, I mentioned the possible release of an X3-2 1/8th rack configuration.  I had guessed that this would be similar to the old V2 basic system that would include one compute node, one storage server, and one infiniband switch - all placed in your own rack.  It sounds like I was a little bit off from this original idea.

Oracle has stopped taking orders on X2-2 and X2-8 hardware, and we have had a handful of our customers let us know about emails that they have received from Oracle reps announcing an Exadata X3-2 1/8th rack for sale.  This configuration will work as "capacity on demand" (insert salesy buzz words).  The plan for the Exadata X3-2 1/8th rack is to contain all of the hardware that exists within a 1/4 rack configuration (2 compute nodes, 3 storage servers, 2 infiniband switches), but to disable half of the CPU cores, half of the flash cards, and half of the hard disks via software controls.

Here's what I would expect this to look like:

  • Compute Nodes
    • 8 CPU cores (16 threads)
    • 128GB RAM
  • Storage Servers
    • 6 or 8 CPU cores (12 or 16 threads)
    • 2 PCIe flash cards
    • 6 X 600GB SAS or 3TB SAS hard disks
  • 2 Infiniband Switches

This would leave you with either roughly 10.8TB or 54TB of raw disk space (3 storage servers with 6 active disks each), depending on whether high performance or high capacity drives were chosen.  The CPU cores and other hardware components would be disabled using software, probably similar to how unlicensed CPU cores in an ODA are disabled.  This would mean that the 1/8th rack configuration would still contain RAC (including RAC licenses), multiple storage servers (only half of the Exadata storage server licenses), and lots of flash cache.  The process of upgrading a 1/8 rack to a 1/4 rack system would simply be a matter of enabling the extra hardware, most likely through a license key.  Based on the increase in CPU/memory/flash that I'm expecting to see from the X2 --> X3 release, I would expect to see quite a few customers looking at Exadata as an option for many hardware refresh upgrades.  It will be really nice to actually test the improvements from the flash write cache that should be announced at Open World as well.
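
For anybody who wants to check the raw capacity arithmetic, here is a small Python sketch. The layout constants (3 storage servers, 6 active disks per server) come from my guesses above, not from any Oracle data sheet, and the function name is just for illustration.

```python
# Assumed 1/8th rack layout, taken from the guesses in this post:
# 3 storage servers, each with 6 active disks.
STORAGE_SERVERS = 3
ACTIVE_DISKS_PER_SERVER = 6


def raw_capacity_tb(disk_size_gb):
    """Raw (pre-redundancy) capacity in TB, using decimal units (1 TB = 1000 GB)."""
    total_gb = STORAGE_SERVERS * ACTIVE_DISKS_PER_SERVER * disk_size_gb
    return total_gb / 1000.0


if __name__ == "__main__":
    print(raw_capacity_tb(600))   # high performance: 600GB disks
    print(raw_capacity_tb(3000))  # high capacity: 3TB disks
```

Keep in mind that this is raw space only; usable space after ASM normal or high redundancy will be considerably lower.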

31Aug/123

Pre OpenWorld Predictions (Exadata X3-2?)

With Larry Ellison's keynote at Oracle OpenWorld 2012 only a month away, I thought that I would make a couple of wild guesses about new products that may or may not get announced this year.  I'll lump them into a few educated guesses and wild conjecture.  Insert standard blogging disclaimer (please read this part, Oracle lawyers):

Everything contained in this blog post is pulled from publicly available information and conclusions drawn from products that are currently available outside of Exadata.  None of this information comes from within Oracle - not that Oracle would be willing to give me any information otherwise.

25Aug/120

Exadata Flash Write-back – Sooner Than We Think?

If you missed Andy Mendelsohn's keynote at E4 last week, you may not have heard the hubbub that surrounded one of his last slides (tweeted by Frits Hoogland here).  The mention of the write-back enticed Kevin Closson to talk about the potential ramifications of such a feature.  There's a lot of information on that slide to digest (what's a pluggable database?  virtualization of database servers?), but I'm going to focus on the flash-based write-back cache.  Note that this does not refer to the "Exadata Smart Flash Log" feature introduced last year with the 11.2.2.4.0 cell patch, discussed by Guy Harrison recently.  That feature sends writes to both flash and disk at the same time.  In my experience, the disk wins on > 90% of those writes.

This is something larger than just sending writes to flash...an issue that Oracle has likely been working on for a few years. Kevin had mentioned in his post that he expected it to be a feature in the 12.2 release, possibly 12.1 of the database. Because Mendelsohn mentioned that there was a 12-month timeframe for these items, I expected it would occur with the release of the new version of the Oracle database, 12c. I've been doing some poking around in the latest Exadata patch notes and saw a couple of interesting bugs around a write-back cache on Exadata using flash. Bug 14143451 "Enhancement for ASM write-back flash cache resilvering support" and bug 14132953 "Enhanacement to add Write-back flash cache resilvering support" have both been added to the August 2012 bundle patch for 11.2.0.3 (MOS note #1393410.1). If you look at these bugs, you will see that they are currently listed as fixed in 11.2.0.4. The fact that the enhancement has been added to 11.2.0.3 interests me. It looks similar to the introduction of the Exadata smart flash log feature, introduced in the 11.2.2.4.0 Exadata storage server version, released October 2011. If you look through the Exadata bundle patches for 11.2.0.2, you'll see that it was introduced into the database code in bundle patch 9 (MOS note #1314319.1). That bundle patch was released in July 2011. Sound familiar? I wouldn't put it past Oracle to include the write-back cache through a new version of the storage server software.

This sounds like the kind of feature that Larry Ellison would be very happy to announce at Open World in October. We'll just have to wait and see what gets announced. I'll have another post in the next week or so guessing about what may get announced a month from now in San Francisco.

2Mar/128

Exadata V2 Battery Replacement

For some reason, I've been working on lots of Exadata V2 systems in the past few months.  One of the issues that I've been coming across for these clients is a failure in the battery that is used by the RAID controller.  These batteries were originally expected to last 2 years.  Unfortunately, there is a defect in the batteries where they reach their end of life after approximately 18 months.  The local Sun reps should have access to a schedule that says when the "regular maintenance" should occur.  For one client, it wasn't caught until the batteries had run down completely and the disks were in WriteThrough mode.  This can be seen by running MegaCli64.  Here is the output to check the WriteBack/WriteThrough status for 2 different compute nodes (V2 is first, X2-2 is second):

[enkdb01:root] /root
> dmidecode -s system-product-name
SUN FIRE X4170 SERVER          
 
[enkdb01:root] /root
> /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Disabled
[root@enkdb03 ~]# dmidecode -s system-product-name
SUN FIRE X4170 M2 SERVER
[root@enkdb03 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Disabled

If you have a V2 and you haven't replaced the batteries yet, it's worth running these commands to see what state your RAID controllers are in.  To find out what this means for you, read on after the break.
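
If you have a lot of nodes to check, the comparison can be scripted. The following is a rough Python sketch that takes the grep output shown above as text and reports any logical drive whose current write mode differs from its default. The function name is hypothetical, and the sketch assumes the output format is exactly what is shown in this post.

```python
def degraded_logical_drives(ldinfo_output):
    """Return (default, current) write-mode pairs where the two differ.

    Expects text in the format printed by
    MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy", as shown above.
    """
    default_mode = None
    problems = []
    for line in ldinfo_output.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        # The first comma-separated field is WriteBack or WriteThrough.
        write_mode = value.split(",")[0].strip()
        if key == "Default Cache Policy":
            default_mode = write_mode
        elif key == "Current Cache Policy":
            if default_mode is not None and write_mode != default_mode:
                problems.append((default_mode, write_mode))
            default_mode = None
    return problems


if __name__ == "__main__":
    v2_output = """\
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Disabled
"""
    for default, current in degraded_logical_drives(v2_output):
        print("default is %s but current is %s" % (default, current))
```

A non-empty result on a V2 is a good hint that the battery has run down and the controller has fallen back to WriteThrough.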

18Jan/120

New Exadata Full Stack Patches

Oracle has announced a new patching strategy for Exadata, starting with databases running 11.2.0.3.  Oracle will be moving away from the monthly bundle patch philosophy, which was panned by many administrators as coming too often to keep up with, given the tight schedules held around most Exadata systems.  Instead, Oracle will be releasing a Quarterly Database Patch for Exadata, or QDPE.  The QDPE will most likely be released in conjunction with the standard critical patch updates (CPUs).  Oracle will still release interim bundle patches, but recommends that customers install only the QDPEs unless they have a specific need to install a bundle patch.  Note that so far, the QDPEs are only being released for 11.2.0.3 - Linux x86_64, SPARC Solaris (supercluster), and Solaris x86_64.

In addition to the QDPE release, Oracle has announced a "full stack QDPE" - the Quarterly Full Stack Download Patch, or QFSDP.  This "full stack" patch includes all of the latest software that can be found in MOS note #888828.1.  The January 2012 QFSDP includes:

  • Infrastructure Software
    • Exadata Storage Server version 11.2.2.4.2
    • Exadata Infiniband Switch version 1.3.3-2
    • Exadata PDU firmware version 1.04
  • Database
    • 11.2.0.3 January 2012 QDPE
    • Opatch 11.2.0.1.9
    • OPlan
  • Systems Management
    • Patches for 11g OEM agents
    • Management plugins for 11g OEM
    • Patches for 11g OEM management server

No word on when Oracle will start including patches for the new OEM 12c.  Keep in mind that this is just a collection of patches; they all still need to be installed as if they were downloaded separately.  Oracle does not yet have a mechanism in place to apply the QDPE, storage server patches, Infiniband switch patches, etc. in one swoop.

The current QDPE patch is January 2012 (patch #13513783), and the current QFSDP is January 2012 (patch #13551280).

6Jan/126

Voting Disk Redundancy in ASM

A recent discussion thread on the OTN Exadata forum made me want to test a feature of 11.2 RAC - voting disk redundancy.  There was one section of the Clusterware Administration and Deployment Guide (http://goo.gl/eMrQM) that made me want to test it out to see how well it worked:

If voting disks are stored on Oracle ASM with normal or high redundancy, and the storage hardware in one failure group suffers a failure, then if there is another disk available in a disk group in an unaffected failure group, Oracle ASM recovers the voting disk in the unaffected failure group.

 
How resilient is it?  How quickly will the cluster recover the voting disk?  Do we have to wait for the ASM rebalance timer to tick down to zero before the voting disk gets recreated?

First, what is a voting disk?  The voting disks are used to determine which nodes are members of the cluster.  In a RAC configuration, there are either 1, 3, or 5 voting disks for the cluster.

Next, when voting disks are placed into an ASM diskgroup, which is the default in 11.2, the number of voting disks that are created depends on the redundancy level of the diskgroup.  For external redundancy, 1 voting disk is created.  When running ASM redundancy, voting disks have a higher requirement for the number of disks.  Normal redundancy creates 3 voting disks, and high redundancy creates 5.  Oracle recommends at least 300MB per voting disk file, and 300MB for each copy of the OCR.  On standard (non-Exadata) RAC builds, I prefer to place OCR and voting disks into their own diskgroup, named GRID or SYSTEM.

Anyway, back to the fun stuff.  I'm testing this on Exadata, but the results should carry over to any other system running 11.2 RAC.  For starters, we have a diskgroup named DBFS_DG that is used to hold the OCR and voting disks.  We can see that by running the lsdg command in asmcmd. I've abbreviated the output for clarity.

[enkdb03:oracle:+ASM1] /u01/app/11.2.0.2/grid
> asmcmd
ASMCMD> lsdg
State    Type    Rebal  Total_MB   Free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N      65261568  49688724        21877927              0             N  DATA/
MOUNTED  NORMAL  N       1192320    978292          434950              0             Y  DBFS_DG/
MOUNTED  HIGH    N      22935248  20570800         5466918              0             N  RECO/
ASMCMD>

The DBFS_DG diskgroup has 4 failgroups - ENKCEL04, ENKCEL05, ENKCEL06, and ENKCEL07:

SYS:+ASM1> select distinct g.name "Diskgroup",
  2    d.failgroup "Failgroup"
  3  from v$asm_diskgroup g,
  4    v$asm_disk d
  5  where g.group_number = d.group_number
  6  and g.NAME = 'DBFS_DG'
  7  /
 
Diskgroup		       Failgroup
------------------------------ ------------------------------
DBFS_DG 		       ENKCEL04
DBFS_DG 		       ENKCEL05
DBFS_DG 		       ENKCEL06
DBFS_DG 		       ENKCEL07
 
SYS:+ASM1>

Finally, our current voting disks reside in failgroups ENKCEL05, ENKCEL06, and ENKCEL07:

[enkdb03:oracle:+ASM1] /home/oracle
> sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   95eb38e5dfc34f3ebfd72127e7fe9c12 (o/192.168.12.9/DBFS_DG_CD_02_enkcel05) [DBFS_DG]
Located 3 voting disk(s).

Going back to the original questions, what does it take for CRS to notice that a voting disk is gone, and how quickly will it be replaced? Are they handled in the same way as normal disks, or some other way? When an ASM disk goes offline, ASM waits the amount of time listed in the disk_repair_time attribute for the diskgroup (default is 3.6 hours) before dropping the disk and performing a rebalance. Let's offline one of the failgroups that has a voting disk and find out.

SYS:+ASM1> !date
Fri Jan  6 12:57:18 CST 2012
 
SYS:+ASM1> alter diskgroup dbfs_dg offline disks in failgroup ENKCEL05;
 
Diskgroup altered.
 
SYS:+ASM1> !date
Fri Jan  6 12:57:46 CST 2012
 
SYS:+ASM1> !sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
[sudo] password for oracle:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   ab06b75d54764f79bfd4ba032b317460 (o/192.168.12.8/DBFS_DG_CD_02_enkcel04) [DBFS_DG]
Located 3 voting disk(s).
 
SYS:+ASM1> !date
Fri Jan  6 12:58:03 CST 2012

That was quick! So from this exercise, we can see that ASM doesn't waste any time to recreate the voting disk. With files that are this critical to the stability of the cluster, it's not hard to see why. If we dig further, we can see that ocssd noticed that the voting disk on o/192.168.12.9/DBFS_DG_CD_02_enkcel05 was offline, and it proceeded to create a new file on disk o/192.168.12.8/DBFS_DG_CD_02_enkcel04:

2012-01-06 12:57:37.075
[cssd(10880)]CRS-1605:CSSD voting file is online: o/192.168.12.8/DBFS_DG_CD_02_enkcel04; details in /u01/app/11.2.0.2/grid/log/enkdb03/cssd/ocssd.log.
2012-01-06 12:57:37.075
[cssd(10880)]CRS-1604:CSSD voting file is offline: o/192.168.12.9/DBFS_DG_CD_02_enkcel05; details at (:CSSNM00069:) in /u01/app/11.2.0.2/grid/log/enkdb03/cssd/ocssd.log.
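
To put a number on "quick", the timestamps from the session can be subtracted. This is only a back-of-the-envelope Python sketch: it assumes the `date` output and the Clusterware alert log are driven by the same clock on enkdb03, and it gives an upper bound, since the first `date` was run before the offline command was even issued.

```python
from datetime import datetime

# Timestamps copied from the session above (all on 6 Jan 2012, CST).
before_offline = datetime(2012, 1, 6, 12, 57, 18)          # first !date output
new_vote_online = datetime(2012, 1, 6, 12, 57, 37, 75000)  # CRS-1605 in the alert log


def seconds_between(start, end):
    """Elapsed wall-clock seconds between two datetime values."""
    return (end - start).total_seconds()


# Upper bound on how long the relocation took, in seconds.
print(seconds_between(before_offline, new_vote_online))
```

In other words, the new voting file was online less than 20 seconds after I started, nowhere near the disk_repair_time window.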

Finally, what happens when the failgroup comes back online? There are no changes made to the voting disks. The voting disk from the failgroup that was previously offline is still there, but will be overwritten if that space is needed:

SYS:+ASM1> !date
Fri Jan  6 13:05:09 CST 2012
 
SYS:+ASM1> alter diskgroup dbfs_dg online disks in failgroup ENKCEL05;
 
Diskgroup altered.
 
SYS:+ASM1> !date
Fri Jan  6 13:06:31 CST 2012
 
SYS:+ASM1> !sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
[sudo] password for oracle:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   ab06b75d54764f79bfd4ba032b317460 (o/192.168.12.8/DBFS_DG_CD_02_enkcel04) [DBFS_DG]
Located 3 voting disk(s).
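
Comparing these listings by eye works fine for 3 rows, but it can also be done programmatically. Below is a minimal Python sketch that parses the `crsctl query css votedisk` output as it appears in this post and reports which voting files were removed or added between two listings. The regular expression and function names are my own and assume this exact output layout.

```python
import re

# Matches rows such as:
#  1. ONLINE   <32 hex chars> (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
ROW_RE = re.compile(r"^\s*\d+\.\s+(\S+)\s+([0-9a-f]{32})\s+\((\S+)\)\s+\[(\S+)\]")


def voting_files(crsctl_output):
    """Map File Universal Id -> file name for each row of the listing."""
    files = {}
    for line in crsctl_output.splitlines():
        match = ROW_RE.match(line)
        if match:
            _state, file_id, name, _diskgroup = match.groups()
            files[file_id] = name
    return files


def diff_voting_files(before, after):
    """Return (removed_names, added_names) between two listings."""
    old, new = voting_files(before), voting_files(after)
    removed = sorted(old[key] for key in old.keys() - new.keys())
    added = sorted(new[key] for key in new.keys() - old.keys())
    return removed, added
```

Feeding it the listing taken before the failgroup was offlined and the one taken afterwards shows the enkcel05 file removed and the enkcel04 file added, while comparing the last two listings shows no difference at all.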

In previous versions, this automation did not exist. If a voting disk went offline, the DBA had to manually create a new voting disk. Keep in mind that this feature will only take effect if there is another failgroup available to place the voting disk in. If you only have 3 failgroups for your normal redundancy (or 5 for your high redundancy) OCR/voting diskgroup, this automatic recreation will not occur.
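
That rule of thumb is simple enough to write down as code. This is just a Python sketch of the reasoning in this post (3 voting files for normal redundancy, 5 for high, and a spare failgroup required for relocation); the function name is made up, and it is not how Clusterware itself makes the decision.

```python
# Voting files created per ASM redundancy level, as described earlier in this post.
VOTING_FILES = {"normal": 3, "high": 5}


def can_relocate_voting_file(redundancy, failgroup_count):
    """True if at least one spare failgroup exists to receive a relocated voting file."""
    needed = VOTING_FILES[redundancy.lower()]
    return failgroup_count > needed


if __name__ == "__main__":
    # DBFS_DG in this test: normal redundancy with 4 failgroups.
    print(can_relocate_voting_file("normal", 4))
    # A normal redundancy diskgroup with only 3 failgroups has no spare.
    print(can_relocate_voting_file("normal", 3))
```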

29Dec/110

Exadata Critical Patch for 11.2.2.3.x through 11.2.2.4.1

Oracle has released a critical patch for storage server versions 11.2.2.3.x through 11.2.2.4.1.  While 11.2.2.4.1 was released last week, there were a few one-off patches from 11.2.2.4.0 that didn't seem to make it into the release.  Oracle has since released 11.2.2.4.2 (patch #13513611, supplemental note #1388400.1).  Similar to 11.2.2.4.1, this release looks to patch several outstanding issues.  Here's the list of bugs fixed from the readme for 11.2.2.4.2:

12764521        INFINIBAND DIAG COMMANDS (LIKE IBDIAGNET AND IBNETDISCOVER) ARE NOT WORKING
13083530        10 GB-E BONDED INTERFACES FAILING- EXADATA
13410353        AFTER UPGRADE TO 11.2.2.4 INFINIBAND CMDS IBDIAGNET, IBNETDISCOVER NOT WORKING
13489032        CHECKHWNFWPROFILE DOES NOT DETECT FAILED FLASH FDOM
13489445        ORA-600 [OSSMISC:OSSMISC_TIMER] WHEN NTPD DETECTED 6 MILLISECOND TIME DIFFERENCE
13512932        FIX INSTALLED WORKAROUND FOR NTP UPDATE BUG 13489445

As you can see, the previously mentioned bugs have been fixed.  There's another bug that was fixed in 11.2.2.4.1 that could be an issue for anybody running 11.2.2.3.x through 11.2.2.4.0.  This bug (13454147) can remove the flashcache from a cell that has an uptime of 6 months or greater.  Fortunately, Oracle has released a patch that includes these critical issues in the event that you can't quickly upgrade to 11.2.2.4.2 - I wouldn't advise running this version for at least a couple weeks...I always advise clients to wait that long for the early adopters to weed out any major issues.
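
To decide which cells are worth worrying about for the flashcache bug, the version range and uptime condition described above can be expressed in a few lines. This is only a Python sketch of my reading of the bug: the 180-day threshold is my approximation of "6 months", and the function names are hypothetical. Check the MOS notes before relying on it.

```python
FIRST_AFFECTED = (11, 2, 2, 3)   # 11.2.2.3.x
FIXED_IN = (11, 2, 2, 4, 1)      # bug 13454147 is fixed in 11.2.2.4.1


def parse_version(text):
    """Turn a dotted version such as '11.2.2.4.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def exposed_to_flashcache_bug(version, uptime_days, threshold_days=180):
    """True if the cell version is in the affected range and uptime is long enough.

    threshold_days=180 is an assumption standing in for "6 months".
    """
    parsed = parse_version(version)
    in_range = FIRST_AFFECTED <= parsed[:4] and parsed < FIXED_IN
    return in_range and uptime_days >= threshold_days


if __name__ == "__main__":
    print(exposed_to_flashcache_bug("11.2.2.4.0", 200))
    print(exposed_to_flashcache_bug("11.2.2.4.2", 200))
```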

Applying the critical patch only takes a minute, and doesn't take the storage servers or database instances offline.  After it's done, a restart of cellsrv needs to be scheduled, but that can be done in a rolling fashion.  Read on for an example of applying this patch.  As always, do not apply any patch to a production system before appropriately testing against a non-production system!