Exadata 11.2.3.2.1 NFS Issues – Ksplice Support for Exadata?
When Oracle released version 11.2.3.2.1 of the Exadata Storage Server software, I was a little excited. There were numerous one-off patches for the previous release, 11.2.3.2.0, which was the first version to support the Exadata X3 and write-back flash cache, to run UEK on the X#-2 systems, etc. With that many large changes introduced in one version, some bugs in the .0 release were likely. Fortunately, Oracle was quick to fix many of those issues, but the result was several separate patches to update the cellsrv software.
I was working with a colleague last week, and we were ready to apply this patch to a customer's Exadata system. Everything went off without a hitch, upgrading from 11.2.2.4.2 straight to 11.2.3.2.1. We even applied the patch to the customer's quarter rack in rolling mode, which took under 6 hours to complete. After everything was back up and running, we took an archive log backup using RMAN. For this customer, we back everything up to NFS because it won't fit within the FRA, and they don't want to leave backups inside the production system. We were greeted with a strange error when we tried to kick off the backup job in RMAN:
RMAN> run {
2> ALLOCATE CHANNEL DISK1 DEVICE TYPE DISK;
3> BACKUP DATABASE FORMAT '/mnt/nfs/actest_%U';
4> RELEASE CHANNEL DISK1;
5> }
using target database control file instead of recovery catalog
allocated channel: DISK1
channel DISK1: SID=397 instance=ACTEST1 device type=DISK
Starting backup at 13-02-28 21:38
channel DISK1: starting full datafile backup set
channel DISK1: specifying datafile(s) in backup set
input datafile file number=00007 name=+DATA/actest/datafile/tanel_bigfile.325.808412931
input datafile file number=00006 name=+DATA/actest/datafile/ts_data.380.779860027
input datafile file number=00001 name=+DATA/actest/datafile/system.367.779029515
input datafile file number=00002 name=+DATA/actest/datafile/sysaux.368.779029555
input datafile file number=00003 name=+DATA/actest/datafile/undotbs1.369.779029595
input datafile file number=00004 name=+DATA/actest/datafile/undotbs2.371.779029649
input datafile file number=00005 name=+DATA/actest/datafile/users.372.779029687
channel DISK1: starting piece 1 at 13-02-28 21:38
released channel: DISK1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on DISK1 channel at 02/28/2013 21:38:37
ORA-19504: failed to create file "/mnt/nfs/actest_1jo34pas_1_1"
ORA-27044: unable to write the header block of file
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 3
It didn't matter what we were trying to back up, just that it was going to NFS. This backup job had worked fine prior to the patch (we took a backup immediately before the maintenance window), but we had applied both a database bundle patch (this database was 11.2.0.2) and the latest storage server patch (11.2.3.2.1), which updates the Linux OS to OEL 5.8 and introduces the Oracle Unbreakable Enterprise Kernel into the mix.
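The "Error: 12" line in the stack above is an operating system errno, not an Oracle error code, which is a hint that the failure comes from below the database. As a small sketch, Python's standard errno module can translate the number into its symbolic name. The helper name describe_os_error is hypothetical, and the message text returned by os.strerror depends on the platform.

```python
import errno
import os


def describe_os_error(code):
    """Return (symbolic name, message) for an OS errno value.

    Hypothetical helper for illustration; it only wraps the standard
    library's errno table and os.strerror.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 12 is the value reported by "Linux-x86_64 Error: 12" in the RMAN stack.
    name, message = describe_os_error(12)
    print(name, "-", message)
```

On Linux, errno 12 maps to ENOMEM, which matches the "Cannot allocate memory" text that RMAN printed.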
We checked the mount options to make sure that everything was ok, and saw that it was:
[enkdb01:oracle:ACTEST1] /u01/app/oracle/product/11.2.0.3/dbhome_2/rdbms/lib > mount | grep "/mnt/nfs"
192.168.12.22:/export/nfs on /mnt/nfs type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,nfsvers=3,timeo=600,actimeo=0,addr=192.168.12.22)
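Checking mount options by eye is easy to get wrong, so here is a minimal sketch of doing the same check in a script. It parses one line of mount output and compares the options against an expected set. The function names and the EXPECTED_OPTIONS values are assumptions for illustration: the values are copied from the mount output above, not from any Oracle recommendation.

```python
import re

# Assumed baseline, copied from the mount output shown in this post.
# This is not an official list of required NFS options.
EXPECTED_OPTIONS = {
    "rw": True,
    "hard": True,
    "tcp": True,
    "nfsvers": "3",
    "rsize": "32768",
    "wsize": "32768",
    "actimeo": "0",
}

MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def parse_mount_line(line):
    """Split one line of `mount` output into source, mount point, type, options."""
    match = MOUNT_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized mount line: %r" % line)
    source, mountpoint, fstype, raw_options = match.groups()
    options = {}
    for item in raw_options.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "rw" have no value; store them as True.
        options[key] = value if sep else True
    return {"source": source, "mountpoint": mountpoint,
            "fstype": fstype, "options": options}


def option_mismatches(options, expected=EXPECTED_OPTIONS):
    """Return {option: (expected, actual)} for every option that differs."""
    return {key: (want, options.get(key))
            for key, want in expected.items()
            if options.get(key) != want}


if __name__ == "__main__":
    line = ("192.168.12.22:/export/nfs on /mnt/nfs type nfs "
            "(rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,nfsvers=3,"
            "timeo=600,actimeo=0,addr=192.168.12.22)")
    parsed = parse_mount_line(line)
    print(parsed["mountpoint"], option_mismatches(parsed["options"]))
```

An empty result from option_mismatches only says the mount matches the baseline you gave it. In our case the options were fine and the problem was elsewhere.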
After poking around a bit, we opened a service request, which was answered pretty quickly by Oracle support. It turns out that there is a known bug with the NFS driver included in the version of the UEK packaged with 11.2.3.2.1. Oracle provided 3 possible fixes, which I'll detail below. The fixes were:
Exadata Flash Write-back – Sooner Than We Think?
If you missed Andy Mendelsohn's keynote at E4 last week, you may not have heard the hubbub that surrounded one of his last slides (tweeted by Frits Hoogland). The mention of the write-back cache enticed Kevin Closson to talk about the potential ramifications of such a feature. There's a lot of information on that slide to digest (what's a pluggable database? virtualization of database servers?), but I'm going to focus on the flash-based write-back cache. Note that this is not the "Exadata Smart Flash Log" feature introduced last year with the 11.2.2.4.0 cell patch, discussed by Guy Harrison recently. That feature sends writes to both flash and disk at the same time, and whichever finishes first wins. In my experience, the disk wins on more than 90% of those writes.
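To make the "both at once, first one wins" idea concrete, here is a toy sketch that issues two simulated writes in parallel and reports which device finished first. It is only an illustration of the racing pattern and has nothing to do with how cellsrv is actually implemented. The function names and the latency figures are made up.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def simulated_write(device, latency_s):
    """Pretend to write to a device by sleeping for its latency."""
    time.sleep(latency_s)
    return device


def first_to_acknowledge(latencies):
    """Start one write per device and return the name of the first to finish.

    `latencies` maps a device name to a made-up latency in seconds.
    """
    with ThreadPoolExecutor(max_workers=len(latencies)) as pool:
        futures = [pool.submit(simulated_write, device, latency)
                   for device, latency in latencies.items()]
        # Block only until the fastest write completes.
        done, _pending = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()


if __name__ == "__main__":
    # Made-up numbers: a disk write absorbed by controller cache
    # versus a slower flash write.
    print(first_to_acknowledge({"disk": 0.01, "flash": 0.3}))
```

The caller gets its answer as soon as the faster write returns, so the feature mostly protects against the occasional slow write on one device.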
This is something larger than just sending writes to flash...an issue that Oracle has likely been working on for a few years. Kevin mentioned in his post that he expected it to be a feature in the 12.2 release, possibly 12.1 of the database. Because Mendelsohn mentioned that there was a 12-month timeframe for these items, I expected it would occur with the release of the new version of the Oracle database, 12c. I've been poking around in the latest Exadata patch notes and found a couple of interesting bugs around a write-back cache on Exadata using flash. Bug 14143451 "Enhancement for ASM write-back flash cache resilvering support" and bug 14132953 "Enhanacement to add Write-back flash cache resilvering support" have both been added to the August 2012 bundle patch for 11.2.0.3 (MOS note #1393410.1). If you look at these bugs, you will see that they are currently listed as fixed in 11.2.0.4. The fact that the enhancement has been added to 11.2.0.3 interests me. It looks similar to the introduction of the Exadata Smart Flash Log feature, which arrived in the 11.2.2.4.0 Exadata storage server version, released October 2011. If you look through the Exadata bundle patches for 11.2.0.2, you'll see that it was introduced into the database code in bundle patch 9 (MOS note #1314319.1). That bundle patch was released in July 2011, about three months before the storage server release that enabled the feature. Sound familiar? I wouldn't put it past Oracle to include the write-back cache through a new version of the storage server software.
This sounds like the kind of feature that Larry Ellison would be very happy to announce at OpenWorld in October. We'll just have to wait and see what gets announced. I'll have another post in the next week or so guessing about what may get announced a month from now in San Francisco.
