Oracle-Ninja.com Andy Colvin's Oracle Blog

7May/123

Why I Don’t Like Role Separated Accounts – Part 1

Role-separated grid infrastructure installations, something I've touched on before that tends to bother me, gave me another reason to dislike them a few weeks ago.  While working on a system that was being upgraded from 11.2.0.2 to 11.2.0.3, we ran into a strange issue: after the upgrade, we could no longer connect to our databases.  When we attempted to connect remotely, we got:

[acolvin@homer ~]$ sqlplus system@odademo
 
SQL*Plus: Release 11.2.0.3.0 Production on Sun May 20 11:58:23 2012
 
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
 
Enter password:
ERROR:
ORA-12537: TNS:connection closed

We could connect to the database locally without issue.  There were no network issues to report and everything appeared to be working fine, except that we couldn't get into the database remotely.  The database we were connecting to (DEMO) was registered with the listener:

[grid@patty ~]$ lsnrctl stat LISTENER_SCAN1
 
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 20-MAY-2012 12:07:55
 
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                25-APR-2012 11:50:38
Uptime                    25 days 0 hr. 17 min. 17 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0.3/grid/log/diag/tnslsnr/patty/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=****)(PORT=1521)))
Services Summary...
Service "DEMO" has 2 instance(s).
  Instance "DEMO1", status READY, has 2 handler(s) for this service...
  Instance "DEMO2", status READY, has 2 handler(s) for this service...
 
[grid@patty ~]$ lsnrctl stat
 
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 20-MAY-2012 12:05:51
 
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                07-MAY-2012 15:21:43
Uptime                    12 days 20 hr. 44 min. 8 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/patty/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=****)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=****)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "DEMO" has 1 instance(s).
  Instance "DEMO1", status READY, has 2 handler(s) for this service...
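
When comparing what each listener has registered, it can help to reduce the status output to one line per instance.  The following is a minimal sketch, assuming the "Services Summary" layout shown above; the function name is made up for illustration and is not part of any Oracle tooling.

```shell
# Reads `lsnrctl status` output on stdin and prints "service instance status"
# for every registered instance. Assumes the line layout shown in this post.
list_registered_instances() {
    awk '
        /^Service "/ { split($0, s, "\""); service = s[2] }
        /^[[:space:]]*Instance "/ {
            split($0, i, "\"")
            status = $4            # e.g. "READY,"
            sub(/,$/, "", status)  # drop the trailing comma
            print service, i[2], status
        }
    '
}
```

Usage would look like `lsnrctl stat LISTENER_SCAN1 | list_registered_instances`, run as the listener owner.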

After poking around for a little bit, I came across MOS note #1069517.1, "ORA-12537 if Listener (including SCAN Listener) and Database are Owned by Different OS User." Hey, that looks familiar! Looking through the listener logs, we saw this error:

20-MAY-2012 12:10:48 * (CONNECT_DATA=(SID=DEMO1)(CID=(PROGRAM=perl@patty)(HOST=patty)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.10.219)(PORT=24042)) * establish * DEMO1 * 12518
TNS-12518: TNS:listener could not hand off client connection
 TNS-12547: TNS:lost contact
  TNS-12560: TNS:protocol adapter error
   TNS-00517: Lost contact
    Linux Error: 32: Broken pipe

Turns out this happens when the oracle binary in the database $ORACLE_HOME/bin directory is missing the setuid bit.

[oracle@patty ~]$ ls -al /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
-rwxr-x--x 1 oracle asmadmin 229009338 Jan 19 12:59 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

Once we reset the setuid bit, we were back in business:

[oracle@patty ~]$ chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
 
[oracle@patty ~]$ ls -al /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
-rwsr-x--x 1 oracle asmadmin 229009338 Jan 19 12:59 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
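
A check like this could be run after an upgrade to catch the problem before users do.  This is only a sketch: the function name is hypothetical, and it tests only the setuid bit described in this post, not the full set of permissions Oracle expects on the binary.

```shell
# Report whether a binary carries the setuid bit.
# Prints OK, NEEDS_FIX, or MISSING; the return code mirrors the result.
check_oracle_binary() {
    bin="$1"
    if [ ! -e "$bin" ]; then
        echo "MISSING"
        return 2
    fi
    # [ -u FILE ] is true when the setuid bit is set
    if [ -u "$bin" ]; then
        echo "OK"
        return 0
    fi
    echo "NEEDS_FIX"
    return 1
}
```

For the home in this post, the call would be `check_oracle_binary /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle`.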

While this isn't something that comes up often, it wouldn't happen at all in an environment owned entirely by the oracle account.

8Mar/129

Yum Repository for Oracle Enterprise Linux

One of the new requirements of the Exadata Storage Server patches (starting with 11.2.3.1.0) is that the compute nodes will be patched through yum.  Previously, the minimal pack was bundled with the storage server patch and applied directly to the database servers.  Instead of having the compute node directly receive updates from Oracle's yum repository (the Unbreakable Linux Network - ULN), it is recommended to configure a local yum repository that will download the RPMs from ULN.  After this is done, all local servers will connect to the yum repository to receive RPM patches, saving time and network bandwidth.  This post will describe how to configure the yum repository on Oracle Enterprise Linux (process borrowed from OTN).

Creating a yum repository does not require any license beyond a ULN subscription (included with Exadata or any OEL support contract), unlike Red Hat's Satellite product, which can be quite pricey.  First, you will need a server running Oracle Enterprise Linux.  If you don't have anything running OEL, check out Tim Hall's great site, oracle-base.com, for a quick primer on installing OEL.  After you have a server up and running with OEL, you'll need to register it with the ULN.  This requires a CSI, which is included with an OEL license.  To do this, run:

rpm --import /usr/share/rhn/RPM-GPG-KEY
up2date --nox --register
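
Once a local repository exists, each client needs a .repo file pointing at it.  The sketch below only generates the stanza text; the repo id, display name, and URL are placeholders I made up, not values from the OTN process, and the key names (name, baseurl, gpgcheck, enabled) are the standard yum ones.

```shell
# Print a yum .repo stanza for a client that pulls from a local mirror.
# Arguments: repo id, display name, base URL (all hypothetical examples).
make_repo_stanza() {
    id="$1"
    name="$2"
    baseurl="$3"
    # gpgcheck=1 assumes the signing key has already been imported with rpm --import
    printf '[%s]\nname=%s\nbaseurl=%s\ngpgcheck=1\nenabled=1\n' "$id" "$name" "$baseurl"
}
```

On a client, the output could be redirected into a file under /etc/yum.repos.d/, for example `make_repo_stanza local_el5 "Local OEL mirror" http://yum.example.com/el5/x86_64/ > /etc/yum.repos.d/local.repo`.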

 

2Mar/128

Exadata V2 Battery Replacement

For some reason, I've been working on lots of Exadata V2 systems in the past few months.  One of the issues that I've been coming across for these clients is a failure in the battery that is used by the RAID controller.  These batteries were originally expected to last 2 years.  Unfortunately, there is a defect in the batteries where they reach their end of life after approximately 18 months.  The local Sun reps should have access to a schedule that says when the "regular maintenance" should occur.  For one client, it wasn't caught until the batteries had run down completely and the disks were in WriteThrough mode.  This can be seen by running MegaCli64.  Here is the output to check the WriteBack/WriteThrough status for 2 different compute nodes (V2 is first, X2-2 is second):

[enkdb01:root] /root
> dmidecode -s system-product-name
SUN FIRE X4170 SERVER          
 
[enkdb01:root] /root
> /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Disabled
[root@enkdb03 ~]# dmidecode -s system-product-name
SUN FIRE X4170 M2 SERVER
[root@enkdb03 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Disabled
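
The comparison between the default and current policy in the two outputs above can be automated.  Here is a rough sketch, assuming the "Default Cache Policy" / "Current Cache Policy" line format shown; the function name is my own, and the drive number it prints is simply the order of appearance in the output.

```shell
# Reads MegaCli64 -LDInfo output on stdin and flags any logical drive whose
# current write policy differs from its default (e.g. WriteBack -> WriteThrough).
# Exits non-zero when a mismatch is found.
check_cache_policy() {
    awk -F': *' '
        /^Default Cache Policy/ { split($2, d, ","); want = d[1] }
        /^Current Cache Policy/ {
            split($2, c, ","); n++
            if (c[1] != want) {
                bad++
                print "LD " n ": default " want ", current " c[1]
            }
        }
        END {
            if (bad) exit 1
            print "OK: " n " logical drive(s) match their default write policy"
        }
    '
}
```

It would be fed the same command used above: `/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | check_cache_policy`.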

If you have a V2 and you haven't replaced the batteries yet, it's worth running these commands to see what state your RAID controllers are in.  To find out what this means for you, read on after the break.

24Jan/121

Speaking at Miracle Open World 2012

I've been invited to speak at Miracle Open World 2012 in Billund, Denmark!  My topic will be "Oracle Database Appliance Internals," which I'll try to make fun and exciting.  I'm definitely looking forward to the experience, and it'll be great to spend time with the other speakers and attendees.  My session is currently scheduled for Thursday afternoon at 4PM, before the dinner and beach party.  If you're coming to the conference, come by my session and say hi!

18Jan/120

New Exadata Full Stack Patches

Oracle has announced a new patching strategy for Exadata, starting with databases running 11.2.0.3.  Oracle will be moving away from the monthly bundle patch philosophy, which was panned by many administrators as coming too often to keep up with, given the tight schedules held around most Exadata systems.  Instead, Oracle will be releasing a Quarterly Database Patch for Exadata, or QDPE.  The QDPE will most likely be released in conjunction with the standard critical patch updates (CPUs).  Oracle will still release interim bundle patches, but recommends that customers install only the QDPEs unless they have a specific need to install a bundle patch.  Note that so far, the QDPEs are only being released for 11.2.0.3 - Linux x86_64, SPARC Solaris (supercluster), and Solaris x86_64.

In addition to the QDPE release, Oracle has announced a "full stack QDPE" - the Quarterly Full Stack Download Patch, or QFSDP.  This "full stack" patch includes all of the latest software that can be found in MOS note #888828.1.  The January 2012 QFSDP includes:

  • Infrastructure Software
    • Exadata Storage Server version 11.2.2.4.2
    • Exadata Infiniband Switch version 1.3.3-2
    • Exadata PDU firmware version 1.04
  • Database
    • 11.2.0.3 January 2012 QDPE
    • Opatch 11.2.0.1.9
    • OPlan
  • Systems Management
    • Patches for 11g OEM agents
    • Management plugins for 11g OEM
    • Patches for 11g OEM management server

No word on when Oracle will start including patches for the new OEM 12c.  Keep in mind that this is just a collection of patches; they all still need to be installed as if they were downloaded separately.  Oracle does not yet have a mechanism in place to apply the QDPE, storage server patches, Infiniband switch patches, etc. in one fell swoop.

The current QDPE patch is January 2012 (patch #13513783), and the current QFSDP is January 2012 (patch #13551280).

6Jan/126

Voting Disk Redundancy in ASM

A recent discussion thread on the OTN Exadata forum made me want to test a feature of 11.2 RAC - voting disk redundancy.  There was one section of the Clusterware Administration and Deployment Guide (http://goo.gl/eMrQM) that made me want to test it out to see how well it worked:

If voting disks are stored on Oracle ASM with normal or high redundancy, and the storage hardware in one failure group suffers a failure, then if there is another disk available in a disk group in an unaffected failure group, Oracle ASM recovers the voting disk in the unaffected failure group.

 
How resilient is it?  How quickly will the cluster recover the voting disk?  Do we have to wait for the ASM rebalance timer to tick down to zero before the voting disk gets recreated?

First, what is a voting disk?  The voting disks are used to determine which nodes are members of the cluster.  In a RAC configuration, there are either 1, 3, or 5 voting disks for the cluster.

Next, when voting disks are placed into an ASM diskgroup, which is the default in 11.2, the number of voting disks that are created depends on the redundancy level of the diskgroup.  For external redundancy, 1 voting disk is created.  When running ASM redundancy, voting disks have a higher requirement for the number of disks.  Normal redundancy creates 3 voting disks, and high redundancy creates 5.  Oracle recommends at least 300MB per voting disk file, and 300MB for each copy of the OCR.  On standard (non-Exadata) RAC builds, I prefer to place OCR and voting disks into their own diskgroup, named GRID or SYSTEM.

Anyway, back to the fun stuff.  I'm testing this on Exadata, but the results should carry over to any other system running 11.2 RAC.  For starters, we have a diskgroup named DBFS_DG that is used to hold the OCR and voting disks.  We can see that by running the lsdg command in asmcmd. I've abbreviated the output for clarity.

[enkdb03:oracle:+ASM1] /u01/app/11.2.0.2/grid
> asmcmd
ASMCMD> lsdg
State    Type    Rebal  Total_MB   Free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N      65261568  49688724        21877927              0             N  DATA/
MOUNTED  NORMAL  N       1192320    978292          434950              0             Y  DBFS_DG/
MOUNTED  HIGH    N      22935248  20570800         5466918              0             N  RECO/
ASMCMD>

The DBFS_DG diskgroup has 4 failgroups - ENKCEL04, ENKCEL05, ENKCEL06, and ENKCEL07:

SYS:+ASM1> select distinct g.name "Diskgroup",
  2    d.failgroup "Failgroup"
  3  from v$asm_diskgroup g,
  4    v$asm_disk d
  5  where g.group_number = d.group_number
  6  and g.NAME = 'DBFS_DG'
  7  /
 
Diskgroup		       Failgroup
------------------------------ ------------------------------
DBFS_DG 		       ENKCEL04
DBFS_DG 		       ENKCEL05
DBFS_DG 		       ENKCEL06
DBFS_DG 		       ENKCEL07
 
SYS:+ASM1>

Finally, our current voting disks reside in failgroups ENKCEL05, ENKCEL06, and ENKCEL07:

[enkdb03:oracle:+ASM1] /home/oracle
> sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   95eb38e5dfc34f3ebfd72127e7fe9c12 (o/192.168.12.9/DBFS_DG_CD_02_enkcel05) [DBFS_DG]
Located 3 voting disk(s).

Going back to the original questions, what does it take for CRS to notice that a voting disk is gone, and how quickly will it be replaced? Are they handled in the same way as normal disks, or some other way? When an ASM disk goes offline, ASM waits the amount of time listed in the disk_repair_time attribute for the diskgroup (default is 3.6 hours) before dropping the disk and performing a rebalance. Let's offline one of the failgroups that has a voting disk and find out.

SYS:+ASM1> !date
Fri Jan  6 12:57:18 CST 2012
 
SYS:+ASM1> alter diskgroup dbfs_dg offline disks in failgroup ENKCEL05;
 
Diskgroup altered.
 
SYS:+ASM1> !date
Fri Jan  6 12:57:46 CST 2012
 
SYS:+ASM1> !sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
[sudo] password for oracle:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   ab06b75d54764f79bfd4ba032b317460 (o/192.168.12.8/DBFS_DG_CD_02_enkcel04) [DBFS_DG]
Located 3 voting disk(s).
 
SYS:+ASM1> !date
Fri Jan  6 12:58:03 CST 2012

That was quick! So from this exercise, we can see that ASM doesn't waste any time recreating the voting disk. With files that are this critical to the stability of the cluster, it's not hard to see why. If we dig further, we can see that ocssd noticed that the voting disk on o/192.168.12.9/DBFS_DG_CD_02_enkcel05 was offline, and it proceeded to create a new file on disk o/192.168.12.8/DBFS_DG_CD_02_enkcel04:

2012-01-06 12:57:37.075
[cssd(10880)]CRS-1605:CSSD voting file is online: o/192.168.12.8/DBFS_DG_CD_02_enkcel04; details in /u01/app/11.2.0.2/grid/log/enkdb03/cssd/ocssd.log.
2012-01-06 12:57:37.075
[cssd(10880)]CRS-1604:CSSD voting file is offline: o/192.168.12.9/DBFS_DG_CD_02_enkcel05; details at (:CSSNM00069:) in /u01/app/11.2.0.2/grid/log/enkdb03/cssd/ocssd.log.

Finally, what happens when the failgroup comes back online? There are no changes made to the voting disks. The voting disk from the failgroup that was previously offline is still there, but will be overwritten if that space is needed:

SYS:+ASM1> !date
Fri Jan  6 13:05:09 CST 2012
 
SYS:+ASM1> alter diskgroup dbfs_dg online disks in failgroup ENKCEL05;
 
Diskgroup altered.
 
SYS:+ASM1> !date
Fri Jan  6 13:06:31 CST 2012
 
SYS:+ASM1> !sudo /u01/app/11.2.0.2/grid/bin/crsctl query css votedisk
[sudo] password for oracle:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   43164b9cc7234fe1bff4eb968ec4a1dc (o/192.168.12.10/DBFS_DG_CD_02_enkcel06) [DBFS_DG]
 2. ONLINE   2e6db5ba5fd34fc8bfaa0ab8b9d0ddf5 (o/192.168.12.11/DBFS_DG_CD_03_enkcel07) [DBFS_DG]
 3. ONLINE   ab06b75d54764f79bfd4ba032b317460 (o/192.168.12.8/DBFS_DG_CD_02_enkcel04) [DBFS_DG]
Located 3 voting disk(s).

In previous versions, this automation did not exist. If a voting disk went offline, the DBA had to manually create a new voting disk. Keep in mind that this feature only takes effect if another failgroup is available to hold the voting disk. If you only have 3 failgroups for your normal redundancy (or 5 for your high redundancy) OCR/voting diskgroup, this automatic recreation will not occur.
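
The rule of thumb from this post (3 voting files for normal redundancy, 5 for high, and a spare failgroup needed for automatic recreation) can be written down as a small helper.  This is just an illustration with function names of my own choosing; it encodes the counts described here and does not query ASM.

```shell
# Number of voting files created for a given diskgroup redundancy level.
votedisk_count() {
    case "$1" in
        external) echo 1 ;;
        normal)   echo 3 ;;
        high)     echo 5 ;;
        *) echo "unknown redundancy: $1" >&2; return 2 ;;
    esac
}

# Succeeds when a spare failgroup exists beyond those already holding a
# voting file, i.e. when automatic recreation has somewhere to go.
# Arguments: redundancy level, number of failgroups in the diskgroup.
can_relocate_votedisk() {
    # External redundancy has no ASM mirroring, so the feature does not apply
    [ "$1" = "external" ] && return 1
    needed=$(votedisk_count "$1") || return 2
    [ "$2" -gt "$needed" ]
}
```

For the DBFS_DG diskgroup in this test, `can_relocate_votedisk normal 4` succeeds, while a three-failgroup diskgroup (`can_relocate_votedisk normal 3`) would not.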

29Dec/110

Exadata Critical Patch for 11.2.2.3.x through 11.2.2.4.1

Oracle has released a critical patch for storage server versions 11.2.2.3.x through 11.2.2.4.1.  While 11.2.2.4.1 was released last week, there were a few one-off patches from 11.2.2.4.0 that didn't seem to make it into the release.  Oracle has since released 11.2.2.4.2 (patch #13513611, supplemental note #1388400.1).  Similar to 11.2.2.4.1, this release looks to patch several outstanding issues.  Here's the list of bugs fixed from the readme for 11.2.2.4.2:

12764521        INFINIBAND DIAG COMMANDS (LIKE IBDIAGNET AND IBNETDISCOVER) ARE NOT WORKING
13083530        10 GB-E BONDED INTERFACES FAILING- EXADATA
13410353        AFTER UPGRADE TO 11.2.2.4 INFINIBAND CMDS IBDIAGNET, IBNETDISCOVER NOT WORKING
13489032        CHECKHWNFWPROFILE DOES NOT DETECT FAILED FLASH FDOM
13489445        ORA-600 [OSSMISC:OSSMISC_TIMER] WHEN NTPD DETECTED 6 MILLISECOND TIME DIFFERENCE
13512932        FIX INSTALLED WORKAROUND FOR NTP UPDATE BUG 13489445

As you can see, the previously mentioned bugs have been fixed.  There's another bug that was fixed in 11.2.2.4.1 that could be an issue for anybody running 11.2.2.3.x through 11.2.2.4.0.  This bug (13454147) can remove the flashcache from a cell that has an uptime of 6 months or greater.  Fortunately, Oracle has released a patch that includes fixes for these critical issues in the event that you can't quickly upgrade to 11.2.2.4.2 - I wouldn't advise running this version for at least a couple of weeks...I always advise clients to wait that long for the early adopters to weed out any major issues.

Applying the critical patch only takes a minute, and doesn't take the storage servers or database instances offline.  After it's done, a restart of cellsrv needs to be scheduled, but that can be done in a rolling fashion.  Read on for an example of applying this patch.  As always, do not apply any patch to a production system before appropriately testing against a non-production system!

20Dec/113

Exadata Diskgroup Planning

As business has picked up since OpenWorld (didn't think that was possible, but that's another story for another day), we have been seeing more customers adopt or seriously look at Exadata as an option for new hardware implementations.  While many will complain that there isn't enough room for customization in the rigid process of configuring an Exadata system, there are still many possibilities to make your Exadata your own, whether it's during the initial configuration phase or shortly thereafter.  Of course, some of these modifications can be difficult to implement after the system is up and running with users logging in.  I'm planning on starting a series of posts regarding a couple of the hot-button topics with regard to Exadata configuration - ASM diskgroup layout (the topic for today), role separated vs standard authentication, and so on.  As these topics have no right answers, I'm more than open to a dialogue where you may disagree.  On to the good stuff!

A Quick Primer - The Exadata Storage Architecture

Ok...so we're looking at Exadata specifically in this post.  In the examples listed below, we'll discuss a quarter rack, since it's the easiest to diagram.  To expand to half or full racks, just adjust the number of cells (7, 14) and disks (84, 168) accordingly.  To see the relationship between the compute nodes (database servers), Infiniband switches, and storage servers refer to figure 1:

Figure 1 - Exadata Infiniband/Storage Connectivity

12Dec/112

Inside the Oracle Database Appliance – Part 2

In part 1 of this series, we took a look inside the ODA to see what the OS was doing.  Here, we'll dig in a little further to the disk and storage architecture with regard to the hardware and ASM.

There have been a lot of questions about the storage layout of the shared disks.  We'll start at the controllers and make our way down the ladder to the disks.  First, there are 2 dual-ported LSI SAS controllers in each of the system controllers (SCs).  They are each connected to a SAS expander that is located on the system board.  Each of these SAS expanders connects to 12 of the hard disks on the front of the ODA.  The disks are dual-ported SAS, so that each disk is connected to an expander on each of the SCs.  Below is a diagram of the SAS connectivity on the ODA (Note:  all diagrams are collected from public ODA documentation, as well as various ODA-related support notes available on My Oracle Support).

From this, you can see the relationship between the SAS controllers, SAS expanders, and SAS drives on the front end.  If you look at the columns of disks, the first 2 columns are serviced by one expander, while the third and fourth columns are serviced by the other expander.  What the diagram refers to as "Controller-0" and "Controller-1" are actually the independent SCs in the X4370M2.  What this shows is that you can lose any of the following components in the diagram and your database will continue to run (assuming RAC is in use):

14Nov/111

Exadata 11.2.2.4.0 10GbE Issue Resolved

It appears that Oracle has resolved the issue with the 10GbE drivers that was introduced in version 11.2.2.4.0.  There is an updated note (1376664.1) that includes the patch to fix it. The issue was apparently related to TCP segmentation offloading, and can be fixed by installing the patch found in the note listed above. It does not require a reboot, and is similar to the fix for the IDT switch bug fixed in 11.2.2.4.0. Note that this bug only affected X2-2 systems utilizing 10 gigabit ethernet on the compute nodes. Oracle again recommends installing the 11.2.2.4.0 minimal pack on compute nodes.

After starting the service, users should see the following:

(root)# service disable10gigtso_13083530 start
Skipping igb interface eth0 using driver version 2.1.0-k2-1 - TSO disable unnecessary
Found ixgbe interface eth4 using driver version 2.0.84-k2 - Disabling TSO ... [SUCCESS]
Found ixgbe interface eth5 using driver version 2.0.84-k2 - Disabling TSO ... [SUCCESS]
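
To double-check an interface afterwards, the offload settings reported by `ethtool -k` can be inspected.  This is a sketch under the assumption that the service's change is visible there (the note does not say how the service disables TSO); the function name is hypothetical, and it accepts both the older "tcp segmentation offload" and newer "tcp-segmentation-offload" spellings of the label.

```shell
# Reads `ethtool -k <iface>` output on stdin and prints the TSO state:
# "on", "off", or "unknown" if no matching line is present.
tso_state() {
    awk -F': *' '
        tolower($1) ~ /^tcp[- ]segmentation[- ]offload$/ {
            split($2, v, " ")   # keep the first word, dropping suffixes like "[fixed]"
            state = v[1]
        }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

For example, `ethtool -k eth4 | tso_state` should print `off` on an interface where TSO has been disabled.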