Oracle-Ninja.com Andy Colvin's Oracle Blog

1 Oct 2011

Looking Back on 60+ Exadata Implementations

After seeing this press release, I couldn't help but think back on the last year and a half that I've been working on Exadata, and all of the interesting projects and implementations we've worked on.  When you consider the number of Exadata systems out there (Oracle claims over 1,000) and that we at Enkitec have sold 29 of them (75% of all Exadata systems in North America not sold by Oracle), it's pretty impressive, at least to me.

Going back over a few of them, we've worked with the following packaged applications:

  • eBusiness Suite
  • PeopleSoft
  • OBIEE
  • Informatica
  • Oracle Warehouse Builder

Not to mention a number of custom applications based around code that was developed in house.  There have been OLTP, data warehouse, and mixed load environments.  We've moved 9.2 databases into Exadata using export/import, 11.2 databases using RMAN, and more than a few live migrations/upgrades using GoldenGate.

One of the first Exadata systems we worked on was our own, back when information was limited (if you think it's hard to get info today, imagine what it was like when there weren't many out there).  We had no help going through the configuration worksheets.  I'll always remember looking it over and saying "You mean I need HOW many IPs for a quarter rack?!?!"  From there, we learned about the system by building ours from the ground up.  We chose not to purchase the Oracle installation service, and through a couple of "learning experiences" we picked up quite a few valuable skills on the internals and core of Exadata.  Without having our own box to break and fix, we wouldn't have learned what we did.  We ran through the quarter rack to half rack upgrade, and learned the hard way that without labels for the cables, your upgrade isn't going to get very far.

From there, we started with a few engagements as Exadata started to take root in the Dallas area.  We took on a project with a customer that had 2 half rack systems and wanted one of them split into 2 quarter racks.  I even got to do a weekend-long patch-a-thon on a V2 system that was tabbed by Oracle as the "Exadata Basic" system that had 1 database server, 1 storage cell, and 1 InfiniBand switch.  That was a really interesting process and setup.  We had another client that was running on a maxed out T3 SPARC system, and needed to get off of it badly.  Their database was dying a slow death as the number of active sessions hogged the CPUs until there weren't any more resources left.  We quickly moved them over to an M5000 while we worked out a path to move them from 10.2 on SPARC to Exadata with limited downtime.  We used GoldenGate to keep the M5 and Exadata databases in sync, then cut over once things were ready to go.

We took on clients needing to consolidate massive numbers of databases from various architectures and versions all onto one Exadata frame.  One client migrated and consolidated 30 databases onto 2 quarter rack systems...all with the help of smart scans, and good resource management.  We performed a few more split rack configurations along the way to help customers save on power costs, as buying 2 half racks wasn't feasible when looking at leasing costs for floor space in the datacenter.

Two of the more interesting implementations were more recent.  One was a migration from a Sun e20k to an X2-8.  The design included migrating a heavily transactional OLTP system with a separate data warehouse.  In the past, they were unable to get both databases running on the same host, as one would completely overrun the other.  We were able to combine the databases (~25TB) into one database and migrate them using GoldenGate, minimizing the cutover window to a couple of hours (mostly for application reconfiguration).  Now that they're live on the X2-8, they're able to run reports that would never finish before.  Processes that took hours now run in a matter of minutes.  Full backups that took 48 hours to run now finish in under 10 hours.  It's really cool to see the power of the system once you get it up and running.

The other interesting implementation was something you don't see very often: Exadata without RAC.  I know, you probably wouldn't expect it, but it is possible (and supported) to run Exadata without RAC.  From this standpoint, it becomes more of an HA, consolidation type of system.  I'll have more on this in a future post, but basically, you create a clustered grid infrastructure (which means one set of ASM diskgroups if you so desire), and run single instance databases.  That was definitely one of the coolest installs we've done, just because it's so unique.

All this to say - we've seen quite a bit of Exadata this past year or two, and I can't wait to see what's in store for the future.  I'm sure that at some point, we'll see somebody running an Exadata on Solaris, a SPARC supercluster or two, and who knows what else Oracle is going to announce in the near future.  Here's to another 60 implementations and beyond!

28 Sep 2011

What’s New With Exadata – September 2011

Over the past few weeks, I've been working on some new (and older) installations of Exadata, and came across a few items that piqued my interest.  Each of these things had been on my mind for a while, but it's nice to see them finally resolved.

The first is a small change to the installation tree of the Oracle homes on Exadata.  With the release of 11.2.0.2, Oracle created a new "best practice" of performing all patch sets out of place into a new home.  While this makes it really easy to roll back a patch, the default naming convention for Oracle homes on Exadata became a bit of a sticky situation.  If your 11.2.0.2 Grid Infrastructure home was at /u01/app/11.2.0/grid, where would you put your 11.2.0.3 home when it's ready to come out?  This was the topic of more than a few discussions around the Enkitec office.  Do you extend the version out another digit to 11.2.0.3, or version the home (/u01/app/11.2.0/grid_11.2.0.2, etc)?  Well, Oracle has put this discussion to rest.  Your new Oracle home directories on Exadata are:

Grid Infrastructure - /u01/app/11.2.0.2/grid
Database - /u01/app/oracle/product/11.2.0.2/dbhome_1

read on for another change (it has to do with bundle patches)

21 Sep 2011

Inside the Oracle Database Appliance – Part 1

We've had a few weeks to play around with the ODA in our office, and I've been able to crack it open and get into the software and hardware that powers it.

For starters, the system runs a new model of Sun Fire - the X4370 M2.  The 4U chassis is basically 2 separate 2U blades (Oracle is calling them system controllers - SCs) that have direct attached storage on the front.  Here's a listing of the hardware in each SC:

Sun X4370M2 System Controller Components (2 SCs per X4370M2)

Component          Details
-----------------  ------------------------------------------
CPU                2x 6-core Intel Xeon X5675 3.06GHz
Memory             96GB 1333MHz DDR3
Network            2x 10GbE (SFP+) PCIe card
                   4x 1GbE PCIe card
                   2x 1GbE onboard
Internal Storage   2x 500GB SATA for operating system
                   1x 4GB USB internal
RAID Controller    2x SAS-2 LSI HBA
Shared Storage     20x 600GB 3.5" SAS 15,000 RPM hard drives
                   4x 73GB 3.5" SSDs
External Storage   2x external MiniSAS ports
Operating System   Oracle Enterprise Linux 5.5 x86-64

Pictures of a real live ODA after the break.

21 Sep 2011

Oracle Announces Oracle Database Appliance

Oracle has announced a new product, the "Oracle Database Appliance," or ODA (pronounced oh-duh) as I like to call it.  Enkitec has been fortunate enough to get our hands on a test box.  Be sure to check out my deep dive post inside the ODA.

The gist of the ODA is that it's a small RAC (though RAC isn't required) in a box.  Contrary to the rumors, it's not a "mini-Exadata" as some people have speculated.  As you would expect, there's no capability for smart scans.  The ODA does build on one of Exadata's big advantages: rapid installation time.  With a typical Oracle installation, a lot of time is lost just getting a server ready for an Oracle database.  The following things have to be done before a system is ready:

  • racking and cabling the system to power and network
  • connecting servers to the SAN
  • allocating LUNs on the SAN
  • installing the operating system
  • configuring the operating system for Oracle database use (kernel and memory settings)
  • mounting LUNs from the SAN and ensuring multipathing

With the ODA, you only have to perform the first task.  Everything else is taken care of.  The OS is installed and optimized, storage connected, and multipathing configured.  It may not sound like much, but how many projects have you seen delayed because the SAN switch wasn't zoned correctly, etc?

While many people will say that this machine doesn't appeal to a mass market, there are plenty of Oracle shops that could use a system with 12-24 cores and 4TB of usable space.  It's not built to be a data warehouse or OLTP beast...just a really solid machine with plenty of redundancy running an Oracle database.

15 Sep 2011

OpenWorld 2011 Presentation – Sizing the FRA

I'll be presenting at OpenWorld this year with Cristobal Pedregal-Martin from Oracle. Our session is titled "How to Best Configure, Size, and Monitor the Oracle Database Fast Recovery Area." While it may be an afterthought for many DBAs, it is something that requires some planning, especially on Exadata environments. Cris will be speaking on guidelines for sizing and maintaining the FRA, while I'll be adding nuggets of wisdom based on my experience in the field. It should be a good experience all around. We're session number 13445, Moscone South 304, Thursday at 3:00. Plan accordingly, as I'm sure it will be a packed house.

The abstract of our talk is:

"The Oracle Database fast recovery area (FRA) provides storage and automated space management for recovery-related files and is a key piece of your high-availability strategy. This session covers best practices for configuring, sizing, and monitoring the FRA. It explains how your choice of logs and backups managed by the FRA affects database availability and discusses how to size the FRA to satisfy your recovery requirements, including those addressed by Oracle Flashback. It also explores how the FRA uses and recycles storage space to enable you to better estimate, define, and monitor your recovery retention policies and flashback windows. Finally, the session presents some common data protection scenarios and discusses how to configure the FRA in each for best results."

14 Sep 2011

Exadata Storage on Demand

One of the common refrains regarding Exadata storage is that there's no real capacity for adding storage as your database grows.  The routine was always to let the storage guys dole out storage as needed, keeping tight reins on where their precious gigabytes (now terabytes) went.  When a database outgrew the storage it was allocated, a new LUN was requested, and after much gnashing of teeth, it was given to the systems group to present to the database.

Just like many things with Exadata, this process is turned on its head.  The standard Exadata way is to give all of the storage to ASM, and allow the DBAs to make sure that they don't run around drunk off of the amount of raw storage available.  But what if you're like most environments, where you're going to grow into your storage requirements over time?  What many people won't tell you is that you don't necessarily have to license every component on an Exadata simply because it's available for purchase (more on that in a future post).

Say that you're in the market for an Exadata, and while a half rack may suit your needs today, in 12 or 18 months you'll need the space provided by a full rack.  While you can purchase an upgrade later, remember that you will be given whatever Oracle's current Exadata hardware is at that time.  If you originally purchased a V2 last year, and Oracle is only offering X2-2 (or whatever gets announced at OpenWorld) components, you will end up with dissimilar compute and storage nodes.  Certain processes like decryption (due to the hardware assist on decryption available in the X2) will perform better on the X2 storage cells than on the V2 storage cells, which leads to inconsistent performance.  If you need to have consistent hardware across the rack, but don't have a need for all of it from day one (for either logistical or financial reasons), it is possible to license only what you need.  Granted, you will have to pay for all of the hardware up front, but the support and licensing costs are only paid when you actually use the features.  Some people may balk at this approach, but it's essentially what storage administrators have been doing for years.  This is just storage that's isolated to a particular system, instead of being available to a larger group of systems.

But, what happens when you need to add storage?  Do you have to take an outage to add storage?  Do you need to bounce the cluster?  The answer is that it's pretty simple.  In my case, we were working with a half rack that was only licensed for a 1/4 rack.  That means that we have purchased 7 storage cells, but are only licensing 3.  With the storage server licensing at $120,000 per cell (12 disks at $10,000 per disk), that's a savings of $480,000 in licenses, not to mention the support costs.
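The savings figure above can be checked with a few lines of arithmetic. This is only a sketch using the list prices quoted in this post; the variable names are made up for illustration, and support costs are left out.

```python
# Figures quoted in the post; names are illustrative only.
DISKS_PER_CELL = 12
LICENSE_PER_DISK = 10_000   # USD per disk, as quoted above
CELLS_PURCHASED = 7         # half rack
CELLS_LICENSED = 3          # quarter rack

license_per_cell = DISKS_PER_CELL * LICENSE_PER_DISK
unlicensed_cells = CELLS_PURCHASED - CELLS_LICENSED
deferred_license_cost = unlicensed_cells * license_per_cell

print(f"License per cell:      ${license_per_cell:,}")
print(f"Unlicensed cells:      {unlicensed_cells}")
print(f"Deferred license cost: ${deferred_license_cost:,}")
```

Changing CELLS_LICENSED shows how the deferred cost shrinks as cells are brought online.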

The system was originally configured as a half rack, so all of the griddisks were created, and the ASM diskgroups were configured to use 7 storage servers.  To get back to the licensed number of storage servers, we removed cells 4 through 7 one at a time, and performed a rebalance in between.  The easiest way to do this was to set the DISK_REPAIR_TIME attribute for each diskgroup to 1 minute through sqlplus:

SYS:+ASM1>select g.name "Diskgroup", a.name "Attribute", a.value "Value" from v$asm_attribute a, v$asm_diskgroup g  where a.group_number=g.group_number and a.name='disk_repair_time' order by 1;
 
Diskgroup                      Attribute            Value
------------------------------ -------------------- ------------------------------
DATA_MOS1                      disk_repair_time     3.6h
DBFS_DG                        disk_repair_time     3.6h
RECO_MOS1                      disk_repair_time     3.6h
 
SYS:+ASM1>alter diskgroup DBFS_DG set attribute 'disk_repair_time'='1m';
 
Diskgroup altered.
 
SYS:+ASM1>alter diskgroup RECO_MOS1 set attribute 'disk_repair_time'='1m';
 
Diskgroup altered.
 
SYS:+ASM1>alter diskgroup DATA_MOS1 set attribute 'disk_repair_time'='1m';
 
Diskgroup altered.
 
SYS:+ASM1> select g.name "Diskgroup", a.name "Attribute", a.value "Value" from v$asm_attribute a, v$asm_diskgroup g  where a.group_number=g.group_number and a.name='disk_repair_time' order by 1;
 
Diskgroup                      Attribute            Value
------------------------------ -------------------- ------------------------------
DATA_MOS1                      disk_repair_time     1m
DBFS_DG                        disk_repair_time     1m
RECO_MOS1                      disk_repair_time     1m

By doing this, ASM will dismount the disks and rebalance the diskgroup after a disk has been offline for 1 minute.  A value this low should only be used while dropping the unlicensed storage servers from the grid.  After we have dropped them, the value will be reset to the default of 3.6 hours.  Now, we can shut off one of the storage cells.  After ASM has noticed that the disks are no longer available, the disks are dismounted and a rebalance is started.  When the rebalance is complete, the process is repeated until we are down to the licensed number of cells.  After the storage servers have been removed from ASM, the disk repair timer is set back to its default, and the /etc/oracle/cell/network-config/cellip.ora file on each compute node is modified to only list the storage cells that are licensed.  While this isn't required, it prevents ASM from querying the cells that aren't being used for Exadata storage, so total disk discovery time is shorter because ASM isn't waiting for the unused cells to time out.

[acolvin@enkdb01 ~]$ cat /etc/oracle/cell/network-config/cellip.ora 
cell="192.168.12.5"
cell="192.168.12.6"
cell="192.168.12.7"
#cell="192.168.12.8"
#cell="192.168.12.9"
#cell="192.168.12.10"
#cell="192.168.12.11"
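If you have more than a couple of compute nodes, editing cellip.ora by hand gets tedious. Here is a minimal sketch of how the edit could be scripted. It only transforms text in memory; the helper name set_active_cells is hypothetical, it assumes one cell="..." entry per line as in the listing above, and it is not an Oracle-provided tool.

```python
import re

# Matches a cellip.ora entry, commented out or not, e.g. cell="192.168.12.5"
CELL_LINE = re.compile(r'^\s*#?\s*(cell="([^"]+)")\s*$')

def set_active_cells(cellip_text, active_ips):
    """Return cellip.ora text with only the cells in active_ips uncommented."""
    lines = []
    for line in cellip_text.splitlines():
        match = CELL_LINE.match(line)
        if match is None:
            lines.append(line)  # leave anything that is not a cell entry alone
            continue
        entry, ip = match.groups()
        lines.append(entry if ip in active_ips else "#" + entry)
    return "\n".join(lines) + "\n"

# Rebuild the half rack file shown above, then keep only the licensed cells.
all_cells = "".join(f'cell="192.168.12.{n}"\n' for n in range(5, 12))
licensed = {"192.168.12.5", "192.168.12.6", "192.168.12.7"}

updated = set_active_cells(all_cells, licensed)
print(updated, end="")
```

The same function works in the other direction: add an IP to the set and the matching line comes back uncommented. Back up the real file before writing anything to it.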

This is all fairly routine (boring) stuff.  The good part is what happens when we need to add capacity.  Say that something in the database has changed, and you need more space quickly.  You don't have to wait to order a single storage cell, price out an expansion rack, or go through the process of ordering and installing an upgrade kit.  Simply log in to each compute node and uncomment the line in /etc/oracle/cell/network-config/cellip.ora that relates to the storage cell you're powering on, then boot up the cell.  There is no need to bounce CRS for the new value in the cellip.ora file to take effect.  Once the cell has booted up and cellsrv is running, ASM will take over and notice the disks are available, add them to the relevant diskgroups, and start a rebalance to get the data moved over.  You'll see the following lines in the alert log for ASM:

Tue Sep 13 18:36:09 2011
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=X.X.X.X)(PORT=1521))))' SCOPE=MEMORY SID='+ASM2';
Tue Sep 13 18:52:47 2011
Starting background process XDWK
Tue Sep 13 18:52:48 2011
XDWK started with pid=29, OS id=10912 
Tue Sep 13 18:52:50 2011
NOTE: disk validation pending for group 2/0xe224a1b (DBFS_DG)
SUCCESS: validated disks for 2/0xe224a1b (DBFS_DG)
NOTE: disk validation pending for group 2/0xe224a1b (DBFS_DG)
NOTE: Assigning number (2,30) to disk (o/192.168.10.11/DBFS_DG_CD_07_mos1cel07)
NOTE: Assigning number (2,31) to disk (o/192.168.10.11/DBFS_DG_CD_09_mos1cel07)
NOTE: Assigning number (2,32) to disk (o/192.168.10.11/DBFS_DG_CD_05_mos1cel07)
NOTE: Assigning number (2,33) to disk (o/192.168.10.11/DBFS_DG_CD_10_mos1cel07)
NOTE: Assigning number (2,34) to disk (o/192.168.10.11/DBFS_DG_CD_04_mos1cel07)
NOTE: Assigning number (2,35) to disk (o/192.168.10.11/DBFS_DG_CD_02_mos1cel07)
NOTE: Assigning number (2,36) to disk (o/192.168.10.11/DBFS_DG_CD_03_mos1cel07)
NOTE: Assigning number (2,37) to disk (o/192.168.10.11/DBFS_DG_CD_11_mos1cel07)
NOTE: Assigning number (2,38) to disk (o/192.168.10.11/DBFS_DG_CD_06_mos1cel07)
NOTE: Assigning number (2,39) to disk (o/192.168.10.11/DBFS_DG_CD_08_mos1cel07)
SUCCESS: validated disks for 2/0xe224a1b (DBFS_DG)
NOTE: membership refresh pending for group 2/0xe224a1b (DBFS_DG)
Tue Sep 13 18:52:56 2011
GMON querying group 2 at 10 for pid 19, osid 29830
NOTE: cache opening disk 30 of grp 2: DBFS_DG_CD_07_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_07_mos1cel07
NOTE: cache opening disk 31 of grp 2: DBFS_DG_CD_09_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_09_mos1cel07
NOTE: cache opening disk 32 of grp 2: DBFS_DG_CD_05_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_05_mos1cel07
NOTE: cache opening disk 33 of grp 2: DBFS_DG_CD_10_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_10_mos1cel07
NOTE: cache opening disk 34 of grp 2: DBFS_DG_CD_04_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_04_mos1cel07
NOTE: cache opening disk 35 of grp 2: DBFS_DG_CD_02_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_02_mos1cel07
NOTE: cache opening disk 36 of grp 2: DBFS_DG_CD_03_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_03_mos1cel07
NOTE: cache opening disk 37 of grp 2: DBFS_DG_CD_11_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_11_mos1cel07
NOTE: cache opening disk 38 of grp 2: DBFS_DG_CD_06_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_06_mos1cel07
NOTE: cache opening disk 39 of grp 2: DBFS_DG_CD_08_MOS1CEL07 path:o/192.168.10.11/DBFS_DG_CD_08_mos1cel07
NOTE: Attempting voting file refresh on diskgroup DBFS_DG
GMON querying group 2 at 11 for pid 19, osid 29830
SUCCESS: refreshed membership for 2/0xe224a1b (DBFS_DG)
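If you want a quick sanity check that every griddisk from the new cell was picked up, you can count the "Assigning number" lines per diskgroup and cell. The sketch below does that against a few lines copied from the log above. The regular expression is my own guess at the line format based on this one excerpt, and disks_added is a made-up helper name.

```python
import re
from collections import Counter

# Pattern inferred from the alert log excerpt above; an assumption, not a documented format.
ASSIGN_RE = re.compile(
    r"NOTE: Assigning number \((\d+),(\d+)\) to disk \(o/([\d.]+)/([^)\s]+)\)"
)

def disks_added(alert_log_text):
    """Count newly assigned disks per (ASM group number, cell IP)."""
    counts = Counter()
    for match in ASSIGN_RE.finditer(alert_log_text):
        group, _disk, cell_ip, _griddisk = match.groups()
        counts[(int(group), cell_ip)] += 1
    return counts

sample = """\
NOTE: disk validation pending for group 2/0xe224a1b (DBFS_DG)
NOTE: Assigning number (2,30) to disk (o/192.168.10.11/DBFS_DG_CD_07_mos1cel07)
NOTE: Assigning number (2,31) to disk (o/192.168.10.11/DBFS_DG_CD_09_mos1cel07)
NOTE: Assigning number (2,32) to disk (o/192.168.10.11/DBFS_DG_CD_05_mos1cel07)
SUCCESS: validated disks for 2/0xe224a1b (DBFS_DG)
"""

counts = disks_added(sample)
for (group, cell_ip), n in sorted(counts.items()):
    print(f"group {group}, cell {cell_ip}: {n} disks assigned")
```

Run against the full alert log, the count per diskgroup should match the number of griddisks you expect from that cell.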

After the rebalance is complete, the storage has been added, and everything is ready to go. No downtime needed. Keep in mind that the same processes are in place for adding other storage through purchasing single storage cells, or adding Exadata expansion racks. In these cases, the griddisks will need to be configured to match the existing griddisk sizes.

12 Jul 2011

Exadata Storage Expansion Rack

If you've run into the problem where you're running out of space on your Exadata system, relief is on the way. Oracle has announced the Exadata Storage Expansion Rack. It comes in 3 handy sizes, just like your Exadata:  Quarter, Half, and Full.  They've taken out the database servers and left the remaining guts inside (KVM, Cisco switch, 3 IB switches for half and full, 2 IB switches for quarter).

The quarter rack has 4 cells, a half has 9, and the full rack includes 18 storage cells.  All come with the "High Capacity" 2TB 7200RPM SAS drives, meaning the full rack comes with 194TB of usable space.  The best thing about this is that you'll be able to add the storage to your existing Exadata without taking anything offline.  Just connect the new Exadata storage to the spine switch on your existing Exadata, and you're ready to configure the cells and add the griddisks to ASM.
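For a rough feel of the raw capacity behind those numbers, here is a small sketch. It assumes 12 disks per storage cell (the per-cell disk count quoted in the Storage on Demand post) and decimal terabytes. The 194TB usable figure is the one quoted above, and the code does not try to derive it.

```python
# Assumption: 12 disks per storage cell, 2TB each (decimal TB).
DISKS_PER_CELL = 12
DISK_SIZE_TB = 2
CELLS = {"quarter": 4, "half": 9, "full": 18}

raw_tb = {size: cells * DISKS_PER_CELL * DISK_SIZE_TB for size, cells in CELLS.items()}

for size, tb in raw_tb.items():
    print(f"{size:>7} rack: {CELLS[size]:>2} cells, {tb} TB raw")

# Usable space quoted for the full rack, compared with the raw total.
QUOTED_USABLE_FULL_TB = 194
print(f"Quoted usable / raw for a full rack: {QUOTED_USABLE_FULL_TB / raw_tb['full']:.0%}")
```

The gap between raw and usable comes from ASM mirroring and other overhead, so plan capacity around usable figures and not raw disk totals.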

While checking the pricelist, I also noticed the inclusion of the Exadata memory expansion kit.  This will allow you to upgrade the RAM on your X2-2 database servers from 96GB to the maximum of 144GB.  No word on whether this has been officially announced yet.

Here's a link to the datasheet as well.

11 Jul 2011

Exadata Split Rack Configuration

I was involved in an interesting Exadata installation last week. We have a client that wanted a half rack Exadata split into 2 quarter racks to separate development and production. The best part was that we were doing this from the factory image, without the use of Oracle's ACS. Normally, a standard Exadata installation can be a tricky proposition, but since we've run through the process a few times for other customers, it wasn't too difficult.

We started by running the dbm_configurator spreadsheet and giving it the configuration information for a quarter rack system. We then went through the process again for another quarter rack, with an additional storage cell (a half rack can't be split evenly since there are 7 cells). We decided to give the extra cell to the production cluster since it would be running +DATA in high redundancy. We did have to massage the hostnames and IPs a little bit, since the clusters would be sharing the switches, and the IPs would be intertwined between the 2 clusters.

After this, we ran through the typical installation. The only difference was that we had to run everything twice: once from db01 (production), and once from db03 (which would become db01 in development). What we ended up with were 2 separate clusters that only shared the InfiniBand network. We had a fully functional Exadata environment to test patching and software code releases.

If your company is looking at purchasing 2 quarter rack Exadata systems that would go into the same datacenter, I would definitely recommend looking at a split half rack solution. Not only do you get an extra storage server and InfiniBand switch (half and full racks include a spine switch, while the quarter rack only has 2 leaf switches), you can get significant savings on floor space and power. The power requirements for single phase power are the same for a quarter rack and a half rack - 4 L6-30 plugs. If you purchased 2 quarter racks, you would need 8 L6-30 plugs in total. Also, since (according to Oracle) you are not allowed to place anything else in the rack, you end up with 2 cabinets that only have a few components in them.

Overall, it was a fun experience to go through the installation from the factory image to client handover. We even had enough time at the end to get a DBFS up and running for the client.

7 Jul 2011

I’m Speaking at the Exadata Virtual Conference!

I've been asked to participate in the Exadata Virtual Conference organized by Tanel Poder that is being held August 3&4.  I'll be speaking about something near and dear to my heart, Exadata patching!  Patching Exadata can be a scary proposition, considering that one patch touches Exadata storage application code, provides OS and kernel updates, and even flash firmware for the hardware components.  Many customers of ours have noted that they have had issues with patching, or are reluctant to patch altogether due to the complexity of the process.  I'm looking to explain the process in plain English and take the fear out of it.  Speaking with me will be the authors of Expert Oracle Exadata.  Tanel will be speaking, along with Kerry Osborne, one of the best Exadata performance guys around, and Randy Johnson, speaking about IORM.  It should be an interesting format, allowing for a direct Q&A after each session.  Early bird pricing is $375 per attendee until July 22.

Exadata Virtual Conference

13 Jun 2011

Exadata Interconnect Addressing Tips

Having done a handful of Exadata implementations, there's always been one piece of the configuration that's bothered me more than anything else.  In the process of ordering an Exadata, Oracle sends the customer a "Configuration Worksheet" that asks questions about how the system should be configured.  It's standard stuff:  hostnames, DNS and NTP servers, UID and GID for the oracle/dba/oinstall (that's another sore spot) accounts, and IP addresses for the various interfaces.  The worksheet comes as a nifty PDF that the customer can modify to suit the needs of the Exadata system.

Unfortunately, the PDF does not allow the customer to modify the IP range used for the IB network.  The only option from this form is to use the network 192.168.8.0/22 with the hosts using 192.168.10.1 - 192.168.10.22 (for a full rack).  Why the /22 you might ask?  Oracle recommends using a subnet of 255.255.252.0 so that multiple Exadata systems can be connected, along with an Exalogic, and whatever other products they have down the line that will connect with Exadata on the IB network.  It would be nice if Oracle would allow customers to define this network range themselves, instead of sticking everybody in the 192.168.8.0/22 network.  Some say that it won't be a problem, because the interconnect is non-routable, but I disagree. Find out why after the jump
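To see what that /22 actually covers, Python's standard ipaddress module can do the math. This is just an illustration of the addressing described above; the variable names are mine.

```python
import ipaddress

# The InfiniBand network and full rack host range from the configuration worksheet.
ib_network = ipaddress.ip_network("192.168.8.0/22")
full_rack_hosts = [ipaddress.ip_address(f"192.168.10.{n}") for n in range(1, 23)]

print("Netmask:      ", ib_network.netmask)
print("Address range:", ib_network.network_address, "-", ib_network.broadcast_address)
print("Addresses:    ", ib_network.num_addresses)
print("All full rack hosts inside the network:",
      all(host in ib_network for host in full_rack_hosts))
```

The /22 spans 192.168.8.0 through 192.168.11.255, which is why the default 192.168.10.x hosts fit inside it with room left over for other racks on the same fabric.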