Inside the Oracle Database Appliance – Part 1

By acolvin | September 21, 2011

We've had a few weeks to play around with the ODA in our office, and I've been able to crack it open and dig into the software and hardware that powers it.

For starters, the system runs a new model of Sun Fire - the X4370 M2. The 4U chassis is basically 2 separate 2U blades (Oracle is calling them system controllers - SCs) that have direct attached storage on the front. Here's a listing of the hardware in each SC:

Sun X4370M2 System Controller Components (2 SCs per X4370M2)
CPU	2x 6-core Intel Xeon X5675 3.06GHz
Memory	96GB 1333MHz DDR3
Network	2x 10GbE (SFP+) PCIe card; 4x 1GbE PCIe card; 2x 1GbE onboard
Internal Storage	2x 500GB SATA for operating system; 1x 4GB USB internal
RAID Controller	2x SAS-2 LSI HBA
Shared Storage	20x 600GB 3.5" SAS 15,000 RPM hard drives; 4x 73GB 3.5" SSDs
External Storage	2x external MiniSAS ports
Operating System	Oracle Enterprise Linux 5.5 x86-64

Pictures of a real live ODA after the break.

If you're anything like me, the first thing you wanted to know about the ODA is what's inside? Follow along as we walk through the hardware involved.

What's an SC?

The SC is essentially Oracle's term for the blades that sit inside the Sun Fire X4370 M2.

From this view, the back of the SC is at the bottom. At the top (front) are 2 connections that plug into the chassis of the X4370 M2. The SCs slide out from the back of the chassis. Looking closer at the back of the SC, we have the external connections.

From the back, you can see the PCI cards on the left, and the onboard ports on the right. In the middle are the fan modules. On the left, we have 4 gigabit ethernet ports, 2 10GbE (SFP+) ports, and the 2 external SAS connections. On the right are the dual gigabit ethernet (onboard) ports, the serial and network ports for the ILOM, and your standard VGA and USB ports. Above the onboard ports are the 500GB SATA hard drives used for the operating system.

What's Inside?

As mentioned above, there are 2 Seagate 500GB serial ATA hard drives that are used for the operating system:

Along with the serial ATA drives is a 4GB USB flash stick that can be used to create a bootable rescue installation of Oracle Enterprise Linux. Also, this drive is used for some firmware updates.

As for the disk controllers, each SC has 2 LSI controllers. One is in the internal PCIe slot, and the other is in a standard PCIe slot. Both are SAS9211-8i controllers.

Operating system

One of the many things that I found interesting on the box was that OEL 5.5 was installed, not one of the newer releases. Also, the server is running the Red Hat Compatible Kernel, and not the Unbreakable Enterprise Kernel (UEK), which is the default on newer releases of OEL 5.

[root@xlnxrda01 ~]# uname -a
Linux xlnxrda01 2.6.18-194.32.1.0.1.el5 #1 SMP Tue Jan 4 16:26:54 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@xlnxrda01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)

Disk configuration

As for the storage, Oracle has removed one of my biggest peeves with the Exadata storage servers. On Exadata, the operating system resides on 30GB partitions on the first 2 hard disks. Because of this, a 30GB griddisk has to be created on each of the remaining 10 disks, and those griddisks become the DBFS_DG diskgroup (formerly SYSTEMDG). With Exadata, this diskgroup becomes the location for the OCR/voting files, unless DATA or RECO is high redundancy. In that case, DBFS_DG is just wasted space. Anyway, going back to the ODA (that was the point of this post, wasn't it?), this problem is no longer present thanks to the 2 500GB 2.5" SATA drives in the back. These drives utilize software RAID (just like the Exadata storage servers), but don't take advantage of the active/inactive partition scheme:

[root@patty ~]# parted /dev/sda print

Model: ATA SEAGATE ST95001N (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      32.3kB  107MB  107MB  primary  ext3         boot, raid
 2      107MB   500GB  500GB  primary               raid      

Information: Don't forget to update /etc/fstab, if necessary.             

[root@patty ~]# parted /dev/sdb print

Model: ATA SEAGATE ST95001N (scsi)
Disk /dev/sdb: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      32.3kB  107MB  107MB  primary  ext3         boot, raid
 2      107MB   500GB  500GB  primary               raid      

Information: Don't forget to update /etc/fstab, if necessary.             

[root@patty ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      488279488 blocks [2/2] [UU]

unused devices: <none>
[root@patty ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolRoot
                       30G  7.4G   21G  27% /
/dev/md0               99M   17M   77M  18% /boot
/dev/mapper/VolGroupSys-LogVolOpt
                       59G  5.8G   50G  11% /opt
/dev/mapper/VolGroupSys-LogVolU01
                       97G  188M   92G   1% /u01
tmpfs                  48G     0   48G   0% /dev/shm
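
If you'd rather check the mirrors from a script than eyeball /proc/mdstat, a small parser is enough. Below is a minimal Python sketch that works on the text shown above. The function names (parse_mdstat, is_healthy) are made up for illustration, and the regular expressions assume the older mdstat layout printed on this box; newer kernels add fields (such as the superblock version) that this sketch does not handle.

```python
import re

# Sample text in the same layout as the /proc/mdstat output shown above.
MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      488279488 blocks [2/2] [UU]

unused devices: <none>
"""


def parse_mdstat(text):
    """Return a dict of array name -> details parsed from mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+) : (\w+) (\w+) (.+)$", line)
        if header:
            name, state, level, members = header.groups()
            current = {
                "state": state,
                "level": level,
                # Strip the "[n]" role suffix from each member device.
                "members": sorted(re.sub(r"\[\d+\]", "", m) for m in members.split()),
            }
            arrays[name] = current
            continue
        status = re.match(r"^\s+(\d+) blocks \[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if status and current is not None:
            current["blocks"] = int(status.group(1))    # size in 1 KiB blocks
            current["expected"] = int(status.group(2))  # devices the array should have
            current["active"] = int(status.group(3))    # devices currently in sync
            current["flags"] = status.group(4)          # "U" = up, "_" = missing
    return arrays


def is_healthy(array):
    """An array is healthy when every expected member is active."""
    return array["expected"] == array["active"] and "_" not in array["flags"]


arrays = parse_mdstat(MDSTAT)
for name, info in arrays.items():
    status = "healthy" if is_healthy(info) else "DEGRADED"
    print(name, info["level"], info["members"], status)
```

On a live system you could pass in the contents of /proc/mdstat instead of the sample string. A failed disk would show up as something like [2/1] [U_], which is_healthy reports as degraded.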

Let's look at the LVM setup a little closer. We have a single physical volume (/dev/md1) backing a 465GB volume group (VolGroupSys). From there, we have LogVolRoot (30GB), LogVolOpt (60GB), LogVolU01 (100GB), and LogVolSwap (24GB), for 214GB allocated. That leaves us with more than 250GB free on each SC for adding new filesystems, growing existing filesystems, or taking LVM snapshots.

[root@patty ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               VolGroupSys
  PV Size               465.66 GB / not usable 3.44 MB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              14901
  Free PE               8053
  Allocated PE          6848
  PV UUID               Q99xBf-AMdf-so7d-VF8J-cIjB-cHeQ-cbWzSD

[root@patty ~]# vgdisplay
  --- Volume group ---
  VG Name               VolGroupSys
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                4
  Open LV               4
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               465.66 GB
  PE Size               32.00 MB
  Total PE              14901
  Alloc PE / Size       6848 / 214.00 GB
  Free  PE / Size       8053 / 251.66 GB
  VG UUID               96cHoA-qhpG-A2hr-tbG1-c1oW-k9lI-MGURFU

[root@patty ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroupSys/LogVolRoot
  VG Name                VolGroupSys
  LV UUID                5qpcEM-aPPA-hGGf-GBBs-nwRi-HUNN-qrsIQp
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                30.00 GB
  Current LE             960
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

  --- Logical volume ---
  LV Name                /dev/VolGroupSys/LogVolOpt
  VG Name                VolGroupSys
  LV UUID                Ge110F-s9oB-Eqak-muo0-3yWn-GEuN-klI1Rs
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                60.00 GB
  Current LE             1920
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Name                /dev/VolGroupSys/LogVolU01
  VG Name                VolGroupSys
  LV UUID                lDBl2R-ZpX7-QZxJ-t8wI-DKde-kANX-zPf3XP
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                100.00 GB
  Current LE             3200
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Logical volume ---
  LV Name                /dev/VolGroupSys/LogVolSwap
  VG Name                VolGroupSys
  LV UUID                Z2shPb-Dbe6-oVIR-hq2z-nDHs-xfNW-43miCW
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                24.00 GB
  Current LE             768
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
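
The numbers in the LVM output can be cross-checked with a bit of arithmetic, since every size is just a count of extents multiplied by the 32MB PE size. Here is a rough Python sketch with the values copied from the output above. The helper name extents_to_gb is hypothetical, and it assumes LVM's convention here of 1 GB = 1024 MB.

```python
PE_SIZE_MB = 32    # "PE Size 32.00 MB" from vgdisplay
TOTAL_PE = 14901   # "Total PE"
FREE_PE = 8053     # "Free PE"

# "Current LE" for each logical volume, from lvdisplay.
LV_EXTENTS = {
    "LogVolRoot": 960,
    "LogVolOpt": 1920,
    "LogVolU01": 3200,
    "LogVolSwap": 768,
}


def extents_to_gb(extents, pe_size_mb=PE_SIZE_MB):
    """Convert an extent count to GB the way LVM reports it (1 GB = 1024 MB)."""
    return extents * pe_size_mb / 1024


allocated_pe = sum(LV_EXTENTS.values())

# The four logical volumes should account for every allocated extent.
assert allocated_pe == TOTAL_PE - FREE_PE

for name, extents in LV_EXTENTS.items():
    print(f"{name}: {extents_to_gb(extents):.2f} GB")
print(f"allocated: {extents_to_gb(allocated_pe):.2f} GB")
print(f"free:      {extents_to_gb(FREE_PE):.2f} GB")
print(f"total:     {extents_to_gb(TOTAL_PE):.2f} GB")
```

The swap volume is the piece that is easy to overlook: without its 768 extents (24GB), the three mounted filesystems only add up to 190GB, not the 214GB that vgdisplay reports as allocated.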

Network Configuration

The ODA has 8 (6 GbE and 2 10GbE) physical ethernet ports available to you, along with 2 internal fibre ports that are used for a built-in cluster interconnect. Here's the output of running "ethtool" on the internal NICs:

[root@patty ~]# ethtool eth0
Settings for eth0:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000001 (1)
        Link detected: yes
[root@patty ~]# ethtool eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000001 (1)
        Link detected: yes

Surprisingly, these NICs aren't bonded, so we have 2 separate cluster interconnects, which means that we also have 2 HAIP devices. Also, eth2 and eth3 (the onboard NICs) were used to create a bond (bond0) for the public traffic.

[patty:oracle:+ASM1] /home/oracle
> oifcfg getif
eth0  192.168.16.0  global  cluster_interconnect
eth1  192.168.17.0  global  cluster_interconnect
bond0  192.168.8.0  global  public
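
To make the layout explicit, the oifcfg output can be grouped by role. This is only a sketch in Python: the function name interfaces_by_role is hypothetical, and it assumes each line has exactly four whitespace-separated fields (interface, subnet, scope, role) as in the output above. Other Grid Infrastructure versions may print roles differently.

```python
from collections import defaultdict

# Sample text copied from the "oifcfg getif" output shown above.
OIFCFG_OUTPUT = """\
eth0  192.168.16.0  global  cluster_interconnect
eth1  192.168.17.0  global  cluster_interconnect
bond0  192.168.8.0  global  public
"""


def interfaces_by_role(text):
    """Group (interface, subnet) pairs by the role reported for each line."""
    roles = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        name, subnet, _scope, role = fields
        roles[role].append((name, subnet))
    return dict(roles)


roles = interfaces_by_role(OIFCFG_OUTPUT)
for role, interfaces in roles.items():
    print(role, interfaces)
```

Two entries under cluster_interconnect, each on its own subnet, is what you would expect with unbonded private NICs, while the public network shows up as a single bonded interface.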

Note that the default configuration of the ODA doesn't include a management network, like the Exadata does. That doesn't mean that you can't set up a management network, just that it's not part of the initial setup process.

That's a quick overview of what's inside the ODA. The next piece in the series will go into a little more detail on the disks, as well as the Oracle configuration.

35 thoughts on “Inside the Oracle Database Appliance – Part 1”

John Slater September 21, 2011

X4370M2? That’s just silly when there was no M1.

Andy Colvin (post author) September 21, 2011

Don’t put it past Oracle….they started the database product with version 2.

1. Roy “Liwanu” Olsen October 28, 2014
  
  And OAK just jumped from 2.10 to 12.1 ^^
  
  I suppose that, while sporting the same lack of correctness, Oracle’s methods appear to outperform Microsoft’s way of counting (2, 3, 386, 3.1, 3.11, 95, 98, 2000, 7, 8, 10)
  
Ben Couldrey September 21, 2011

There would have been a Sun product in development called X4370. Since Oracle’s acquisition of Sun, the product has been further developed and thus tagged M2.

laotsao September 22, 2011

The Nehalem products have no M2 tag;
all Westmere servers have the M2 tag.

laotsao September 22, 2011

Nice post. Questions:
1) Can you share the cluster interconnect cables and internal card that has two ge and
2) There are 4 more slots open; can one add more SSDs?
3) Why the name oakcli?
4) What is the function of the two UART ports?
5) Any picture of the two SAS HBAs connected to both servers?
thx

Andy Colvin (post author) September 22, 2011

@laotsao – The 2 UART ports are used for the internal cluster interconnect. Because the ODA is designed to only be used as a 2-node RAC, they eliminated the need for a cluster interconnect that is cabled. As for adding more SSD, there’s no room. It’s got 24 disk slots, and has 20 hard disks and 4 SSDs. oakcli comes from “Oracle Appliance Kit CLI.”

When I’m back in the office, I’ll try to get some more pics of the inside. We’ll see if I can get the guys to let me take one of the nodes down.

laotsao September 22, 2011

thx, yes there is no more slots open with 4 SSD and 20 SAS HDD

Uwe Hesse September 23, 2011

Great post, Andy!
Only one thing: The DBFS_DG is not wasted space if you implement the DBFS database (hence the new name of the diskgroup) with its tablespaces there. That is the recommended way to host flat files (that you may use for SQL*Loader or External Tables) on Exadata.

Vin Everett November 1, 2011

Andy, are there any gotchas with the shared storage?
Can you address it from each controller on each SC? Assuming you want a volume on each SC.

Do you have to take the triple protection on the controller, or could you take two RAID 6 volumes at 4800GB each?

Cheers Vin

Andy Colvin (post author) November 2, 2011

Vin,

The shared storage is configured to be used within ASM diskgroups. By default, there are 3 diskgroups created: DATA, RECO, and REDO. If you want a shared filesystem between the 2 SCs, you will use either external NFS storage, or create a volume using ACFS. It is my understanding that the only protection for the ASM disks is through ASM redundancy, which we have seen to be very resilient. High redundancy is how the box was set up by the configurator, and there was not an option through the GUI to change that. If you are running this in a production environment, I would definitely recommend running with high redundancy. There is no hardware RAID used on the ASM disks. The 500GB disks that are in the back are isolated to each SC.

Andy Colvin (post author) November 13, 2011

One thing that I mentioned above was the 2 external SAS ports. I’ve heard from a couple of people at Oracle that the ODA does not support using these external connections. It sounds like (no official confirmation) the only supported methods of storage expansion are NFS (preferably direct NFS) and iSCSI. We’re working on iSCSI in our lab, and it’s not as straightforward as you would expect. Results on that in a future post.

Ahmed January 19, 2012

Hi there ,
Can you please tell me what the IP address requirements are for an EE single-node installation?
Are 2 IPs per ODA enough for an EE single-node installation?

Thanks

Ahmed

Andy Colvin (post author) January 23, 2012

Ahmed,

No matter what your configuration is, you will need 8 IP addresses. That includes 2 for the ILOMs (each SC has an ILOM), 2 for the SCs, 2 for VIPs (each SC will have a VIP), and 2 for the SCAN (because the cluster will only have 2 nodes, it only needs 2 IPs for the SCAN). While it isn’t required to have the ILOMs connected to the network, it is definitely recommended. Also, even if you choose not to run RAC on the ODA, you will still get a clustered grid infrastructure, which will utilize the VIPs and SCAN. This is included free of charge when you license Enterprise Edition.

1. hadar February 9, 2012
  
  Hi,
  There is a requirement that the interconnect use a switched network and not a crossover cable.
  How is this implemented in the ODA? Do they have an internal switch, or have they somehow changed the concept and are using crossover cables?
  Hadar
  
  1. Andy Colvin (post author) February 9, 2012
    
    They’re not really switched interfaces, but the internal NICs use the onboard Intel 82576 chip. From the RAC FAQ (note #220970.1):
    
    ——————————————————————————————
    Is crossover cable supported as an interconnect with RAC on any platform ?
    
    NO. CROSS OVER CABLES ARE NOT SUPPORTED. The requirement is to use a switch:
    
    Detailed Reasons:
    
    1) cross-cabling limits the expansion of RAC to two nodes
    
    2) cross-cabling is unstable:
    
    a) Some NIC cards do not work properly with it. They are not able to negotiate the DTE/DCE clocking, and will thus not function. These NICS were made cheaper by assuming that the switch was going to have the clock. Unfortunately there is no way to know which NICs do not have that clock.
    
    b) Media sense behaviour on various OS’s (most notably Windows) will bring a NIC down when a cable is disconnected. Either of these issues can lead to cluster instability and lead to ORA-29740 errors (node evictions).
    
    Due to the benefits and stability provided by a switch, and their affordability ($200 for a simple 16 port GigE switch), and the expense and time related to dealing with issues when one does not exist, this is the only supported configuration.
    
    From a purely technology point of view Oracle does not care if the customer uses cross over cable or router or switches to deliver a message. However, we know from experience that a lot of adapters misbehave when used in a crossover configuration and cause a lot of problems for RAC. Hence we have stated on certify that we do not support crossover cables to avoid false bugs and finger pointing amongst the various parties: Oracle, Hardware vendors, Os vendors etc…
    
    ——————————————————————————————
    
    It’s my understanding that Oracle has tested against these chips and has verified that the issues above are not present on this particular chip.
    
2. Roy “Liwanu” Olsen October 28, 2014
  
  Um, well. Depending on your configuration you could run with anything from two IP addresses up to dozens or hundreds. The default configuration aims at 6 IP addresses for a DNS based configuration, or 5 addresses if you choose to go without DNS round robin. The ILOM requires 2 IP addresses (and 2 switch ports) if connected to the network, but thankfully this is optional.
  
  One solution-in-a-box design I’ve been working on uses only 3 IP addresses, one for each physical node and one for a virtual router that acts as a gateway for the (mostly) virtual network infrastructure.
  
  As for switch-less network configurations, as far as I know it was never a matter of Oracle not supporting RAC clusters with interconnect on crossover cables, just that they would not certify such an implementation. The difference is significant.
  
  With the ODA, Oracle have changed their views on a number of things, including redo multiplexing, crossover cables and a couple of physical laws. ODA does use crossover copper cables (although, these days such cables are not actually crossed) or twinax cables with integrated SFP+ interfaces if you prefer to use the copper ports for the public interfaces.
  
  At least they are still consistently inconsistent…
  
Ali Khan April 22, 2012

Dear Folks,

I have a requirement to install an Active/Passive configuration on the ODA.
Kindly send me the step-by-step instructions to install an Active/Passive configuration.

1. Andy Colvin (post author) April 23, 2012
  
  Are you talking about RAC One Node? When running the deployment, simply choose the advanced option, and you can choose between RAC, RAC One Node (active/passive), and Enterprise Edition (only single-instance databases).
  
winn July 10, 2012

I am looking at the back of our Sun Fire box and I see an unused Ethernet port:

(left side of box)
[X][X][X][0]
^What port is this?

I have eth0 -> eth9

This is the default config:

eth0 192.168.x.x
eth1 192.168.x.x

bond0 – two 1G network interfaces (eth2/eth3) bonded together
bond1 – Two 1G network interfaces (eth4/eth5) bonded together
bond2 – Two 1G network interfaces (eth6/eth7) bonded together
xbond0 – Two 10G network interfaces (eth8/eth9) bonded together

1. Andy Colvin (post author) August 8, 2012
  
  Sorry for the delay. The port that you’re looking at is eth7.
  
yuzeri January 17, 2013

Hi, I’ve already installed ODA 2.3, but my client needs at least 6TB of space for the database. How do I choose between double mirroring and triple mirroring?

1. Andy Colvin (post author) January 21, 2013
  
  Unfortunately, there is no way to change from high redundancy to normal. It requires a new deployment using version 2.4 of the Oracle Appliance Kit.
  
  1. Roy “Liwanu” Olsen October 28, 2014
    
    Do you know if upgrading to OAK 12.1.x will let you change the redundancy configuration as it migrates to full ACFS?
    
nasmel March 20, 2013

Is it possible to run an Oracle VM environment on the ODA X4370?

1. Roy “Liwanu” Olsen October 28, 2014
  
  Yes, nasmel. Oracle Database Appliance Virtualized Platform is supported on the original ODA. Although, depending on the use case, the low amount of memory (when compared to ODA X3-2, X4-2 and the X5-2) may be a limiting factor.
  