I've done a handful of Exadata implementations, and one piece of the configuration has always bothered me more than anything else. When a customer orders an Exadata, Oracle sends a "Configuration Worksheet" that asks how the system should be configured. It's standard stuff: hostnames, DNS and NTP servers, UIDs and GIDs for the oracle user and the dba and oinstall groups (that's another sore spot), and IP addresses for the various interfaces. The worksheet comes as a nifty PDF that the customer can fill in to suit the needs of the Exadata system.
Unfortunately, the PDF does not let the customer change the IP range used for the InfiniBand (IB) network. The only option on the form is the network 192.168.8.0/22, with the hosts using 192.168.10.1 through 192.168.10.22 (for a full rack). Why a /22, you might ask? Oracle recommends a netmask of 255.255.252.0 so that multiple Exadata systems can be connected, along with an Exalogic and whatever other products come down the line that will connect to Exadata over the IB network. It would be nice if Oracle let customers define this network range themselves instead of putting everybody in the 192.168.8.0/22 network. Some say this won't be a problem because the interconnect is non-routable, but I disagree. Here's why.
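To see exactly how much address space the worksheet default claims, here is a quick sketch using Python's standard `ipaddress` module (purely illustrative, not part of any Oracle tooling). It shows that 255.255.252.0 is a /22, that 192.168.8.0/22 spans 192.168.8.0 through 192.168.11.255, and that the full-rack host range sits inside it.

```python
import ipaddress

# The IB network from the configuration worksheet.
ib_net = ipaddress.ip_network("192.168.8.0/22")

# 255.255.252.0 is the dotted-quad form of a /22 prefix.
print(ib_net.netmask)                                    # 255.255.252.0
print(ib_net.network_address, ib_net.broadcast_address)  # 192.168.8.0 192.168.11.255
print(ib_net.num_addresses)                              # 1024

# Full-rack host range: 192.168.10.1 through 192.168.10.22 (22 hosts).
first = ipaddress.ip_address("192.168.10.1")
hosts = [first + i for i in range(22)]
print(hosts[-1])                         # 192.168.10.22
print(all(h in ib_net for h in hosts))   # True

# Every /24 covered by the /22; none of these can safely be reused elsewhere.
print([str(n) for n in ib_net.subnets(new_prefix=24)])
# ['192.168.8.0/24', '192.168.9.0/24', '192.168.10.0/24', '192.168.11.0/24']
```

Even though only 22 addresses are handed out, the node routing tables treat all 1,024 addresses as reachable over IB.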
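The problem is that if that subnet is used anywhere else in the enterprise, hosts on it will not be able to connect to the Exadata at all. Each Exadata node has a connected route that sends 192.168.8.0/22 to the IB network. When a legitimate host elsewhere in the enterprise with an address in that range sends packets to a node, the request arrives on an Ethernet interface, but the reply is addressed to an IP inside 192.168.8.0/22. That connected route is more specific than the default route, so the node hands the reply to the IB interface (bondib0; bond0 on the node in the example below), where the other host will never see it. The Exadata can never respond.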
For example, say an Exadata has been configured to use 192.168.12.0/24 for its IB network, with its client access and management networks on 192.168.10.0/24 and 192.168.11.0/24, respectively. Now say I create a new network outside of the IB switch network that also uses 192.168.12.0/24, and add the routes that let this network talk to 192.168.10.0 and 192.168.11.0. The gateway for every network is .1. A host on the 192.168.12.0/24 (Ethernet) network can reach anything on 192.168.10.0 and 192.168.11.0 except the Exadata hosts.
My MacBook is connected to a network with the address 192.168.12.100. The host 192.168.10.15 is a (non-Exadata) host on the 192.168.10.0 network, while enkdb01 is one of our Exadata compute nodes.
Andy-Colvins-Macbook:~ acolvin$ ifconfig en0
en0: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
	ether 00:1f:f3:59:6f:ac
	inet6 fe80::21f:f3ff:fe59:6fac%en0 prefixlen 64 scopeid 0x5
	inet 192.168.12.100 netmask 0xffffff00 broadcast 192.168.12.255
	media: autoselect (100baseTX <full-duplex,flow-control>)
	status: active
Andy-Colvins-Macbook:~ acolvin$ ping -c 5 192.168.10.15
PING 192.168.10.15 (192.168.10.15): 56 data bytes
64 bytes from 192.168.10.15: icmp_seq=0 ttl=62 time=1.139 ms
64 bytes from 192.168.10.15: icmp_seq=1 ttl=62 time=1.185 ms
64 bytes from 192.168.10.15: icmp_seq=2 ttl=62 time=1.062 ms
64 bytes from 192.168.10.15: icmp_seq=3 ttl=62 time=1.082 ms
64 bytes from 192.168.10.15: icmp_seq=4 ttl=62 time=1.146 ms
--- 192.168.10.15 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.062/1.123/1.185/0.045 ms
Andy-Colvins-Macbook:~ acolvin$ traceroute -n -m 3 192.168.10.15
traceroute to 192.168.10.15 (192.168.10.15), 3 hops max, 52 byte packets
 1  192.168.12.1  0.867 ms  0.438 ms  0.482 ms
 2  192.168.10.15  1.085 ms  0.904 ms  0.773 ms
Andy-Colvins-Macbook:~ acolvin$ ping -c 5 enkdb01
PING enkdb01.enkitec.com (192.168.8.201): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
--- enkdb01.enkitec.com ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss
Andy-Colvins-Macbook:~ acolvin$ traceroute -n -m 3 enkdb01
traceroute to enkdb01.enkitec.com (192.168.8.201), 3 hops max, 52 byte packets
 1  192.168.12.1  0.738 ms  0.559 ms  0.353 ms
 2  * * *
 3  * * *
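As you can see, we can ping 192.168.10.15 but not enkdb01. (In our lab, enkdb01 answers on 192.168.8.201 via eth0, on 192.168.8.0/22, and its IB network on bond0 is 192.168.12.0/24.) What's more interesting is what's going on inside enkdb01: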
[acolvin@enkdb01 ~]$ /sbin/route -v
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.12.0    *               255.255.255.0   U     0      0        0 bond0
192.168.8.0     *               255.255.252.0   U     0      0        0 eth0
169.254.0.0     *               255.255.0.0     U     0      0        0 bond0
default         router.enkitec. 0.0.0.0         UG    0      0        0 eth0
[acolvin@enkdb01 ~]$ sudo /usr/sbin/tcpdump -i eth0 | grep "192.168.12"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
07:41:22.359865 IP 192.168.12.100 > enkdb01.enkitec.com: ICMP echo request, id 32399, seq 0, length 64
07:41:23.359831 IP 192.168.12.100 > enkdb01.enkitec.com: ICMP echo request, id 32399, seq 1, length 64
07:41:24.360013 IP 192.168.12.100 > enkdb01.enkitec.com: ICMP echo request, id 32399, seq 2, length 64
07:41:25.360139 IP 192.168.12.100 > enkdb01.enkitec.com: ICMP echo request, id 32399, seq 3, length 64
07:41:26.360343 IP 192.168.12.100 > enkdb01.enkitec.com: ICMP echo request, id 32399, seq 4, length 64
244 packets captured
244 packets received by filter
0 packets dropped by kernel
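To make the routing decision concrete, here is a small Python sketch (not an Oracle or Linux tool; the helper names `parse_routes` and `egress_interface` are hypothetical) that parses enkdb01's routing table as printed above and picks a route by longest-prefix match, the rule Linux uses to choose among matching routes. Metrics are ignored because they are all 0 in this table.

```python
import ipaddress

# Routing table as printed by `/sbin/route -v` on enkdb01.
ROUTE_OUTPUT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.12.0    *               255.255.255.0   U     0      0        0 bond0
192.168.8.0     *               255.255.252.0   U     0      0        0 eth0
169.254.0.0     *               255.255.0.0     U     0      0        0 bond0
default         router.enkitec. 0.0.0.0         UG    0      0        0 eth0
"""

def parse_routes(text):
    """Return (network, gateway, iface) tuples from route(8) output."""
    routes = []
    for line in text.splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        if len(fields) < 8:
            continue
        dest, gateway, genmask, iface = fields[0], fields[1], fields[2], fields[7]
        if dest == "default":
            dest = "0.0.0.0"  # the default route is 0.0.0.0/0
        routes.append((ipaddress.ip_network(f"{dest}/{genmask}"), gateway, iface))
    return routes

def egress_interface(routes, destination):
    """Choose the matching route with the longest prefix (metrics ignored)."""
    addr = ipaddress.ip_address(destination)
    matches = [r for r in routes if addr in r[0]]
    _, gateway, iface = max(matches, key=lambda r: r[0].prefixlen)
    return iface, gateway

routes = parse_routes(ROUTE_OUTPUT)

# The echo reply to the MacBook matches the connected IB route first.
print(egress_interface(routes, "192.168.12.100"))      # ('bond0', '*')

# Without the overlapping IB route, the reply would go to the default gateway.
without_ib = [r for r in routes if str(r[0]) != "192.168.12.0/24"]
print(egress_interface(without_ib, "192.168.12.100"))  # ('eth0', 'router.enkitec.')
```

The /24 on bond0 beats the /0 default route, so the kernel never considers sending the reply back out eth0 through the router.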
The echo requests arrive on eth0, but no replies go back out. If we look at the InfiniBand interface, we can see what the node is trying to do:
[acolvin@enkdb01 ~]$ sudo /usr/sbin/tcpdump -i bond0 | grep "192.168.12.100"
tcpdump: WARNING: arptype 32 not supported by libpcap - falling back to cooked socket
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type LINUX_SLL (Linux cooked), capture size 96 bytes
07:41:22.361029 arp who-has 192.168.12.100 tell enkdb01.enkitec.com hardware #32
07:41:23.361060 arp who-has 192.168.12.100 tell enkdb01.enkitec.com hardware #32
07:41:24.361125 arp who-has 192.168.12.100 tell enkdb01.enkitec.com hardware #32
07:41:26.361099 arp who-has 192.168.12.100 tell enkdb01.enkitec.com hardware #32
07:41:27.361208 arp who-has 192.168.12.100 tell enkdb01.enkitec.com hardware #32
07:41:28.361182 arp who-has 192.168.12.100 tell enkdb01.enkitec.com hardware #32
3089 packets captured
3089 packets received by filter
0 packets dropped by kernel
Just as we expected, enkdb01 consults its routing table and sends ARP requests over bond0 (the InfiniBand interface), asking for the hardware address of 192.168.12.100. (ARP hardware type 32 is InfiniBand, which is why the output shows "hardware #32" and why tcpdump warns that arptype 32 is not supported.) If you've worked on a RAC system before, it should not surprise you that the interconnect subnet needs to be kept separate from every other network. Oracle's checkip scripts, which look for potential network issues, do not run any checks against the InfiniBand network. The moral of the story: don't gloss over the InfiniBand network just because it's "non-routable."
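Since checkip won't catch this, a check like the one below can be run by hand before filling in the worksheet. It is a minimal sketch built on Python's `ipaddress` module; `find_ib_conflicts` is a hypothetical helper, not part of any Oracle script, and the list of enterprise networks is the example from earlier in this post. In practice that list would come from your own IP address records.

```python
import ipaddress

def find_ib_conflicts(ib_subnet, enterprise_subnets):
    """Return the enterprise subnets that overlap the proposed IB subnet."""
    ib = ipaddress.ip_network(ib_subnet)
    return [s for s in enterprise_subnets if ib.overlaps(ipaddress.ip_network(s))]

# Example networks: client access, management, and the new Ethernet network.
enterprise = ["192.168.10.0/24", "192.168.11.0/24", "192.168.12.0/24"]

# The IB network from the example collides with the new Ethernet network.
print(find_ib_conflicts("192.168.12.0/24", enterprise))  # ['192.168.12.0/24']

# The worksheet default /22 would collide with the client and management networks.
print(find_ib_conflicts("192.168.8.0/22", enterprise))   # ['192.168.10.0/24', '192.168.11.0/24']
```

An empty list only means the proposed IB subnet doesn't overlap anything you listed; it says nothing about networks that were left off the list.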