Andy Colvin's Oracle Blog

OJVM and DBA_REGISTRY_SQLPATCH

acolvin — Wed, 09 Feb 2022 02:45:03 +0000

I woke up this morning and did what I do most days – doomscroll through Twitter. One of the benefits of being in the US is that there are usually some interesting discussions going on with people I follow in Europe. I saw an interesting question from Martin Berger:

https://twitter.com/martinberger_ch/status/1491024978418233344

I’ve had my fights with datapatch over the years, so I was curious about what I saw. At first glance it looks like a bug, but I suspected it had to do with the OJVM patch and how datapatch handles that particular patch.

Patching the Oracle database is a two-step process. First, the Oracle binaries are patched by the opatch command. After the Oracle home is patched, the datapatch script must be run on a database to apply any fixes that include post-patch SQL.

The OJVM patch in particular has been a thorn in the side of many DBAs. In database 19c and lower, a mismatch between the Java classes in the database and the files in the home will cause an ORA-29548 error to occur.

Operations performed by datapatch will create rows in the dba_registry_sqlpatch view, which goes back to Martin’s original question. Why was the 19.9 OJVM patch listed twice when querying the view? I believe the hint lies in the data that we see from the ACTION_TIME column.

To confirm, I went to my lab system with the same 19.9 patch set installed:

SQL> select PATCH_ID, PATCH_TYPE, ACTION_TIME, DESCRIPTION, STATUS from dba_registry_sqlpatch;

  PATCH_ID PATCH_TYPE ACTION_TIME                    DESCRIPTION                                                  STATUS
---------- ---------- ------------------------------ ------------------------------------------------------------ -------
  31668882 INTERIM    20-OCT-21 06.27.39.120263 PM   OJVM RELEASE UPDATE: 19.9.0.0.201020 (31668882)              SUCCESS
  31771877 RU         20-OCT-21 06.27.39.117792 PM   Database Release Update : 19.9.0.0.201020 (31771877)         SUCCESS

I applied the 19.13 patches, as shown below:

[oracle@acolvin-dg-1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
33192694;OJVM RELEASE UPDATE: 19.13.0.0.211019 (33192694)
33208123;OCW RELEASE UPDATE 19.13.0.0.0 (33208123)
33192793;Database Release Update : 19.13.0.0.211019 (33192793)

OPatch succeeded.

Once I finished, I ran datapatch, which completed successfully. I checked the dba_registry_sqlpatch, and I saw the same thing:

SQL> select PATCH_ID, PATCH_TYPE, ACTION_TIME, DESCRIPTION, STATUS from dba_registry_sqlpatch;

  PATCH_ID PATCH_TYPE ACTION_TIME                    DESCRIPTION                                                  STATUS
---------- ---------- ------------------------------ ------------------------------------------------------------ -------
  31668882 INTERIM    20-OCT-21 06.27.39.120263 PM   OJVM RELEASE UPDATE: 19.9.0.0.201020 (31668882)              SUCCESS
  31771877 RU         20-OCT-21 06.27.39.117792 PM   Database Release Update : 19.9.0.0.201020 (31771877)         SUCCESS
  31668882 INTERIM    08-FEB-22 06.00.13.069042 PM   OJVM RELEASE UPDATE: 19.9.0.0.201020 (31668882)              SUCCESS
  33192694 INTERIM    08-FEB-22 06.00.18.341368 PM   OJVM RELEASE UPDATE: 19.13.0.0.211019 (33192694)             SUCCESS
  33192793 RU         08-FEB-22 06.00.18.335540 PM   Database Release Update : 19.13.0.0.211019 (33192793)        SUCCESS

Sure enough, we have 2 entries for the 19.9 OJVM patch. What could be causing this? If you look at the output from datapatch, you can see that the script performs a rollback of the 19.9 OJVM patch before applying the 19.13 version:

[oracle@acolvin-dg-1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 19.13.0.0.0 Production on Tue Feb  8 17:54:18 2022
Copyright (c) 2012, 2021, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_6659_2022_02_08_17_54_18/sqlpatch_invocation.log

Connecting to database...OK
Gathering database info...done
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of interim SQL patches:
Interim patch 31668882 (OJVM RELEASE UPDATE: 19.9.0.0.201020 (31668882)):
  Binary registry: Not installed
  SQL registry: Applied successfully on 20-OCT-21 06.27.39.120263 PM
Interim patch 33192694 (OJVM RELEASE UPDATE: 19.13.0.0.211019 (33192694)):
  Binary registry: Installed
  SQL registry: Not installed

Current state of release update SQL patches:
  Binary registry:
    19.13.0.0.0 Release_Update 211004165050: Installed
  SQL registry:
    Applied 19.9.0.0.0 Release_Update 200930183249 successfully on 20-OCT-21 06.27.39.117792 PM

Adding patches to installation queue and performing prereq checks...done
Installation queue:
  The following interim patches will be rolled back:
    31668882 (OJVM RELEASE UPDATE: 19.9.0.0.201020 (31668882))
  Patch 33192793 (Database Release Update : 19.13.0.0.211019 (33192793)):
    Apply from 19.9.0.0.0 Release_Update 200930183249 to 19.13.0.0.0 Release_Update 211004165050
  The following interim patches will be applied:
    33192694 (OJVM RELEASE UPDATE: 19.13.0.0.211019 (33192694))

Installing patches...
Patch installation complete.  Total patches installed: 3

Validating logfiles...done
Patch 31668882 rollback: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/31668882/23790068/31668882_rollback_COLVINP_2022Feb08_17_54_43.log (no errors)
Patch 33192793 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/33192793/24462514/33192793_apply_COLVINP_2022Feb08_17_55_28.log (no errors)
Patch 33192694 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/33192694/24421575/33192694_apply_COLVINP_2022Feb08_17_55_28.log (no errors)
SQL Patching tool complete on Tue Feb  8 18:00:18 2022
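
The order of the result lines is the interesting part of this output: the rollback of the old OJVM patch is reported before the two applies. As a minimal sketch (not part of the original troubleshooting), the result lines can be pulled out of a saved datapatch log with a small shell function. The function name patch_results is made up for illustration, and it assumes the "Patch <id> <action>: <status>" line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the per-patch result lines out of datapatch
# output read from stdin and print "patch_id action status" in log order.
patch_results() {
  awk '/^Patch [0-9]+ (apply|rollback): / {
    sub(/:$/, "", $3)   # drop the trailing colon from "apply:" / "rollback:"
    print $2, $3, $4
  }'
}

# Sample lines copied from the datapatch output above.
patch_results <<'EOF'
Validating logfiles...done
Patch 31668882 rollback: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/31668882/23790068/31668882_rollback_COLVINP_2022Feb08_17_54_43.log (no errors)
Patch 33192793 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/33192793/24462514/33192793_apply_COLVINP_2022Feb08_17_55_28.log (no errors)
Patch 33192694 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/33192694/24421575/33192694_apply_COLVINP_2022Feb08_17_55_28.log (no errors)
EOF
```

With the sample input, this prints one line per operation, starting with the rollback of 31668882.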

When checking dba_registry_sqlpatch there is another column, ACTION. This column shows what happened during the operation in question. When I add this column to my query, it all comes into focus:

SQL> select PATCH_ID, PATCH_TYPE, ACTION, ACTION_TIME, DESCRIPTION, STATUS from dba_registry_sqlpatch;

  PATCH_ID PATCH_TYPE ACTION          ACTION_TIME                    DESCRIPTION                                              STATUS
---------- ---------- --------------- ------------------------------ -------------------------------------------------------  --------
  31668882 INTERIM    APPLY           20-OCT-21 06.27.39.120263 PM   OJVM RELEASE UPDATE: 19.9.0.0.201020 (31668882)          SUCCESS
  31771877 RU         APPLY           20-OCT-21 06.27.39.117792 PM   Database Release Update : 19.9.0.0.201020 (31771877)     SUCCESS
  31668882 INTERIM    ROLLBACK        08-FEB-22 06.00.13.069042 PM   OJVM RELEASE UPDATE: 19.9.0.0.201020 (31668882)          SUCCESS
  33192694 INTERIM    APPLY           08-FEB-22 06.00.18.341368 PM   OJVM RELEASE UPDATE: 19.13.0.0.211019 (33192694)         SUCCESS
  33192793 RU         APPLY           08-FEB-22 06.00.18.335540 PM   Database Release Update : 19.13.0.0.211019 (33192793)    SUCCESS

Going back to Martin’s original question, I believe this is what he is seeing. Checking the ACTION_TIME column in his results shows that the second operation on the 19.9 OJVM patch occurred three seconds before the operation on the 19.11 OJVM patch.
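
One way to avoid being misled by repeated PATCH_ID rows is to look only at the most recent action recorded for each patch. The following is a rough sketch rather than anything from the original investigation. It assumes the rows have been spooled as pipe-delimited text in the order PATCH_ID|ACTION|ACTION_TIME, with ACTION_TIME written as a sortable YYYY-MM-DD HH24:MI:SS string, and the function name latest_action is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read PATCH_ID|ACTION|ACTION_TIME rows on stdin and
# print the most recent action recorded for each patch.
# Assumes ACTION_TIME is a sortable string such as 2022-02-08 18:00:13.
latest_action() {
  awk -F'|' '
    # Keep the row with the greatest timestamp for each patch ID.
    !($1 in ts) || $3 > ts[$1] { ts[$1] = $3; act[$1] = $2 }
    END { for (id in ts) print id, act[id] }
  ' | sort
}

# Sample rows transcribed from the query output above (times in 24-hour form).
latest_action <<'EOF'
31668882|APPLY|2021-10-20 18:27:39
31771877|APPLY|2021-10-20 18:27:39
31668882|ROLLBACK|2022-02-08 18:00:13
33192694|APPLY|2022-02-08 18:00:18
33192793|APPLY|2022-02-08 18:00:18
EOF
```

For the sample rows, patch 31668882 is reported with ROLLBACK as its latest action, which matches what datapatch did to the 19.9 OJVM patch.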

RAC, Data Guard, and Password Files

acolvin — Thu, 15 Apr 2021 17:48:43 +0000

When moving from one system to another, one of my favorite migration methods is using data guard. It has its restrictions – the destination should be the same platform (exceptions noted in MOS note #413484.1) and the same version unless you’re performing an upgrade, and a block-for-block copy of the database must be acceptable. Utilizing data guard allows for most of the heavy lifting to be performed before the actual cutover, and generally provides an easy backout procedure if there are issues.

I was in the process of creating a standby database on a customer’s shiny new Exadata X8M to get them off of their aging Exadata X5-2, when we hit an issue after enabling the data guard configuration. The database in question began on the X5-2 running 11.2.0.4, then was upgraded to 12.1.0.2, followed by 12.2.0.1, and finally 19c. The upgrades themselves had been rather smooth, thanks to excellent application vendor documentation for recommended patches and parameter settings, as well as the various upgrade notes found through MOS note #888828.1.

We cloned the database using our normal procedure – copy the password file from the primary to the standby, start the instance in nomount mode, and clone using RMAN. After the database had been cloned, we added the database to the broker configuration, and saw an error.

I have recreated the environment using new hostnames and database names to protect client details – in the example here, database ac4dg is on the system to be retired, and ac6dg represents the new platform. When we enabled the data guard broker, we immediately saw issues:

DGMGRL> show configuration;

Configuration - acolvin_dg

  Protection Mode: MaxPerformance
  Members:
  ac4dg - Primary database
    Error: ORA-16778: redo transport error for one or more members

    ac6dg - Physical standby database
      Warning: ORA-16854: apply lag could not be determined

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 27 seconds ago)

SQL> select thread#, sequence#, archived, applied from v$archived_log order by 1,2;

   THREAD#  SEQUENCE# ARC APPLIED
---------- ---------- --- ---------
	 1	    7 YES NO
	 1	    8 YES NO
	 1	    9 YES NO
	 1	   10 YES NO
	 1	   11 YES NO
	 1	   12 YES NO
	 1	   13 YES NO
	 1	   14 YES NO

8 rows selected.

Checking the alert log on the primary database, we saw the following logged on instance 2:

2021-04-15T10:11:33.500586-05:00
TT00 (PID:106020): Error 1033 received logging on to the standby
2021-04-15T10:16:38.640301-05:00
TT00 (PID:106020): Error 1033 received logging on to the standby
2021-04-15T10:21:43.804389-05:00
TT00 (PID:106020): Error 1033 received logging on to the standby
2021-04-15T10:26:48.956976-05:00
TT00 (PID:106020): Error 1033 received logging on to the standby
2021-04-15T10:31:54.141234-05:00
TT00 (PID:106020): Error 1033 received logging on to the standby

We also saw issues being logged in the standby alert log on instance 2:

2021-04-14T15:12:08.200276-05:00
PR00 (PID:23854): Error 1017 received logging on to the standby
PR00 (PID:23854): -------------------------------------------------------------------------
PR00 (PID:23854): Check that the source and target databases are using a password file
PR00 (PID:23854): and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
PR00 (PID:23854): and that the SYS password is same in the password files,
PR00 (PID:23854): returning error ORA-16191
PR00 (PID:23854): -------------------------------------------------------------------------
PR00 (PID:23854): FAL: Error 16191 connecting to ac4dg for fetching gap sequence
2021-04-14T15:12:08.211820-05:00
Errors in file /u01/app/oracle/diag/rdbms/ac6dg/ac6dg2/trace/ac6dg2_pr00_23854.trc:
ORA-16191: Primary log shipping client not logged on standby

The logs gave some pretty specific messages, so I began to investigate. My standby creation process was the same as I’d used in the past – copy the password file from the primary to the standby, then after the standby database is cloned, move it to ASM. This was when I remembered that the primary database did not use a shared password file, but a file for each instance located in $ORACLE_HOME/dbs. This database had been originally created before RAC databases supported a shared password file, and the password file had never been updated to a single shared file on ASM.

Sure enough, I compared the password files, and they had a different md5sum between the two nodes. When you’re running RAC, changing the SYS password from sqlplus will only update the password file on the local node. I verified this in my lab by comparing the md5sum of the password files before and after changing the SYS password:

[oracle@enkx4db03c01 ~]$ dcli -l oracle -g dbs_group 'md5sum /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapw*'
enkx4db03c01: 74af939a4301ff6a0be9c2a31a1777c9  /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg1
enkx4db04c01: 74af939a4301ff6a0be9c2a31a1777c9  /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg2

[oracle@enkx4db03c01 ~]$ dcli -l oracle -g dbs_group 'ls -al /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapw*'
enkx4db03c01: -rw-r----- 1 oracle oinstall 2048 Apr 14 14:00 /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg1
enkx4db04c01: -rw-r----- 1 oracle oinstall 2048 Apr 14 14:00 /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg2

-- change password --

[oracle@enkx4db03c01 ~]$ dcli -l oracle -g dbs_group 'md5sum /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapw*'
enkx4db03c01: 50ba6684c119fbc25960167813c63bae  /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg1
enkx4db04c01: 74af939a4301ff6a0be9c2a31a1777c9  /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg2

[oracle@enkx4db03c01 ~]$ dcli -l oracle -g dbs_group 'ls -al /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapw*'
enkx4db03c01: -rw-r----- 1 oracle oinstall 2048 Apr 14 14:11 /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg1
enkx4db04c01: -rw-r----- 1 oracle oinstall 2048 Apr 14 14:00 /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg2
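
The comparison above is easy to do by eye on two nodes, but it gets tedious on larger clusters. As a minimal sketch, the dcli output can be piped through a small function that reports whether every node has the same checksum. The function name check_pwfile_sums is hypothetical, and it assumes the "host: checksum  path" line format that dcli printed for md5sum above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "host: md5  path" lines (the format dcli prints
# for md5sum above) and report whether every node has the same checksum.
# Returns 0 when all checksums match, 1 when they differ.
check_pwfile_sums() {
  awk '
    { seen[$2]++; line[NR] = $1 " " $2 }
    END {
      n = 0
      for (sum in seen) n++          # count distinct checksums
      if (n <= 1) { print "MATCH"; exit 0 }
      print "MISMATCH"
      for (i = 1; i <= NR; i++) print "  " line[i]
      exit 1
    }
  '
}

# Sample lines copied from the output taken after the password change.
check_pwfile_sums <<'EOF' || echo "password files differ between nodes"
enkx4db03c01: 50ba6684c119fbc25960167813c63bae  /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg1
enkx4db04c01: 74af939a4301ff6a0be9c2a31a1777c9  /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg2
EOF
```

On a live system the same function could be fed directly from the dcli md5sum command shown earlier instead of a here-document.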

The first order of business was to get the password file moved to a shared location. We could easily do that with asmcmd’s pwcopy command:

ASMCMD> pwcopy --dbuniquename ac4dg /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg1 +DATAC1/ac4dg/orapwac4dg
copying /u01/app/oracle/product/19.0.0.0/dbhome_1/dbs/orapwac4dg1 -> +DATAC1/ac4dg/orapwac4dg

ASMCMD> ls -l +DATAC1/ac4dg/orapwac4dg
Type      Redund  Striped  Time             Sys  Name
PASSWORD  HIGH    COARSE   APR 15 10:00:00  N    orapwac4dg => +DATAC1/ac4dg/PASSWORD/pwdac4dg.351.1069929821

[oracle@enkx4db03c01 ~]$ srvctl config database -db ac4dg | egrep 'unique|Password'
Database unique name: ac4dg
Password file: +DATAC1/ac4dg/orapwac4dg

That part was simple, but would anything else be required? I initially thought we may need to perform a rolling restart of the instances after moving the password file to a shared location, but it was actually much easier – we just needed to bounce the data guard broker on the primary:

SQL> alter system set dg_broker_start=false sid='*' scope=both;

System altered.

SQL> alter system set dg_broker_start=true  sid='*' scope=both;

System altered.

Once we bounced the broker on the primary database, log shipping picked up, and logs were applied.

DGMGRL> show configuration;

Configuration - acolvin_dg

  Protection Mode: MaxPerformance
  Members:
  ac4dg - Primary database
    ac6dg - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 0 seconds ago)

SQL> select thread#, sequence#, archived, applied from v$archived_log order by 1,2;

   THREAD#  SEQUENCE# ARC APPLIED
---------- ---------- --- ---------
	 1	    7 YES YES
	 1	    8 YES YES
	 1	    9 YES YES
	 1	   10 YES YES
	 1	   11 YES YES
	 1	   12 YES YES
	 1	   13 YES YES
	 1	   14 YES YES
	 2	    7 YES YES
	 2	    8 YES YES
	 2	    9 YES YES

11 rows selected.
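
When the listing gets longer than a screen, a per-thread summary is easier to read than the raw rows. The following is only a sketch: it assumes the four-column output of the query above has been saved as text, and the function name thread_summary is made up for illustration. A thread that has no rows at all will simply not appear in the summary, as was the case for thread 2 in the first listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the THREAD# SEQUENCE# ARC APPLIED listing on
# stdin and print how many logs were archived and applied for each thread.
# Header, separator, and "rows selected" lines are skipped.
thread_summary() {
  awk '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
      total[$1]++
      if ($4 == "YES") applied[$1]++
    }
    END {
      for (t in total)
        printf "thread %d: %d archived, %d applied\n", t, total[t], applied[t]
    }
  ' | sort
}

# A few sample rows in the same shape as the output above.
thread_summary <<'EOF'
   THREAD#  SEQUENCE# ARC APPLIED
---------- ---------- --- ---------
         1         13 YES YES
         1         14 YES YES
         2          9 YES YES
EOF
```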

Overall, the lesson that I took away from this was that shared password files are more than just a cool feature in RAC – they should be a necessity. Also, it’s very easy to continue to rely on older functionality and methods after numerous upgrades, but sometimes you’re better off taking advantage of the new features after an upgrade.

Oracle Cloud Infrastructure – Unusual Activity Announcements

acolvin — Wed, 14 Apr 2021 14:57:21 +0000

When logging in to an OCI tenancy, I noticed something interesting – there was a message bar at the top of the console reporting “Unusual traffic detected.” It turns out that there were a couple of instances in a compartment that were showing signs of potential compromise. Here’s the alert that we received when we clicked on the banner:

It included in the detail the region, instance name, and OCID of the offending resources, as well as the type of activity – in this case, the instances were showing traffic patterns that matched brute-force SSH attacks. This information made it very easy to investigate and remediate. As it turns out, someone had created an instance with a wide open security list in a compartment set to be destroyed. We were able to jump on it quickly and terminate the offending instances.

The warning was a good reminder to keep an eye on your security lists and public instances. All told, this is a very good feature to see in the real world.

SCAN Hostname Resolution and Domain Names

acolvin — Tue, 06 Apr 2021 13:23:13 +0000

I spoke with a customer that mentioned they were moving an Exadata rack from one datacenter to another and they had a few questions about what needed to change in order to complete the move. The customer wanted to make the move as much of a “lift and shift” as possible, avoiding the need to rebuild the cluster upon power up at the new datacenter. This means that the hostnames themselves could not change. We knew that IP addresses would have to change, but the customer also has different DNS domain names for each datacenter. For example, with Exadata racks in Atlanta and Dallas, the SCAN hostnames might be:

exa1-scan.atl.client.com

exa2-scan.dal.client.com

The relocation happened to be on a very tight timeline, and the application team requested that the original domain name be kept after moving the Exadata, since they did not have time to test changing their applications. That raised the question of whether we should change the domain name at all, or just leave this Exadata as the misfit with the wrong domain. I raised the point that they could easily update the DNS records in the old domain to point to the new IPs, and create records in the new domain. Unfortunately, this means two sets of DNS records exist for a period of time, but it takes care of the permanent state and keeps the application team running as they are today.

The next question was about how the SCAN would reply to a client that connects with a fully qualified domain name (FQDN) that is using the old domain. I wanted to do a test to see what would happen. As we know, a database client first connects to the SCAN listener, and is then directed to the appropriate database instance via a local listener running on a VIP. The concern of the customer was that an application querying the SCAN with one domain name may receive a result that it can’t resolve, breaking connectivity after the move.

First, it’s worth noting that (on Exadata), the VIP hostnames are stored in the cluster registry as an FQDN, whereas the SCAN is not:

[root@enkx6db01 ~]# srvctl config vip -node enkx6db01
VIP exists: network number 1, hosting node enkx6db01
VIP Name: enkx601-vip.enkitec.local
VIP IPv4 Address: 10.9.238.68
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:

[root@enkx6db01 ~]# srvctl config scan
SCAN name: enkx6-scan, Network: 1
Subnet IPv4: 10.9.238.64/255.255.255.192/bondeth0, static
Subnet IPv6:
SCAN 1 IPv4 VIP: 10.9.238.70
SCAN VIP is enabled.
SCAN 2 IPv4 VIP: 10.9.238.71
SCAN VIP is enabled.
SCAN 3 IPv4 VIP: 10.9.238.72
SCAN VIP is enabled.

I decided to test out the scenario by removing any domain names from my client’s search path, so a non-fully qualified hostname would cause the connection to fail. As I expected, the database client was able to connect successfully. I ran a test to a database named x619 on the enkx6-scan cluster.

That left the question – what exactly does the SCAN return when the client connects? I pulled out an old friend, packet capture, and ran the following tshark command to capture all packets back and forth between my client and the SCAN listeners:

# tshark -i eth0 -V host 10.9.238.70 or 10.9.238.71 or 10.9.238.72

We ended up with the following frame of data, sent from the SCAN to my client. Output is abbreviated with “…” to clarify when some output has been removed.


Frame 7: 325 bytes on wire (2600 bits), 325 bytes captured (2600 bits) on interface 0
    Interface id: 0
    WTAP_ENCAP: 1
    Arrival Time: Apr  5, 2021 21:22:05.909223555 CDT
...
Internet Protocol Version 4, Src: 10.9.238.70 (10.9.238.70), Dst: 10.9.237.131 (10.9.237.131)
...
Transmission Control Protocol, Src Port: ncube-lm (1521), Dst Port: 46743 (46743), Seq: 11, Ack: 257, Len: 271
    Source port: ncube-lm (1521)
    Destination port: 46743 (46743)
    [Stream index: 0]
    Sequence number: 11    (relative sequence number)
    [Next sequence number: 282    (relative sequence number)]
    Acknowledgment number: 257    (relative ack number)
    Header length: 20 bytes
    Flags: 0x018 (PSH, ACK)
...
Transparent Network Substrate Protocol
    Packet Length: 271
    Packet Checksum: 0x000a
    Packet Type: Data (6)
    Reserved Byte: 00
    Header Checksum: 0x0000
    Data
        Data Flag: 0x0040
            .... .... .... ...0 = Send Token: False
            .... .... .... ..0. = Request Confirmation: False
            .... .... .... .0.. = Confirmation: False
            .... .... .... 0... = Reserved: False
            .... .... ..0. .... = More Data to Come: False
            .... .... .1.. .... = End of File: True
            .... .... 0... .... = Do Immediate Confirmation: False
            .... ...0 .... .... = Request To Send: False
            .... ..0. .... .... = Send NT Trailer: False
        Data (261 bytes)

0000  28 41 44 44 52 45 53 53 3d 28 50 52 4f 54 4f 43   (ADDRESS=(PROTOC
0010  4f 4c 3d 54 43 50 29 28 48 4f 53 54 3d 31 30 2e   OL=TCP)(HOST=10. <- IP ADDR
0020  39 2e 32 33 38 2e 36 38 29 28 50 4f 52 54 3d 31   9.238.68)(PORT=1 <- IP ADDR
0030  35 32 31 29 29 00 28 44 45 53 43 52 49 50 54 49   521)).(DESCRIPTI
0040  4f 4e 3d 28 43 4f 4e 4e 45 43 54 5f 44 41 54 41   ON=(CONNECT_DATA
0050  3d 28 53 45 52 56 45 52 3d 44 45 44 49 43 41 54   =(SERVER=DEDICAT
0060  45 44 29 28 53 45 52 56 49 43 45 5f 4e 41 4d 45   ED)(SERVICE_NAME
0070  3d 78 36 31 39 29 28 43 49 44 3d 28 50 52 4f 47   =x619)(CID=(PROG
0080  52 41 4d 3d 73 71 6c 70 6c 75 73 29 28 48 4f 53   RAM=sqlplus)(HOS
0090  54 3d 65 6e 6b 70 65 78 61 63 68 6b 2e 65 6e 6b   T=enkpclient.enk
00a0  69 74 65 63 2e 6c 6f 63 61 6c 29 28 55 53 45 52   itec.local)(USER
00b0  3d 6f 72 61 63 6c 65 29 29 28 49 4e 53 54 41 4e   =oracle))(INSTAN
00c0  43 45 5f 4e 41 4d 45 3d 78 36 31 39 31 29 29 28   CE_NAME=x6191))( <- INSTANCE
00d0  41 44 44 52 45 53 53 3d 28 50 52 4f 54 4f 43 4f   ADDRESS=(PROTOCO
00e0  4c 3d 54 43 50 29 28 48 4f 53 54 3d 31 30 2e 39   L=TCP)(HOST=10.9
00f0  2e 32 33 38 2e 37 30 29 28 50 4f 52 54 3d 31 35   .238.70)(PORT=15
0100  32 31 29 29 29                                    21)))
            Data: 28414444524553533d2850524f544f434f4c3d5443502928...
            [Length: 261]

Sure enough, the SCAN reported back with the IP address of the VIP. In this case, the customer could easily remove all of the DNS records in the old domain, leaving a CNAME that points the old SCAN hostname to the new one. In the end, they opted to update the DNS records, just in case anyone misses the domain change.
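
The check made by eye in the capture above can be sketched in a few lines of shell: extract each HOST= value from the redirect string and see whether it is an IP address or a name the client would have to resolve. This is an illustration only. The function names tns_hosts and is_ipv4 are hypothetical, the IPv4 test is a loose pattern match that does not validate octet ranges, and the sample string is just the leading ADDRESS portion of the payload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every HOST= value found in a TNS address
# string read from stdin, one per line, in order of appearance.
tns_hosts() {
  grep -o 'HOST=[^)]*' | cut -d= -f2
}

# Hypothetical helper: succeed if the argument looks like a dotted-quad
# IPv4 address (loose check, octet ranges are not validated).
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Leading ADDRESS portion of the redirect payload from the capture above.
payload='(ADDRESS=(PROTOCOL=TCP)(HOST=10.9.238.68)(PORT=1521))'
for h in $(printf '%s\n' "$payload" | tns_hosts); do
  if is_ipv4 "$h"; then
    echo "$h: IP address, no DNS lookup needed"
  else
    echo "$h: hostname, client must resolve it"
  fi
done
```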

Restarting Autoupgrade Jobs When the Instance Won’t Restart

acolvin — Fri, 13 Nov 2020 15:47:05 +0000

When upgrading databases, my preferred method of late has been the autoupgrade tool. Autoupgrade gives DBAs the ability to upgrade databases in batches, automatically performing prechecks, postchecks, and custom database tasks during the upgrade process.

Autoupgrade can be downloaded from MOS note #2485457.1, and tests shown here used the most current version at the time of writing, version 19.10.0.

I was going through an upgrade of a RAC database from 12.1.0.2 to 19.6.0.0.200114, and hit a strange issue – the database upgrade had completed all prechecks, and failed with the following message:

upg>
-------------------------------------------------
Errors in database [acup1211]
Stage [DBUPGRADE]
Operation [STOPPED]
Status [ERROR]
Info [
Error: UPG-1401
Opening Database acup121 in upgrade mode failed
Cause: Opening database for upgrade in the target home failed
For further details, see the log file located at /u01/app/oracle/autoupgrade/acup1211/acup1211/101/autoupgrade_20201112_user.log]

-------------------------------------------------
Logs: [/u01/app/oracle/autoupgrade/acup1211/acup1211/101/autoupgrade_20201112_user.log]
-------------------------------------------------

As we can see from the error, the database attempted to start in upgrade mode, but failed to open. I checked the database alert log, and didn’t see any errors of note in there. From there, I went to the log that the output told me to check (a novel concept, I know), and the error was in there, clear as day:

2020-11-12 15:11:31.123 ERROR
DATABASE NAME: acup1211
CAUSE: ERROR at Line 1 in [Buffer]
REASON: LRM-00121: '12.1.0.2.1' is not an allowable value for 'optimizer_features_enable'
ACTION: [MANUAL]
DETAILS: 121, 0, "'%.*s' is not an allowable value for '%.*s'"
// *Cause: The value is not a legal value for this parameter.
// *Action: Refer to the manual for allowable values.

It seems as though my problem was due to the optimizer_features_enable parameter being set to 12.1.0.2.1. This is a valid value in 12.1.0.2, but apparently isn’t accepted in 19c (checking from a different database):

[oracle@enkx4db01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Nov 13 09:42:52 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle. All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL> alter system set optimizer_features_enable='12.1.0.2.1' sid='*' scope=spfile;
alter system set optimizer_features_enable='12.1.0.2.1' sid='*' scope=spfile
*
ERROR at line 1:
ORA-00096: invalid value 12.1.0.2.1 for parameter optimizer_features_enable,
must be from among 19.1.0.1, 19.1.0, 18.1.0, 12.2.0.1, 12.1.0.2, 12.1.0.1,
11.2.0.4, 11.2.0.3, 11.2.0.2, 11.2.0.1, 11.1.0.7, 11.1.0.6, 10.2.0.5, 10.2.0.4,
10.2.0.3, 10.2.0.2, 10.2.0.1, 10.1.0.5, 10.1.0.4, 10.1.0.3, 10.1.0, 9.2.0.8,
9.2.0, 9.0.1, 9.0.0, 8.1.7, 8.1.6, 8.1.5, 8.1.4, 8.1.3, 8.1.0, 8.0.7, 8.0.6,
8.0.5, 8.0.4, 8.0.3, 8.0.0

I went back to the database that I was upgrading, and tried to start it up and change the setting manually.

SQL> startup mount;
ORACLE instance started.

Total System Global Area 2.5770E+10 bytes
Fixed Size 6870952 bytes
Variable Size 3690989656 bytes
Database Buffers 2.1877E+10 bytes
Redo Buffers 194453504 bytes
Database mounted.
SQL> show parameter optimizer_features_enable

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable string 12.1.0.2.1
SQL> alter system set optimizer_features_enable='12.1.0.2' sid='*' scope=spfile;

System altered.


SQL> select name, value from v$spparameter where name='optimizer_features_enable';

NAME VALUE
------------------------- ---------------
optimizer_features_enable 12.1.0.2

SQL> shutdown immediate;

Once the database was shut down again, I attempted to restart the job in the autoupgrade console. Sure enough, it failed again with the same error:

upg> resume -job 101
Resuming job: [101][acup1211]
upg>
-------------------------------------------------
Errors in database [acup1211]
Stage [DBUPGRADE]
Operation [STOPPED]
Status [ERROR]
Info [
Error: UPG-1401
Opening Database acup121 in upgrade mode failed
Cause: Opening database for upgrade in the target home failed
For further details, see the log file located at /u01/app/oracle/autoupgrade/acup1211/acup1211/101/autoupgrade_20201112_user.log]

-------------------------------------------------
Logs: [/u01/app/oracle/autoupgrade/acup1211/acup1211/101/autoupgrade_20201112_user.log]
-------------------------------------------------

I double checked the log, and saw that it was still failing on the same issue. If I start the instance up in mount mode and check the parameter value, I can see that optimizer_features_enable is set to 12.1.0.2. If that’s the case, why is the upgrade saying that the instance is still trying to start with a value of 12.1.0.2.1?

It turns out that autoupgrade uses a temporary pfile to start the instance in upgrade mode. The instance is trying to start using a pfile from the 19.6.0.0 database home, and this value for optimizer_features_enable doesn’t pass validation, so the instance doesn’t even get to the point of adding anything to the alert log.

There are actually 3 pfiles located in the temp directory under the autoupgrade log location for the database:

[oracle@enkx4db01 ~]$ cd /u01/app/oracle/autoupgrade/acup1211/acup1211/temp
[oracle@enkx4db01 temp]$ ls -al *.ora
total 24
drwx------ 2 oracle oinstall 4096 Nov 12 15:02 .
drwx------ 5 oracle oinstall 4096 Nov 12 15:02 ..
-rwx------ 1 oracle oinstall 2265 Nov 12 15:02 after_upgrade_pfile_acup1211.ora
-rwx------ 1 oracle oinstall 1942 Nov 12 14:29 before_upgrade_pfile_acup1211.ora
-rwx------ 1 oracle oinstall 1946 Nov 12 15:05 during_upgrade_pfile_acup1211.ora

If I modify the optimizer_features_enable setting to 12.1.0.2 in during_upgrade_pfile_acup1211.ora, I can restart the upgrade. I also needed to modify the after_upgrade_pfile_acup1211.ora file – leaving the invalid setting there will cause a failure in the postupgrade phase when it creates the final spfile.
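
The edit itself can be scripted so that both files get the same change. The following is a minimal sketch, not the method used in the original upgrade. It assumes the pfile holds the parameter in the usual quoted form, for example *.optimizer_features_enable='12.1.0.2.1', and the function name fix_ofe is made up for illustration. A .bak copy of each file is kept in case the edit needs to be reverted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite optimizer_features_enable in a pfile to the
# given value, keeping a .bak copy of the original file.
# Assumes the value is single-quoted, as in *.optimizer_features_enable='...'.
fix_ofe() {
  local pfile="$1" value="$2"
  cp "$pfile" "$pfile.bak" || return 1
  # Replace only the quoted value; every other line is copied unchanged.
  sed "s/\(optimizer_features_enable *= *\)'[^']*'/\1'$value'/" \
    "$pfile.bak" > "$pfile"
}

# Example usage against the files in the listing above:
# fix_ofe during_upgrade_pfile_acup1211.ora 12.1.0.2
# fix_ofe after_upgrade_pfile_acup1211.ora 12.1.0.2
```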

While not ideal, using autoupgrade still makes the process fairly painless, giving DBAs the ability to resume the upgrade where the job failed. I ran into this issue on a separate cluster where the bad parameter value was only set on one of five databases. The four databases without a problem proceeded to upgrade without being affected by the database having problems.

Server-Side SSH Timeout Settings with host_access_control

acolvin — Sun, 31 May 2020 21:07:20 +0000

As part of the process to get an Exadata rack ready for running 19c databases, many clusters have to go through an upgrade from Oracle Linux 6 to Oracle Linux 7. As part of that upgrade, Oracle took the opportunity to make several configuration changes for security purposes. One of those changes relates to the SSH shell and idle timeout values.

Oracle cites the STIG (Security Technical Implementation Guides) as the reason for making the changes, which drop the client idle timeout down from 24 hours to 10 minutes. The implication of this change is that SSH sessions will drop after 10 minutes of idle time, even if they are actively running applications. I’ve had issues with this after an upgrade, particularly when running datapatch or closing the compute node upgrade with dbnodeupdate.sh.

Zed Anwar has a really good post on one way to circumvent this from the client side, but this can sometimes be difficult to manage with a client-based solution. For environments that don’t require the 10 minute timeout, I will frequently move the idle timeout up to an hour (3,600 seconds).

The problem with this type of change is that editing configuration files individually is not a good way to go about it. Rather than modifying configuration files, Oracle offers the /opt/oracle.cellos/host_access_control script, which has many uses for changing the security settings on an Exadata host. In this case, the timeout settings are configured using the “idle-timeout” command. Current values can be seen by running host_access_control with the --status or -s flag:

[root@dm01db01 ~]# /opt/oracle.cellos/host_access_control idle-timeout -s
[INFO] [IMG-SEC-0402] Shell timeout is set to TMOUT=14400
[INFO] [IMG-SEC-0403] SSH client idle timeout is set to ClientAliveInterval 3600
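
Since the values are reported in seconds, it can help to see them in minutes when comparing against a policy. As a rough sketch, the status lines can be piped through a small function that prints both. The function name show_timeouts is hypothetical, and it assumes the two status line formats shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read idle-timeout status lines on stdin and print
# each timeout in seconds alongside its value in minutes.
show_timeouts() {
  awk '
    /TMOUT=/              { split($NF, kv, "="); printf "shell  %d s (%d min)\n", kv[2], kv[2] / 60 }
    /ClientAliveInterval/ { printf "client %d s (%d min)\n", $NF, $NF / 60 }
  '
}

# Sample lines copied from the status output above.
show_timeouts <<'EOF'
[INFO] [IMG-SEC-0402] Shell timeout is set to TMOUT=14400
[INFO] [IMG-SEC-0403] SSH client idle timeout is set to ClientAliveInterval 3600
EOF
```

Because the function only looks at the last field of each line, it should also cope with the host prefix that dcli adds when the status is gathered across a cluster.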

If you want to make changes to the settings, simply add the values for the client idle timeout (-c, --client) or shell idle timeout (-l, --shell). For example, if I wanted to go back to the default settings from Oracle Linux 6, I would enter the following:

[root@dm01db01 ~]# /opt/oracle.cellos/host_access_control idle-timeout -l 14400 -c 86400
[INFO] [IMG-SEC-0403] SSH client idle timeout is set to 86400
[INFO] [IMG-SEC-0A02] SSHD Service restarted. Changes in effect for new connections.
[INFO] [IMG-SEC-0404] Shell timeout set to 14400

There you go – changes have been staged and the SSHD service was restarted. Another nice thing about using host_access_control is that it allows you to make changes across the entire cluster via dcli, removing the need to log in to each host:

[root@dm01db01 ~]# dcli -l root -g ~/dbs_group /opt/oracle.cellos/host_access_control idle-timeout -l 14400 -c 86400
dm01db01: [INFO] [IMG-SEC-0403] SSH client idle timeout is set to 86400
dm01db01: [INFO] [IMG-SEC-0A02] SSHD Service restarted. Changes in effect for new connections.
dm01db01: [INFO] [IMG-SEC-0404] Shell timeout set to 14400
dm01db02: [INFO] [IMG-SEC-0403] SSH client idle timeout is set to 86400
dm01db02: [INFO] [IMG-SEC-0A02] SSHD Service restarted. Changes in effect for new connections.
dm01db02: [INFO] [IMG-SEC-0404] Shell timeout set to 14400

19c Grid Infrastructure Upgrade Failures with OEDACLI

acolvin — Sun, 31 May 2020 20:32:02 +0000

As part of an ongoing project, I’ve been performing a fair number of upgrades to 19c on Exadata systems. Several of those systems are virtualized, running Oracle VM (based on Xen). I’ve previously mentioned Oracle’s oedacli tool that can be used to make upgrades easier, and it’s been very useful once you are familiar with it.

Grid Infrastructure upgrades with oedacli are broken into several tasks, which can be executed separately – this gives you flexibility to perform the tasks that actually bounce the cluster at a specified time. The three steps of an upgrade with oedacli are:

ADD_HOME – validates the system for use with 19c, unpacks gold image files, reconfigures guest config files, and mounts new home on guests.
CONFIG_HOME – runs gridSetup.sh to configure the new home
RUN_ROOTSCRIPT – executes rootupgrade.sh on each node and runs config tools script

We began running oedacli to perform the upgrade on our clusters using the April 2020 OEDA release. Step 1 completed without any issues, and step 2 failed almost immediately with the following error:

oedacli> deploy actions
Deploying Action ID : 2 UPGRADE CLUSTER GIVERSION=19.6.0.0.200114 GIHOMELOC=/u01/app/19.0.0.0/grid WHERE CLUSTERNAME=exa1v1 STEPNAME=CONFIG_HOME
Deploying UPGRADE CLUSTER
Upgrading Cluster
Configuring new clusterware home at /u01/app/19.0.0.0/grid
Running Cluster Verification Utility for upgrade readiness..
Relinking binaries with RDS /u01/app/19.0.0.0/grid
ERROR:
Command: ORACLE_HOME=/u01/app/19.0.0.0/grid; export ORACLE_HOME;cd /u01/app/19.0.0.0/grid/rdbms/lib;make -f ins_rdbms.mk rac_on;make -f ins_rdbms.mk ikfod;make -f ins_rdbms.mk ipc_rds ioracle ORACLE_HOME=/u01/app/19.0.0.0/grid; produced null output exa1db01v1.example.com with exit status 2
bash: line 0: cd: /u01/app/19.0.0.0/grid/rdbms/lib: Permission denied
make: ins_rdbms.mk: No such file or directory
make: *** No rule to make target `ins_rdbms.mk'. Stop.
make: ins_rdbms.mk: No such file or directory
make: *** No rule to make target `ins_rdbms.mk'. Stop.
make: ins_rdbms.mk: No such file or directory
make: *** No rule to make target `ins_rdbms.mk'. Stop.

Well, that’s not great. The good news is that it is pretty obviously a permissions error. I logged in to the VM as the oracle user and tried to change into the new home to test the relink command by hand, and got the same error:

[oracle@exa1db01v1 ~]$ cd /u01/app/19.0.0.0/grid
-bash: cd: /u01/app/19.0.0.0/grid: Permission denied

We can definitely see a permissions issue here. If I go back as root and check the directory, I can see that the files are there with proper ownership and access:

[root@enkx4db01 ~]# ls -al /u01/app/19.0.0.0/grid
total 312
drwxr-xr-x 65 oracle oinstall 4096 May 31 15:10 .
drwxr-xr-x 10 oracle oinstall 4096 May 31 15:09 ..
drwxr-xr-x 2 oracle oinstall 4096 Apr 18 2019 addnode
drwxr-xr-x 10 oracle oinstall 4096 Apr 17 2019 assistants
drwxr-xr-x 2 oracle oinstall 12288 Apr 18 2019 bin
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 cha
drwxr-xr-x 4 oracle oinstall 4096 Apr 18 2019 clone
drwxr-xr-x 10 oracle oinstall 4096 Apr 18 2019 crs
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 css
drwxr-xr-x 7 oracle oinstall 4096 Apr 17 2019 cv
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 dbjava
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 dbs
drwxr-xr-x 5 oracle oinstall 4096 Apr 18 2019 deinstall
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 demo
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 diagnostics
drwxr-xr-x 13 oracle oinstall 4096 Apr 17 2019 dmu
-rw-r--r-- 1 oracle oinstall 852 Aug 18 2015 env.ora
drwxr-xr-x 6 oracle oinstall 4096 Apr 17 2019 evm
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 gpnp
-rwxr-x--- 1 oracle oinstall 3294 Mar 8 2017 gridSetup.sh
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 has
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 hs
drwxr-xr-x 10 oracle oinstall 4096 Apr 18 2019 install
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 instantclient
drwxr-x--- 13 oracle oinstall 4096 Apr 18 2019 inventory
drwxr-xr-x 8 oracle oinstall 4096 Apr 18 2019 javavm
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 jdbc
drwxr-xr-x 6 oracle oinstall 4096 Apr 18 2019 jdk
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 jlib
drwxr-xr-x 10 oracle oinstall 4096 Apr 17 2019 ldap
drwxr-xr-x 4 oracle oinstall 16384 Apr 18 2019 lib
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 md
drwxr-xr-x 10 oracle oinstall 4096 Apr 17 2019 network
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 nls
drwxr-x--- 14 oracle oinstall 4096 Apr 12 2019 OPatch
drwxr-xr-x 3 oracle oinstall 4096 Apr 18 2019 .opatchauto_storage
drwxr-xr-x 8 oracle oinstall 4096 Apr 17 2019 opmn
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 oracore
drwxr-xr-x 6 oracle oinstall 4096 Apr 17 2019 ord
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 ords
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 oss
drwxr-xr-x 8 oracle oinstall 4096 Apr 18 2019 oui
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 owm
drwxr-xr-x 7 oracle oinstall 4096 Apr 18 2019 .patch_storage
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 perl
drwxr-xr-x 6 oracle oinstall 4096 Apr 17 2019 plsql
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 precomp
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 QOpatch
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 qos
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 racg
drwxr-xr-x 13 oracle oinstall 4096 Apr 18 2019 rdbms
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 relnotes
drwxr-xr-x 7 oracle oinstall 4096 Apr 17 2019 rhp
-rwx------ 1 oracle oinstall 405 Apr 18 2019 root.sh
-rwx------ 1 oracle oinstall 490 Apr 17 2019 root.sh.old
-rw-r----- 1 oracle oinstall 10 Apr 17 2019 root.sh.old.1
-rwx------ 1 oracle oinstall 414 Apr 18 2019 rootupgrade.sh
-rwxr-x--- 1 oracle oinstall 628 Sep 3 2015 runcluvfy.sh
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 sdk
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 slax
drwxr-xr-x 4 oracle oinstall 4096 Apr 18 2019 sqlpatch
drwxr-xr-x 6 oracle oinstall 4096 Apr 18 2019 sqlplus
drwxr-xr-x 6 oracle oinstall 4096 Apr 17 2019 srvm
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 suptools
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 tomcat
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 ucp
drwxr-xr-x 7 oracle oinstall 4096 Apr 17 2019 usm
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 utl
-rw-r----- 1 oracle oinstall 500 Feb 6 2013 welcome.html
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 wlm
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 wwg
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 xag
drwxr-x--- 6 oracle oinstall 4096 Apr 17 2019 xdk

The file permissions are as I would expect (oracle:oinstall) because rootupgrade.sh hasn’t been run to change any permissions yet. I checked the log file, and could see that oedacli had logged in to the VMs as root and run “/bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid”:

[ RunCommand:218] Node exa1db01v1.example.com appears to be okay, going to run command /bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid
[ RunCommand:531] ##EXEC## |/bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid|exa1db01v1.example.com|root|
[ RunCommand:329] ##RUNC## |/bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid|exa1db01v1.example.com|root| New or not cached
[ RunCommand:183] Ran commands, elapsed time = 2002 mS
[ KommandOutput:104] ====== Output from node exa1db01v1.example.com ======
[ KommandOutput:106] Command = exa1db01v1.example.com | root | /bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid
[ KommandOutput:108] Ret code = <0> from node exa1db01v1.example.com
[ KommandOutput:113] ## Output Start
[ EsCommonUtils:1368] Command: /bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid produced null output but executed successfully on exa1db01v1.example.com
[ KommandOutput:116] ====== End Output from node exa1db01v1.example.com Ret code0 ======

If that’s the case, why can’t the oracle user run the relink? The issue lies one directory up from the actual GI home. The ownership for /u01/app/19.0.0.0 was set to root:root, and the permissions were 750:

[root@exa1db01v1 ~]# ls -al /u01/app/
total 72
drwxr-xr-x 11 root oinstall 4096 May 22 19:22 .
drwxr-xr-x 7 root oinstall 4096 Nov 16 2018 ..
drwxr-xr-x 3 root oinstall 4096 Aug 9 2018 12.2.0.1
drwxr-x--- 3 root root 4096 May 9 19:22 19.0.0.0 <---------permissions don't allow Oracle to access directory
drwxrwxr-x 17 oracle oinstall 4096 Jan 13 10:55 oracle
drwxrwx--- 6 oracle oinstall 4096 May 21 23:53 oraInventory

[root@exa1db01v1 ~]# ls -al /u01/app/19.0.0.0/
total 12
drwxr-x--- 3 root root 4096 May 9 19:22 .
drwxr-xr-x 11 root oinstall 4096 May 9 19:22 ..
drwxr-xr-x 71 root root 4096 May 9 19:21 grid <---------permissions are ok

On this system, the umask for the root user is set to 027, rather than the default of 022. These types of changes are fairly common on systems that require hardening, particularly systems that use the Exadata STIG scripts (MOS note #2181944.1) for hardening.

When the GI home directory was created, the command used was “mkdir -p /u01/app/19.0.0.0/grid”, which created the parent directory as well. The parent’s permissions followed the umask of the root user, which prevents non-root users from having access.

After identifying the issue, running a simple “chmod 755 /u01/app/19.0.0.0” fixed it, and we were able to restart the GI upgrade with the CONFIG_HOME step.
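
The umask behavior is easy to reproduce without touching a real home. The following is a minimal sketch, assuming Linux with GNU coreutils (stat -c) and bash; both function names are hypothetical helpers and are not part of oedacli. The first function shows what “mkdir -p” creates under a 027 umask, and the second walks up from a path and reports any directory that other users cannot traverse.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Linux, bash, and GNU coreutils (stat -c).

# Show what "mkdir -p" produces under umask 027, using a scratch directory.
demo_umask_mkdir() {
  local scratch
  scratch=$(mktemp -d) || return 1
  (
    umask 027                           # hardened umask, as on this system
    mkdir -p "$scratch/19.0.0.0/grid"   # parent and leaf created in one call
  )
  # Print octal mode and path, relative to the scratch directory.
  ( cd "$scratch" && stat -c '%a %n' 19.0.0.0 19.0.0.0/grid )
  rm -rf "$scratch"
}

# Walk up from an absolute path and print every directory whose "other"
# execute (search) bit is missing. This is a rough check: it ignores owner,
# group membership, and ACLs.
find_untraversable() {
  local path=$1 mode
  case $path in /*) ;; *) echo "absolute path required" >&2; return 2 ;; esac
  while [ "$path" != "/" ]; do
    mode=$(stat -c '%a' "$path") || return 1
    # The last octal digit holds the "other" bits; an even digit has no x bit.
    if [ $(( ${mode: -1} % 2 )) -eq 0 ]; then
      echo "$path ($mode)"
    fi
    path=$(dirname "$path")
  done
}

# Intended use, as root on the guest:
#   find_untraversable /u01/app/19.0.0.0/grid
```

Given the listings above, the second function should have flagged /u01/app/19.0.0.0 on this system. The stock “namei -l /u01/app/19.0.0.0/grid” command from util-linux gives a similar per-component view if you would rather not script it.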

Slow Exadata Compute Node Upgrades to Oracle Linux 7

acolvin — Tue, 19 May 2020 14:22:50 +0000

One of the cool things that Oracle has done with Exadata is give users the ability to upgrade from Oracle Linux 6 to Oracle Linux 7 with an in-place upgrade process. This comes automatically when you upgrade the Exadata compute image from 18c to 19c. Upgrading the compute nodes to 19c gives you the ability to upgrade your grid infrastructure to 19c, followed by installing a 19c database home and beginning the database upgrade process.

The compute node upgrade is done using the same tools as a normal Exadata operating system patch, either via dbnodeupdate.sh or its wrapper script, patchmgr. As part of the process, the server will go through a couple of reboots and perform some actions, during which time the SSH daemon is shut down. Normally, this blackout only lasts for 30 minutes. This always reminds me of the Apollo 13 reentry, where the communications blackout lasted nearly twice as long as it normally would.

I’ve seen a handful of upgrades where the system seemed to be completely stuck. Connecting to the serial console gave me the following output – hitting enter didn’t move the cursor, and I couldn’t see any activity for more than an hour:

I remembered after looking at the serial console that the upgrade process performs a kexec reboot with different options than a normal boot, so more text actually shows up in the Java-based console from the ILOM’s web interface. Looking there wasn’t much more help:

The system sat here for a long time as well, seemingly stuck. I had two options – force a reboot through the ILOM, or wait out the upgrade process to see if it would go through. I’d gone through the reboot option before, but that left me with a host requiring a manual restore from the backup taken by dbnodeupdate.sh, so I decided to wait it out and see what happens.

I remembered that with the Java console, I have the option to hit alt-F2 to go to the second console screen. Thankfully, that worked, and I was presented a bash prompt. I tried to run a few commands, but not much was available in the path. I discovered that the root volume of the host was mounted as /sysroot, and could find the patch logs in /sysroot/var/log/cellos/exadata.computenode.post.log. From there, I could run “/sysroot/bin/tail -f” on the log file, and see what was happening:

There we have it – as part of the upgrade, the /opt/oracle.cellos/image_functions script removes global read, write, and execute permissions on files in a handful of directories, including /opt and /usr. This process changes permissions one file at a time. Even though it’s a quick process, directories like exachk can have tens of thousands of files inside. Just like any other row-by-row processing, even a quick operation done too many times will slow you down. This tracks back to Oracle bug #30365408, the fix for which skips the /opt/oracle.SupportTools/exachk directory. Unfortunately, the system in question here had a backup of the exachk directory found in /opt/oracle.SupportTools/exachk.bk. This directory wasn’t skipped, and it added about an hour to go through the 95,000 files in that directory. Thankfully, the script times out after 150 minutes, but that can still blow out a maintenance window if you’re not careful.

Older versions of exachk seem to be the biggest offenders, and they typically only exist on the first node of a cluster. If it’s possible to clean up anything sitting in /opt, definitely add that to your pre-patch checks before going through the upgrade to Oracle Linux 7.
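
One way to find candidates for cleanup is to count files per directory before the patch window. This is a minimal sketch, assuming bash and a find that supports -xdev; count_files_by_dir is a hypothetical helper name, and the directories to scan are your call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<file count> <directory>" for each immediate
# subdirectory of the given directory, largest first. Hidden directories are
# not matched by the glob and are therefore skipped.
count_files_by_dir() {
  local top=$1 dir count
  for dir in "$top"/*/; do
    [ -d "$dir" ] || continue
    # -xdev keeps find from crossing into other mounted filesystems.
    count=$(find "$dir" -xdev -type f | wc -l | tr -d ' ')
    printf '%s %s\n' "$count" "${dir%/}"
  done | sort -rn
}

# Intended use before patching (as root):
#   count_files_by_dir /opt | head
#   count_files_by_dir /opt/oracle.SupportTools | head
```

Anything with a file count in the tens of thousands is worth a second look, since the permission change runs once per file.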

All in all, as with most patching activities, it’s best to wait out the process for a full failure before jumping in and trying to nudge the system along the way.

OEDA Virtualized Cluster Discovery With SSH Keys

acolvin — Thu, 07 May 2020 21:04:22 +0000

As part of the Oracle Exadata Deployment Assistant (“OEDA”), Oracle includes a command line utility to read and modify the XML files used for deployment of an Exadata cluster. Typical use cases are to install additional Oracle database versions, or to create multiple databases before deployment. There are several additional features included for virtualized clusters, particularly the ability to simplify upgrading Grid Infrastructure.

In many cases, the original XML used for the deployment is still available, and that’s all you need to complete the upgrade. For some older clusters, the original XML may not utilize the same internal format as the most current OEDA tools, or there may have been other changes performed on the cluster over the years – I have some systems where new nodes have been added, additional clusters built, and there isn’t a single consistent file for the entire system. The good news is that you can use the oedacli utility to discover the virtualized clusters running on an Exadata, and it will generate an updated XML file for you.

To do this, just log in to dom0 on one of the Exadata hosts, and download the latest OEDA package (my version is 19.3.6, which was released in April 2020). Once the software is unpacked, perform the following steps:

mkdir /root/discovered
./oedacli
discover es hostnames='db01,db02,cel01,cel02,cel03' location=/root/discovered

From there, OEDA will discover any running virtual machines, connect to them, and run a full discovery. This should work without a problem if you still have all of the passwords set to the default values. In most cases, the passwords have been changed from the defaults at some point. The OEDA script utilizes the expect command to enter passwords for SSH commands, so you can easily modify the passwords it will use via the genPasswordHash.sh script. This functionality is a bit limited, though, in that it expects the same password for all clusters on the system. What if each cluster has a different password? This seems like a good use for SSH keys. Fortunately, there is a way to utilize SSH keys with OEDA cluster discovery.

While oedacli offers the ability to generate new SSH keys for authentication with each cluster, I preferred to use existing keys that were already configured from the root account on dom0 of the first compute node. The process to perform discovery using SSH keys is pretty easy:

Create an SSH key pair if one doesn’t already exist
Add the SSH public key to the authorized_keys file for both oracle/grid and root accounts
Perform discovery
Remove SSH key access

If your root account doesn’t already have an SSH key pair created, you can create one with:

ssh-keygen -t rsa

Hit enter at the prompts and it will create a key that doesn’t require a passphrase. This is important from an oedacli perspective, as keys that use a passphrase will not work for the silent installation/discovery process.

From there, make sure that you have three dcli group files – one with the names of all virtualized guests (~/vm_group), one with the names of the physical compute nodes (~/dbs_group), and one with the names of the storage servers (~/cell_group). The contents of my files are:

[root@enkx4db03 ~]# cat vm_group
enkx4db03c01
enkx4db03c02
enkx4db03c04
enkx4db03c05
enkx4db04c01
enkx4db04c02
enkx4db04c04
enkx4db04c05

[root@enkx4db03 ~]# cat dbs_group
enkx4db03
enkx4db04

[root@enkx4db03 ~]# cat cell_group
enkx4cel05
enkx4cel06
enkx4cel07

You can use dcli to add the SSH key to a user account by adding the -k flag. If the key is not already in the authorized_keys file, you will be prompted for the password. Note that the password isn’t saved, so it must be entered for each host. Use dcli to add the keys for each software owner (oracle, grid), and root on the guests, and once to configure root access for the storage servers:

[root@enkx4db03 ~]# dcli -l root -g ~/vm_group -k
root@enkx4db03c01's password:
root@enkx4db03c02's password:
root@enkx4db03c04's password:
root@enkx4db03c05's password:
root@enkx4db04c01's password:
root@enkx4db04c02's password:
root@enkx4db04c04's password:
root@enkx4db04c05's password:
enkx4db03c01: ssh key added
enkx4db03c02: ssh key added
enkx4db03c04: ssh key added
enkx4db03c05: ssh key added
enkx4db04c01: ssh key added
enkx4db04c02: ssh key added
enkx4db04c04: ssh key added
enkx4db04c05: ssh key added

[root@enkx4db03 ~]# dcli -l root -g cell_group -k
root@enkx4cel05's password:
root@enkx4cel06's password:
root@enkx4cel07's password:
enkx4cel05: ssh key added
enkx4cel06: ssh key added
enkx4cel07: ssh key added

[root@enkx4db03 ~]# dcli -l root -g ~/dbs_group -k
root@enkx4db03's password:
root@enkx4db04's password:
enkx4db03: ssh key added
enkx4db04: ssh key added

Now for the tricky part – oedacli will not just use the default key in /root/.ssh. It expects a separate key pair in the WorkDir directory for each user and host in the discovery. The expected naming format is id_rsa.<hostname>.<username>[.pub]. An example for enkx4db03c01 would be to have files named id_rsa.enkx4db03c01.oracle and id_rsa.enkx4db03c01.oracle.pub. Since we have the dcli group files, we can easily create those files without much fuss. In the example below, my OEDA is unzipped to /EXAVMIMAGES/onecommand/2020_apr/linux-x64. Modify the OEDA_WORKDIR variable to match where your WorkDir is:

[root@enkx4db03 ~]# for hosts in `cat ~/vm_group`; \
do \
export OEDA_WORKDIR=/EXAVMIMAGES/onecommand/2020_apr/linux-x64/WorkDir; \
cp ~/.ssh/id_rsa $OEDA_WORKDIR/id_rsa.$hosts.oracle; \
cp ~/.ssh/id_rsa $OEDA_WORKDIR/id_rsa.$hosts.grid; \
cp ~/.ssh/id_rsa $OEDA_WORKDIR/id_rsa.$hosts.root; \
cp ~/.ssh/id_rsa.pub $OEDA_WORKDIR/id_rsa.$hosts.oracle.pub; \
cp ~/.ssh/id_rsa.pub $OEDA_WORKDIR/id_rsa.$hosts.grid.pub; \
cp ~/.ssh/id_rsa.pub $OEDA_WORKDIR/id_rsa.$hosts.root.pub; \
done

[root@enkx4db03 ~]# for hosts in `cat ~/cell_group`; \
do \
export OEDA_WORKDIR=/EXAVMIMAGES/onecommand/2020_apr/linux-x64/WorkDir; \
cp ~/.ssh/id_rsa $OEDA_WORKDIR/id_rsa.$hosts.root; \
cp ~/.ssh/id_rsa.pub $OEDA_WORKDIR/id_rsa.$hosts.root.pub; \
done

[root@enkx4db03 ~]# for hosts in `cat ~/dbs_group`; \
do \
export OEDA_WORKDIR=/EXAVMIMAGES/onecommand/2020_apr/linux-x64/WorkDir; \
cp ~/.ssh/id_rsa $OEDA_WORKDIR/id_rsa.$hosts.root; \
cp ~/.ssh/id_rsa.pub $OEDA_WORKDIR/id_rsa.$hosts.root.pub; \
done
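
Before launching discovery, it can help to confirm that every expected key file landed in WorkDir. Here is a minimal sketch, assuming the per-host, per-user file names used in the loops above (for example id_rsa.enkx4db03c01.oracle and its .pub twin); missing_oeda_keys is a hypothetical helper and not an oedacli feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list key files that are absent from WorkDir.
# Usage: missing_oeda_keys <workdir> <group_file> <user> [<user> ...]
# Prints one missing file name per line; prints nothing if all are present.
missing_oeda_keys() {
  local workdir=$1 group=$2 host user suffix
  shift 2
  while read -r host; do
    [ -n "$host" ] || continue            # skip blank lines in the group file
    for user in "$@"; do
      for suffix in "" ".pub"; do         # private key, then public key
        [ -f "$workdir/id_rsa.$host.$user$suffix" ] ||
          echo "id_rsa.$host.$user$suffix"
      done
    done
  done < "$group"
}

# Intended use, matching the loops above:
#   missing_oeda_keys "$OEDA_WORKDIR" ~/vm_group oracle grid root
#   missing_oeda_keys "$OEDA_WORKDIR" ~/cell_group root
#   missing_oeda_keys "$OEDA_WORKDIR" ~/dbs_group root
```

An empty result for all three group files means the copies completed.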

You should now have a set of SSH key pairs for each account on your Exadata rack. We can now run the discovery to create new OEDA XML files. Launch oedacli, enable SSH key authentication, and run discovery:

[root@enkx4db03 ~]# cd /EXAVMIMAGES/onecommand/2020_apr/linux-x64
[root@enkx4db03 linux-x64]# ./oedacli
oedacli> set sshkeys enable=true
oedacli> discover es hostnames='enkx4db03,enkx4db04,enkx4cel05,enkx4cel06,enkx4cel07' location=/EXAVMIMAGES/onecommand/2020_apr/linux-x64/discovery

OEDA will now connect to each of the hosts and discover the existing software installations, patch versions, and ASM diskgroup configurations. The location specified will have an XML file for each individual cluster, as well as a full XML file containing each of the clusters. You will also be able to see an installation template in HTML format, and a new checkip script.

Finally, you can remove the SSH keys using the dcli command with the --unkey option:

[root@enkx4db03 ~]# dcli -l root -g ~/vm_group --unkey
enkx4db03c01: ssh key dropped
enkx4db03c02: ssh key dropped
enkx4db03c04: ssh key dropped
enkx4db03c05: ssh key dropped
enkx4db04c01: ssh key dropped
enkx4db04c02: ssh key dropped
enkx4db04c04: ssh key dropped
enkx4db04c05: ssh key dropped

[root@enkx4db03 ~]# dcli -l root -g ~/dbs_group --unkey
enkx4db03: ssh key dropped
enkx4db04: ssh key dropped

[root@enkx4db03 ~]# dcli -l root -g cell_group --unkey
enkx4cel05: ssh key dropped
enkx4cel06: ssh key dropped
enkx4cel07: ssh key dropped

Here is the full output of my OEDA discovery – you may see that it reports that there are no database homes on certain clusters, but it still completed discovery of all running objects in the rack.

[root@enkx4db03 linux-x64]# ./oedacli
oedacli> set sshkeys enable=true
oedacli> discover es hostnames='enkx4db03,enkx4db04,enkx4cel05,enkx4cel06,enkx4cel07' location=/EXAVMIMAGES/onecommand/2020_apr/linux-x64/discovery
Discovering nodes [enkx4db03, enkx4db04, enkx4cel05, enkx4cel06, enkx4cel07]...
...Running Software Discovery on enkx4db03c01.enkitec.local
Discovering software on: enkx4db03c01.enkitec.local
Discovering cluster details on node: enkx4db03c01.enkitec.local on cluster c0_clusterHome
Discovering Database details on node: enkx4db03c01.enkitec.local for clusterId c0_clusterHome
No Database found for database home /u01/app/oracle/product/19.0.0.0/dbhome_1 on enkx4db03c01.enkitec.local
No Database found for database home /u01/app/oracle/product/12.2.0.1/dbhome_1 on enkx4db03c01.enkitec.local
ERROR: No databaseHomes discovered on enkx4db03c01.enkitec.local
Done Software Discovery on enkx4db03c01.enkitec.local
...Running Software Discovery on enkx4db03c02.enkitec.local
Discovering software on: enkx4db03c02.enkitec.local
Discovering cluster details on node: enkx4db03c02.enkitec.local on cluster c1_clusterHome
Discovering Database details on node: enkx4db03c02.enkitec.local for clusterId c1_clusterHome
No Database found for database home /u01/app/oracle/product/19.0.0.0/dbhome_3 on enkx4db03c02.enkitec.local
No Database found for database home /u01/app/oracle/product/11.2.0.4/dbhome_2 on enkx4db03c02.enkitec.local
No Database found for database home /u01/app/oracle/product/19.0.0.0/dbhome_3 on enkx4db03c02.enkitec.local
Done Software Discovery on enkx4db03c02.enkitec.local
...Running Software Discovery on enkx4db03c05.enkitec.local
Discovering software on: enkx4db03c05.enkitec.local
Discovering cluster details on node: enkx4db03c05.enkitec.local on cluster c2_clusterHome
Discovering Database details on node: enkx4db03c05.enkitec.local for clusterId c2_clusterHome
No Database found for database home /u01/app/oracle/product/11.2.0.4/dbhome_1 on enkx4db03c05.enkitec.local
No Database found for database home /u01/app/oracle/product/11.2.0.4/dbhome_1 on enkx4db03c05.enkitec.local
No Database found for database home /u01/app/oracle/product/12.2.0.1/dbhome_1 on enkx4db03c05.enkitec.local
No Database found for database home /u01/app/oracle/product/12.2.0.1/dbhome_1 on enkx4db03c05.enkitec.local
Done Software Discovery on enkx4db03c05.enkitec.local
...Running Software Discovery on enkx4db03c04.enkitec.local
Discovering software on: enkx4db03c04.enkitec.local
Discovering cluster details on node: enkx4db03c04.enkitec.local on cluster c3_clusterHome
Discovering Database details on node: enkx4db03c04.enkitec.local for clusterId c3_clusterHome
No Database found for database home /u01/app/oracle/product/19.0.0.0/dbhome_1 on enkx4db03c04.enkitec.local
Done Software Discovery on enkx4db03c04.enkitec.local
...Running Software Discovery on enkx4db04c04.enkitec.local
Discovering software on: enkx4db04c04.enkitec.local
Done Software Discovery on enkx4db04c04.enkitec.local
...Running Software Discovery on enkx4db04c01.enkitec.local
Discovering software on: enkx4db04c01.enkitec.local
Done Software Discovery on enkx4db04c01.enkitec.local
...Running Software Discovery on enkx4db04c02.enkitec.local
Discovering software on: enkx4db04c02.enkitec.local
Done Software Discovery on enkx4db04c02.enkitec.local
...Running Software Discovery on enkx4db04c05.enkitec.local
Discovering software on: enkx4db04c05.enkitec.local
Done Software Discovery on enkx4db04c05.enkitec.local
Discovering local disks ....
Discovering switches...
Discovering racks...
Writing Engineered System preconf : /EXAVMIMAGES/onecommand/2020_apr/linux-x64/discovery/Discovered-preconf_rack_0.csv
Creating databasemachine.xml for EM discovery
Done Creating databasemachine.xml for EM discovery
Creating databasemachine.xml for EM discovery
Done Creating databasemachine.xml for EM discovery
Creating databasemachine.xml for EM discovery
Done Creating databasemachine.xml for EM discovery
Creating databasemachine.xml for EM discovery
Done Creating databasemachine.xml for EM discovery
Writing platinum file : /EXAVMIMAGES/onecommand/2020_apr/linux-x64/discovery/Discovered-platinum.csv

Creating Installation template /EXAVMIMAGES/onecommand/2020_apr/linux-x64/discovery/Discovered-InstallationTemplate.html...
Created Installation template /EXAVMIMAGES/onecommand/2020_apr/linux-x64/discovery/Discovered-InstallationTemplate.html
Writing checkip validation script : /EXAVMIMAGES/onecommand/2020_apr/linux-x64/discovery/Discovered-checkip.sh

Validating Engineered System....
Rack Descripton: X4-2 Quarter Rack HC 4TB
..Cluster Name: enkx4c04
..Cluster Node List:[enkx4db03c04.enkitec.local, enkx4db04c04.enkitec.local]
..Storage Server List:[enkx4cel05, enkx4cel06, enkx4cel07]
..Cluster Software Details..
....Cluster Home:/u01/app/19.0.0.0/grid
....Cluster Version:19.5.0.0.191015
....Cluster Scan Name:enkx4c04-scan
..Cluster Owner/Group details..
....Owner:oracle
....Groups:[oinstall, dba]
..Storage Details..
....Disk Group:DATAC4, Size:504G, DiskGroup Type:DATA
Database Home Details..
Warning: No database homes found..
.......
..Cluster Name: enkx4vm1
..Cluster Node List:[enkx4db03c01.enkitec.local, enkx4db04c01.enkitec.local]
..Storage Server List:[enkx4cel05, enkx4cel06, enkx4cel07]
..Cluster Software Details..
....Cluster Home:/u01/app/19.0.0.0/grid
....Cluster Version:19.6.0.0.200114
....Cluster Scan Name:enkx4c01-scan
..Cluster Owner/Group details..
....Owner:oracle
....Groups:[oinstall, dba]
..Storage Details..
....Disk Group:DATAC1, Size:3078G, DiskGroup Type:DATA
....Disk Group:RECOC1, Size:1026G, DiskGroup Type:RECO
Database Home Details..
Warning: No database homes found..
.......
..Cluster Name: enkx4vm2
..Cluster Node List:[enkx4db03c02.enkitec.local, enkx4db04c02.enkitec.local]
..Storage Server List:[enkx4cel05, enkx4cel06, enkx4cel07]
..Cluster Software Details..
....Cluster Home:/u01/app/19.0.0.0/grid
....Cluster Version:19.3.1.0.0
....Cluster Scan Name:enkx4c02-scan
..Cluster Owner/Group details..
....Owner:grid
....Groups:[oinstall, asmdba, asmoper, asmadmin]
..Storage Details..
....Disk Group:DATAC2, Size:1800G, DiskGroup Type:DATA
....Disk Group:RECOC2, Size:1008G, DiskGroup Type:RECO
Database Home Details..
...Database Home Location:/u01/app/oracle/product/19.0.0.0/dbhome_3
...Database Home Version:19.3.1.0.0
...Database Software Owner:oracle
...Groups:[oinstall, asmdba, dba, racoper]
...Databases:[cdb19: Db Node List:[enkx4db03c02, enkx4db04c02]]
...Database Home Location:/u01/app/oracle/product/12.1.0.2/dbhome_1
...Database Home Version:12.1.0.2.190416
...Database Software Owner:oracle
...Groups:[oinstall, asmdba, dba, racoper]
...Databases:[dbm03: Db Node List:[enkx4db03c02, enkx4db04c02]]
.......
..Cluster Name: enkx4vm5
..Cluster Node List:[enkx4db03c05.enkitec.local, enkx4db04c05.enkitec.local]
..Storage Server List:[enkx4cel05, enkx4cel06, enkx4cel07]
..Cluster Software Details..
....Cluster Home:/u01/app/12.2.0.1/grid
....Cluster Version:12.2.0.1.171017
....Cluster Scan Name:enkx4c05-scan
..Cluster Owner/Group details..
....Owner:oracle
....Groups:[oinstall, dba]
..Storage Details..
....Disk Group:DATAC6, Size:1026G, DiskGroup Type:DATA
....Disk Group:RECOC6, Size:504G, DiskGroup Type:RECO
Database Home Details..
Warning: No database homes found..
.......
oedacli>

Now that the discovery is complete, we can move on to upgrading the virtualized clusters using oedacli.

OEM SSL Cipher Hardening Reset After Securing OMS

acolvin — Thu, 12 Sep 2019 19:46:20 +0000

I have recently been installing Oracle Enterprise Manager at several sites, and one of the key requirements has been to ensure that the installation isn’t using insecure HTTPS protocols. Securing the OMS and agents typically consists of two components – ensuring that only secure SSL ciphers are being used, and shutting down protocols that have known vulnerabilities. Thankfully, Oracle has documented the procedures in two separate MOS notes:

Doc ID 2138391.1 – 13c: How to Disable Weak SSLCipherSuites in Enterprise Manager 13c Cloud Control
Doc ID 2212006.1 – EM 13c: Enterprise Manager 13c Cloud Control Configuration to Support Transport Layer Security Protocol:TLSv1.2 only

This isn’t a post about how to perform the task – that is outlined pretty well in the MOS documents. The interesting piece is the behavior that I saw after I thought that the task was completed. We went through the entire setup of both notes and requested a security scan, and found that the agent upload port (4903) was reporting weak ciphers. That was strange, because we had updated all of the files as described in the MOS notes.

Currently, the OMS used in these examples has gone through the procedure detailed in MOS note #2138391.1. At this point, weak SSL ciphers have been disabled, but it has not been secured to require TLSv1.2. We can validate the SSL ciphers with nmap below (nmap output abbreviated to show only the SSL cipher output):

Andys-MacBook-Pro-3:~ acolvin$ sudo nmap -sV --script ssl-enum-ciphers -p 4903 enkpoemac1
Password:

Starting Nmap 7.40 ( https://nmap.org ) at 2019-08-22 09:17 CDT
---
| ssl-enum-ciphers:
| TLSv1.0:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
| compressors:
| NULL
| cipher preference: client
| TLSv1.1:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
| compressors:
| NULL
| cipher preference: client
| TLSv1.2:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
| TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 1024) - A
| TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 1024) - A
| compressors:
| NULL
| cipher preference: client
|_ least strength: A
---

As you can see, every cipher that is allowed scores an “A”, although TLSv1.0 and TLSv1.1 are still enabled at this stage. Just to be sure, I checked the timestamps of the configuration files:

[oracle@enkpoemac1 gc_inst]$ find . -name httpd_em.conf -exec ls -l {} \;
-rw-r--r-- 1 oracle oinstall 5804 Aug 22 09:02 ./user_projects/domains/GCDomain/config/fmwconfig/components/OHS/ohs1/moduleconf/httpd_em.conf
-rw-r--r-- 1 oracle oinstall 5804 Aug 22 09:04 ./user_projects/domains/GCDomain/config/fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf

Next, I ran the procedure in MOS note #2212006.1. To lock down the TLS protocols, we ran two “emctl secure oms” commands and restarted the OMS:

[oracle@enkpoemac1 gc_inst]$ emctl secure oms -console -protocol "TLSv1.2"
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
Securing OMS... Started.
Enter Enterprise Manager Root (SYSMAN) Password :
Enter Agent Registration Password :
Securing OMS... Successful
Restart OMS

[oracle@enkpoemac1 gc_inst]$ emctl secure oms -protocol "TLSv1.2"
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
Securing OMS... Started.
Enter Enterprise Manager Root (SYSMAN) Password :
Enter Agent Registration Password :
Securing OMS... Successful
Restart OMS

[oracle@enkpoemac1 gc_inst]$ emctl stop oms -all
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
Stopping Oracle Management Server...
WebTier Successfully Stopped
Oracle Management Server Successfully Stopped
Oracle Management Server is Down
JVMD Engine is Down
Stopping BI Publisher Server...
BI Publisher Server Successfully Stopped
AdminServer Successfully Stopped
BI Publisher Server is Down

[oracle@enkpoemac1 gc_inst]$ emctl start oms
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
Starting Oracle Management Server...
WebTier Successfully Started
Oracle Management Server Successfully Started
Oracle Management Server is Up
JVMD Engine is Up
Starting BI Publisher Server ...
BI Publisher Server Successfully Started
BI Publisher Server is Up

After restarting the OMS, we ran another scan and saw different results. Only TLSv1.2 is offered now, but several weak ciphers are back:

Andys-MacBook-Pro-3:~ acolvin$ sudo nmap -sV --script ssl-enum-ciphers -p 4903 enkpoemac1
Password:

Starting Nmap 7.40 ( https://nmap.org ) at 2019-08-22 09:50 CDT
---
| ssl-enum-ciphers:
| TLSv1.2:
| ciphers:
| TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA (secp256r1) - D
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
| TLS_ECDHE_RSA_WITH_RC4_128_SHA (secp256r1) - D
| TLS_RSA_WITH_3DES_EDE_CBC_SHA (rsa 1024) - D
| TLS_RSA_WITH_AES_128_CBC_SHA (rsa 1024) - A
| TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 1024) - A
| TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 1024) - A
| TLS_RSA_WITH_AES_256_CBC_SHA (rsa 1024) - A
| TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 1024) - A
| TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 1024) - A
| TLS_RSA_WITH_RC4_128_MD5 (rsa 1024) - D
| TLS_RSA_WITH_RC4_128_SHA (rsa 1024) - D
| compressors:
| NULL
| cipher preference: client
| warnings:
| 64-bit block cipher 3DES vulnerable to SWEET32 attack
| Broken cipher RC4 is deprecated by RFC 7465
| Ciphersuite uses MD5 for message integrity
|_ least strength: D
---
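
If you would rather not read every line of a scan like this one, the weak entries can be picked out mechanically. The following is only a sketch: the function name weak_ciphers and the file name scan.txt are made up for illustration, and it assumes the ssl-enum-ciphers layout shown above (cipher name, key info, a dash, then the grade).

```shell
#!/bin/sh
# Print every cipher graded below "A" in saved ssl-enum-ciphers output,
# prefixed with the protocol section it was listed under.
weak_ciphers() {
    # $1: file holding the nmap output (hypothetical name, e.g. scan.txt)
    awk '
        # Remember the current protocol header, e.g. "| TLSv1.2:"
        $2 ~ /^(TLSv|SSLv)[0-9.]+:$/ { proto = $2; sub(/:$/, "", proto); next }
        # Cipher lines end in "- <grade>"; keep only grades B through F
        NF >= 4 && $NF ~ /^[B-F]$/ && $(NF-1) == "-" { print proto, $2, $NF }
    ' "$1"
}

# Example (hypothetical file name):
#   nmap --script ssl-enum-ciphers -p 4903 enkpoemac1 > scan.txt
#   weak_ciphers scan.txt
```

An empty result means every listed cipher was graded “A”. The “least strength” summary line is skipped on purpose, since it has no dash before the grade.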

At this point, we decided to check the configuration files to see which files had been recently modified:

[oracle@enkpoemac1 config]$ find . -mmin -20 -type f -exec ls -l {} +
-rw-r----- 1 oracle oinstall 0 Aug 22 09:33 ./config.lok
-rw-r----- 1 oracle oinstall 60536 Aug 22 09:29 ./config.xml
-rw-r----- 1 oracle oinstall 2415 Aug 22 09:26 ./diagnostics/Module-FMWDFW-2818.xml
-rw-r--r-- 1 oracle oinstall 36868 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/httpd.conf.emctl_secure
-rw-r----- 1 oracle oinstall 2920 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/keystores/console/cwallet.sso
-rw-r----- 1 oracle oinstall 2843 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/keystores/console/ewallet.p12
-rw-r----- 1 oracle oinstall 2920 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/keystores/upload/cwallet.sso
-rw-r----- 1 oracle oinstall 2843 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/keystores/upload/ewallet.p12
-rw-r--r-- 1 oracle oinstall 609 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/agent_download.conf
-rw-r----- 1 oracle oinstall 609 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/agent_download.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 609 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/agent_download.conf.2019_08_22_09_29_50
-rw-r----- 1 oracle oinstall 1351 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_bip.conf
-rw-r----- 1 oracle oinstall 1351 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_bip.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 1351 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_bip.conf.2019_08_22_09_29_50
-rw-r--r-- 1 oracle oinstall 5659 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf
-rw-r----- 1 oracle oinstall 5347 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 5347 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf.2019_08_22_09_29_50
-rw-r----- 1 oracle oinstall 4051 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/ssl_bip.conf
-rw-r----- 1 oracle oinstall 3972 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/ssl_bip.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 3972 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/ssl_bip.conf.2019_08_22_09_29_50
-rw-r----- 1 oracle oinstall 3981 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/ssl_bip.conf.tmp
-rw-r----- 1 oracle oinstall 2105 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/mod_wl_ohs.conf
-rw-r----- 1 oracle oinstall 2105 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/mod_wl_ohs.conf.2019_08_22_09_26_29
-rw-r----- 1 oracle oinstall 2008 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/mod_wl_ohs.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 2105 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/mod_wl_ohs.conf.2019_08_22_09_29_38
-rw-r----- 1 oracle oinstall 2008 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/mod_wl_ohs.conf.2019_08_22_09_29_50
-rw-r--r-- 1 oracle oinstall 2105 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/mod_wl_ohs.conf.emctl_secure
-rw-r----- 1 oracle oinstall 3679 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/ssl.conf
-rw-r----- 1 oracle oinstall 3682 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/ssl.conf.2019_08_22_09_26_44
-rw-r----- 1 oracle oinstall 3679 Aug 22 09:26 ./fmwconfig/components/OHS/instances/ohs1/ssl.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 3679 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/ssl.conf.2019_08_22_09_29_49
-rw-r----- 1 oracle oinstall 3679 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/ssl.conf.2019_08_22_09_29_50
-rw-r--r-- 1 oracle oinstall 3679 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/ssl.conf.emctl_secure
-rw-r--r-- 1 oracle oinstall 36868 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/httpd.conf.emctl_secure
-rw-r----- 1 oracle oinstall 2920 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/keystores/console/cwallet.sso
-rw-r--r-- 1 oracle oinstall 2843 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/keystores/console/ewallet.p12
-rw-r----- 1 oracle oinstall 2920 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/keystores/upload/cwallet.sso
-rw-r--r-- 1 oracle oinstall 2843 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/keystores/upload/ewallet.p12
-rw-r--r-- 1 oracle oinstall 609 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/agent_download.conf
-rw-r----- 1 oracle oinstall 609 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/moduleconf/agent_download.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 609 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/agent_download.conf.2019_08_22_09_29_50
-rw-r----- 1 oracle oinstall 1351 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/httpd_bip.conf
-rw-r----- 1 oracle oinstall 1351 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/moduleconf/httpd_bip.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 1351 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/httpd_bip.conf.2019_08_22_09_29_50
-rw-r--r-- 1 oracle oinstall 5659 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/httpd_em.conf
-rw-r----- 1 oracle oinstall 5347 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/moduleconf/httpd_em.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 5347 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/httpd_em.conf.2019_08_22_09_29_50
-rw-r----- 1 oracle oinstall 4051 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/ssl_bip.conf
-rw-r----- 1 oracle oinstall 3972 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/moduleconf/ssl_bip.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 3972 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/ssl_bip.conf.2019_08_22_09_29_50
-rw-r----- 1 oracle oinstall 3981 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/moduleconf/ssl_bip.conf.tmp
-rw-r----- 1 oracle oinstall 2105 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/mod_wl_ohs.conf
-rw-r----- 1 oracle oinstall 2105 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/mod_wl_ohs.conf.2019_08_22_09_26_29
-rw-r----- 1 oracle oinstall 2008 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/mod_wl_ohs.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 2105 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/mod_wl_ohs.conf.2019_08_22_09_29_38
-rw-r----- 1 oracle oinstall 2008 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/mod_wl_ohs.conf.2019_08_22_09_29_50
-rw-r--r-- 1 oracle oinstall 2105 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/mod_wl_ohs.conf.emctl_secure
-rw-r----- 1 oracle oinstall 3679 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/ssl.conf
-rw-r----- 1 oracle oinstall 3682 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/ssl.conf.2019_08_22_09_26_44
-rw-r----- 1 oracle oinstall 3679 Aug 22 09:26 ./fmwconfig/components/OHS/ohs1/ssl.conf.2019_08_22_09_26_45
-rw-r----- 1 oracle oinstall 3679 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/ssl.conf.2019_08_22_09_29_49
-rw-r----- 1 oracle oinstall 3679 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/ssl.conf.2019_08_22_09_29_50
-rw-r--r-- 1 oracle oinstall 3679 Aug 22 09:29 ./fmwconfig/components/OHS/ohs1/ssl.conf.emctl_secure
-rw-r----- 1 oracle oinstall 9513 Aug 22 09:33 ./fmwconfig/ovd/default/adapters.os_xml
-rw-r----- 1 oracle oinstall 3184 Aug 22 09:33 ./fmwconfig/ovd/default/server.os_xml
-rw-r----- 1 oracle oinstall 117 Aug 22 09:32 ./fmwconfig/servers/BIP/loggers.exclude
-rw-r----- 1 oracle oinstall 117 Aug 22 09:32 ./fmwconfig/servers/EMGC_ADMINSERVER/loggers.exclude
-rw-r----- 1 oracle oinstall 117 Aug 22 09:32 ./fmwconfig/servers/EMGC_OMS1/loggers.exclude
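
Most of that listing is timestamped backups and .emctl_secure copies. A narrower search that keeps only the live .conf files might look like the sketch below. The function name recent_conf_files is hypothetical, and it relies on the same -mmin option used in the command above.

```shell
#!/bin/sh
# List recently modified files whose names end in ".conf", which leaves out
# the timestamped backups (*.conf.2019_...), *.emctl_secure and *.tmp copies.
recent_conf_files() {
    # $1: directory to search, $2: look-back window in minutes
    find "$1" -type f -name '*.conf' -mmin "-$2" -exec ls -l {} +
}

# Example, run from the domain config directory:
#   recent_conf_files . 20
```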

Sure enough, a number of configuration files had been modified by the emctl commands, and the following file was the culprit:

[oracle@enkpoemac1 config]$ grep Cipher ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf
#SSLCipherSuite HIGH
SSLCipherSuite SSL_RSA_WITH_RC4_128_MD5,SSL_RSA_WITH_RC4_128_SHA,SSL_RSA_WITH_AES_128_CBC_SHA,SSL_RSA_WITH_AES_256_CBC_SHA,RSA_WITH_AES_128_CBC_SHA256,RSA_WITH_AES_256_CBC_SHA256,RSA_WITH_AES_128_GCM_SHA256,RSA_WITH_AES_256_GCM_SHA384,ECDHE_ECDSA_WITH_AES_128_CBC_SHA,ECDHE_ECDSA_WITH_AES_256_CBC_SHA,ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,ECDHE_RSA_WITH_RC4_128_SHA,ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,ECDHE_RSA_WITH_AES_128_CBC_SHA,ECDHE_RSA_WITH_AES_256_CBC_SHA

[oracle@enkpoemac1 config]$ ls -al ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf
-rw-r--r-- 1 oracle oinstall 5659 Aug 22 09:29 ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf
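
A long SSLCipherSuite line like that one is hard to read by eye. As a rough sketch, the active directive can be split on commas and filtered for the algorithms that nmap flagged (RC4, 3DES and MD5). The function name weak_suite_entries is made up for this example, and the pattern list is my assumption of what counts as weak here, not something taken from the MOS notes.

```shell
#!/bin/sh
# Print the weak entries in the active (uncommented) SSLCipherSuite directive.
# Returns non-zero when nothing weak is found, because the final grep matches nothing.
weak_suite_entries() {
    # $1: path to an httpd_em.conf-style file
    grep -E '^[[:space:]]*SSLCipherSuite[[:space:]]' "$1" |
        sed -E 's/^[[:space:]]*SSLCipherSuite[[:space:]]+//' |
        tr ',' '\n' |
        grep -E 'RC4|3DES|MD5'
}

# Example:
#   weak_suite_entries ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf
```

Commented lines such as “#SSLCipherSuite HIGH” are ignored, since they do not start with the directive name.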

I corrected the httpd_em.conf file to include the following setting for SSL ciphers:

SSLCipherSuite ECDHE_RSA_WITH_AES_128_CBC_SHA,ECDHE_RSA_WITH_AES_256_CBC_SHA,RSA_WITH_AES_128_CBC_SHA256,RSA_WITH_AES_256_CBC_SHA256
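
Because this edit may have to be repeated any time “emctl secure oms” is run again, it could be scripted. The sketch below is one way to do it, not the procedure from the MOS notes: the function name set_cipher_suite and the .bak suffix are my own inventions, and it assumes the cipher list contains no “|”, “&” or backslash characters, which would confuse sed.

```shell
#!/bin/sh
# Replace the active SSLCipherSuite directive in a config file,
# keeping a timestamped backup next to the original.
set_cipher_suite() {
    # $1: config file, $2: comma-separated cipher list
    conf=$1
    suites=$2
    cp -p "$conf" "$conf.bak.$(date +%Y%m%d%H%M%S)" || return 1
    # Only the uncommented directive is rewritten; "#SSLCipherSuite ..." stays as is.
    sed -E "s|^([[:space:]]*)SSLCipherSuite[[:space:]].*|\1SSLCipherSuite $suites|" \
        "$conf" > "$conf.new" &&
        cat "$conf.new" > "$conf" &&   # cat keeps the original owner and mode
        rm -f "$conf.new"
}

# Example (run for each copy of httpd_em.conf, then restart the OMS):
#   set_cipher_suite ./fmwconfig/components/OHS/instances/ohs1/moduleconf/httpd_em.conf \
#       'ECDHE_RSA_WITH_AES_128_CBC_SHA,ECDHE_RSA_WITH_AES_256_CBC_SHA,RSA_WITH_AES_128_CBC_SHA256,RSA_WITH_AES_256_CBC_SHA256'
```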

It appears that the SSLCipherSuite option was reset to the OEM default by the “emctl secure oms” command. I set the SSLCipherSuite entry to the correct list again and restarted the OMS. Another check with nmap shows that the port is back to negotiating only approved SSL ciphers:

Andys-MacBook-Pro-3:~ root# nmap -sV --script ssl-enum-ciphers -p 4903 enkpoemac1
Password: