19c Grid Infrastructure Upgrade Failures with OEDACLI

May 31, 2020

As part of an ongoing project, I've been performing a fair number of upgrades to 19c on Exadata systems.  Several of those systems are virtualized, running Oracle VM (based on Xen).  I've previously mentioned Oracle's oedacli tool, which can make upgrades easier, and it's been very useful once you are familiar with it.

Grid Infrastructure upgrades with oedacli are broken into several tasks that can be executed separately, which gives you the flexibility to schedule the tasks that actually bounce the cluster for a specific time.  The three steps of an upgrade with oedacli are:

  1. ADD_HOME - validates the system for use with 19c, unpacks the gold image files, reconfigures the guest config files, and mounts the new home on the guests.
  2. CONFIG_HOME - runs gridSetup.sh to configure the new home.
  3. RUN_ROOTSCRIPT - executes rootupgrade.sh on each node and runs the config tools script.

We began running oedacli to perform the upgrade on our clusters using the April 2020 OEDA release.  Step 1 completed without any issues, but step 2 failed almost immediately with the following error:

oedacli> deploy actions
Deploying Action ID : 2 UPGRADE CLUSTER GIVERSION=19.6.0.0.200114 GIHOMELOC=/u01/app/19.0.0.0/grid WHERE CLUSTERNAME=exa1v1 STEPNAME=CONFIG_HOME
Deploying UPGRADE CLUSTER
Upgrading Cluster
Configuring new clusterware home at /u01/app/19.0.0.0/grid
Running Cluster Verification Utility for upgrade readiness..
Relinking binaries with RDS /u01/app/19.0.0.0/grid
ERROR:
Command: ORACLE_HOME=/u01/app/19.0.0.0/grid; export ORACLE_HOME;cd /u01/app/19.0.0.0/grid/rdbms/lib;make -f ins_rdbms.mk rac_on;make -f ins_rdbms.mk ikfod;make -f ins_rdbms.mk ipc_rds ioracle ORACLE_HOME=/u01/app/19.0.0.0/grid; produced null output exa1db01v1.example.com with exit status 2
bash: line 0: cd: /u01/app/19.0.0.0/grid/rdbms/lib: Permission denied
make: ins_rdbms.mk: No such file or directory
make: *** No rule to make target `ins_rdbms.mk'. Stop.
make: ins_rdbms.mk: No such file or directory
make: *** No rule to make target `ins_rdbms.mk'. Stop.
make: ins_rdbms.mk: No such file or directory
make: *** No rule to make target `ins_rdbms.mk'. Stop.

Well, that's not great.  The good news is that it is pretty obviously a permissions error.  I logged in to the VM as oracle to try the relink command by hand, and hit the same error before I could even get into the new home:

[oracle@exa1db01v1 ~]$ cd /u01/app/19.0.0.0/grid
-bash: cd: /u01/app/19.0.0.0/grid: Permission denied

We can definitely see a permissions issue here.  If I go back as root and check the directory, I can see that the files are there with proper ownership and access:

[root@enkx4db01 ~]# ls -al /u01/app/19.0.0.0/grid
total 312
drwxr-xr-x 65 oracle oinstall 4096 May 31 15:10 .
drwxr-xr-x 10 oracle oinstall 4096 May 31 15:09 ..
drwxr-xr-x 2 oracle oinstall 4096 Apr 18 2019 addnode
drwxr-xr-x 10 oracle oinstall 4096 Apr 17 2019 assistants
drwxr-xr-x 2 oracle oinstall 12288 Apr 18 2019 bin
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 cha
drwxr-xr-x 4 oracle oinstall 4096 Apr 18 2019 clone
drwxr-xr-x 10 oracle oinstall 4096 Apr 18 2019 crs
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 css
drwxr-xr-x 7 oracle oinstall 4096 Apr 17 2019 cv
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 dbjava
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 dbs
drwxr-xr-x 5 oracle oinstall 4096 Apr 18 2019 deinstall
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 demo
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 diagnostics
drwxr-xr-x 13 oracle oinstall 4096 Apr 17 2019 dmu
-rw-r--r-- 1 oracle oinstall 852 Aug 18 2015 env.ora
drwxr-xr-x 6 oracle oinstall 4096 Apr 17 2019 evm
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 gpnp
-rwxr-x--- 1 oracle oinstall 3294 Mar 8 2017 gridSetup.sh
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 has
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 hs
drwxr-xr-x 10 oracle oinstall 4096 Apr 18 2019 install
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 instantclient
drwxr-x--- 13 oracle oinstall 4096 Apr 18 2019 inventory
drwxr-xr-x 8 oracle oinstall 4096 Apr 18 2019 javavm
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 jdbc
drwxr-xr-x 6 oracle oinstall 4096 Apr 18 2019 jdk
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 jlib
drwxr-xr-x 10 oracle oinstall 4096 Apr 17 2019 ldap
drwxr-xr-x 4 oracle oinstall 16384 Apr 18 2019 lib
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 md
drwxr-xr-x 10 oracle oinstall 4096 Apr 17 2019 network
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 nls
drwxr-x--- 14 oracle oinstall 4096 Apr 12 2019 OPatch
drwxr-xr-x 3 oracle oinstall 4096 Apr 18 2019 .opatchauto_storage
drwxr-xr-x 8 oracle oinstall 4096 Apr 17 2019 opmn
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 oracore
drwxr-xr-x 6 oracle oinstall 4096 Apr 17 2019 ord
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 ords
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 oss
drwxr-xr-x 8 oracle oinstall 4096 Apr 18 2019 oui
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 owm
drwxr-xr-x 7 oracle oinstall 4096 Apr 18 2019 .patch_storage
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 perl
drwxr-xr-x 6 oracle oinstall 4096 Apr 17 2019 plsql
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 precomp
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 QOpatch
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 qos
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 racg
drwxr-xr-x 13 oracle oinstall 4096 Apr 18 2019 rdbms
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 relnotes
drwxr-xr-x 7 oracle oinstall 4096 Apr 17 2019 rhp
-rwx------ 1 oracle oinstall 405 Apr 18 2019 root.sh
-rwx------ 1 oracle oinstall 490 Apr 17 2019 root.sh.old
-rw-r----- 1 oracle oinstall 10 Apr 17 2019 root.sh.old.1
-rwx------ 1 oracle oinstall 414 Apr 18 2019 rootupgrade.sh
-rwxr-x--- 1 oracle oinstall 628 Sep 3 2015 runcluvfy.sh
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 sdk
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 slax
drwxr-xr-x 4 oracle oinstall 4096 Apr 18 2019 sqlpatch
drwxr-xr-x 6 oracle oinstall 4096 Apr 18 2019 sqlplus
drwxr-xr-x 6 oracle oinstall 4096 Apr 17 2019 srvm
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 suptools
drwxr-xr-x 4 oracle oinstall 4096 Apr 17 2019 tomcat
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 ucp
drwxr-xr-x 7 oracle oinstall 4096 Apr 17 2019 usm
drwxr-xr-x 2 oracle oinstall 4096 Apr 17 2019 utl
-rw-r----- 1 oracle oinstall 500 Feb 6 2013 welcome.html
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 wlm
drwxr-xr-x 3 oracle oinstall 4096 Apr 17 2019 wwg
drwxr-xr-x 5 oracle oinstall 4096 Apr 17 2019 xag
drwxr-x--- 6 oracle oinstall 4096 Apr 17 2019 xdk

The file ownership is as I would expect (oracle:oinstall) because rootupgrade.sh hasn't been run to change any permissions yet.  I checked the log file, and could see that oedacli had logged in to the VMs as root and run "/bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid":

[ RunCommand:218] Node exa1db01v1.example.com appears to be okay, going to run command /bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid
[ RunCommand:531] ##EXEC## |/bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid|exa1db01v1.example.com|root|
[ RunCommand:329] ##RUNC## |/bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid|exa1db01v1.example.com|root| New or not cached
[ RunCommand:183] Ran commands, elapsed time = 2002 mS
[ KommandOutput:104] ====== Output from node exa1db01v1.example.com ======
[ KommandOutput:106] Command = exa1db01v1.example.com | root | /bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid
[ KommandOutput:108] Ret code = <0> from node exa1db01v1.example.com
[ KommandOutput:113] ## Output Start
[ EsCommonUtils:1368] Command: /bin/chown -R oracle:oinstall /u01/app/19.0.0.0/grid produced null output but executed successfully on exa1db01v1.example.com
[ KommandOutput:116] ====== End Output from node exa1db01v1.example.com Ret code0 ======
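
If you need to dig the same information out of your own log, a small filter helps.  This is only a minimal sketch: it assumes the ##EXEC## line format shown above (command, host, and user separated by pipes) and that the command itself contains no pipe character.  The function name list_exec_commands and the log path in the usage comment are made up for illustration.

```shell
#!/usr/bin/env bash
# Print "user@host: command" for every ##EXEC## line in an oedacli log.
# Assumption: fields are laid out as |command|host|user| as in the excerpt above.
list_exec_commands() {
    # With no file arguments, awk reads from standard input.
    awk -F'|' '/##EXEC##/ { printf "%s@%s: %s\n", $4, $3, $2 }' "$@"
}

# Hypothetical usage; substitute the path of your own log file:
# list_exec_commands /path/to/oedacli.log
```

For the ##EXEC## line above, this prints the chown command prefixed with root@exa1db01v1.example.com, which makes it easy to confirm which user ran what on which node.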

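If that's the case, why can't the oracle user run the relink?  The issue lies one directory up from the actual GI home.  The ownership of /u01/app/19.0.0.0 was set to root:root, with permissions of 750.  Since oracle is neither the owner nor a member of the root group, the "other" bits apply, and those grant nothing: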

[root@exa1db01v1 ~]# ls -al /u01/app/
total 72
drwxr-xr-x 11 root oinstall 4096 May 22 19:22 .
drwxr-xr-x 7 root oinstall 4096 Nov 16 2018 ..
drwxr-xr-x 3 root oinstall 4096 Aug 9 2018 12.2.0.1
drwxr-x--- 3 root root 4096 May 9 19:22 19.0.0.0 <---------permissions don't allow Oracle to access directory
drwxrwxr-x 17 oracle oinstall 4096 Jan 13 10:55 oracle
drwxrwx--- 6 oracle oinstall 4096 May 21 23:53 oraInventory

[root@exa1db01v1 ~]# ls -al /u01/app/19.0.0.0/
total 12
drwxr-x--- 3 root root 4096 May 9 19:22 .
drwxr-xr-x 11 root oinstall 4096 May 9 19:22 ..
drwxr-xr-x 71 root root 4096 May 9 19:21 grid <---------permissions are ok
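
Rather than running ls on one directory at a time, it can be quicker to list every component of the path in one go.  The following is a minimal sketch that assumes GNU stat (as found on Oracle Linux); the function name show_path_perms is my own.  On systems with util-linux installed, "namei -l <path>" gives a similar listing.

```shell
#!/usr/bin/env bash
# Print mode, owner:group, and name for every directory from / down to the target.
# Any component that lacks the execute (search) bit for the user in question
# blocks access to everything beneath it.
show_path_perms() {
    local target=$1 current="" part
    local -a parts
    # Split the absolute path into its components.
    IFS=/ read -r -a parts <<< "${target#/}"
    stat -c '%A %U:%G %n' /
    for part in "${parts[@]}"; do
        [ -n "$part" ] || continue
        current="$current/$part"
        stat -c '%A %U:%G %n' "$current" || return 1
    done
}

# Run as root so that every component can be inspected:
# show_path_perms /u01/app/19.0.0.0/grid
```

In the listing this produces, the line to look for is the first directory whose mode has no r-x for the class the oracle user falls into.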

On this system, the umask for the root user is set to 027, rather than the default of 022.  This type of change is fairly common on hardened systems, particularly those that use the Exadata STIG scripts (MOS note #2181944.1).
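
To see what the two umask values mean for a newly created directory, the arithmetic can be worked through in the shell.  This is a small illustrative sketch (the function name dir_mode_for_umask is hypothetical): a directory is requested with mode 777, and the umask clears bits from that.

```shell
#!/usr/bin/env bash
# Compute the mode a new directory receives under a given umask.
dir_mode_for_umask() {
    # $1 is the umask in octal, e.g. 022 or 027.
    printf '%03o\n' $(( 0777 & ~0$1 ))
}

dir_mode_for_umask 022   # prints 755
dir_mode_for_umask 027   # prints 750
```

With 027, the "other" class ends up with no permissions at all, which is exactly what the listing above shows for /u01/app/19.0.0.0.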

When the GI home directory was created, the command used was "mkdir -p /u01/app/19.0.0.0/grid", which created the parent directory as well.  The parent's permissions followed the umask of the root user (750 here), which prevents users outside the root group, including oracle, from entering it.
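
The behavior is easy to reproduce safely in a scratch directory.  The sketch below is an illustration rather than what oedacli actually runs: it creates the same 19.0.0.0/grid structure under a temporary directory with the umask set to 027 and prints the mode of the intermediate directory.  The function name demo_mkdir_umask is made up.

```shell
#!/usr/bin/env bash
# Reproduce the parent-directory mode created by "mkdir -p" under umask 027.
demo_mkdir_umask() {
    local scratch
    scratch=$(mktemp -d) || return 1
    (
        # Subshell, so the umask change does not leak into the calling shell.
        umask 027
        mkdir -p "$scratch/19.0.0.0/grid"
    )
    # Octal mode of the intermediate directory that mkdir -p created.
    stat -c '%a' "$scratch/19.0.0.0"
    rm -rf "$scratch"
}

demo_mkdir_umask   # prints 750
```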

Once the cause was identified, a simple "chmod 755 /u01/app/19.0.0.0" resolved it, and we were able to restart the GI upgrade at the CONFIG_HOME step.
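
On other hardened systems it may be possible to avoid the problem up front by creating the parent directory with an explicit mode before the ADD_HOME step, since "mkdir -p" leaves the mode of an existing directory alone.  I have not tested this with oedacli, so treat the following as a sketch under that assumption; the function name precreate_parent is hypothetical.

```shell
#!/usr/bin/env bash
# Create a directory with mode 755 regardless of the caller's umask.
precreate_parent() {
    local parent=$1
    # install -d -m sets the mode explicitly instead of deriving it from the umask.
    install -d -m 755 "$parent"
}

# Hypothetical usage, run as root on each VM before the upgrade:
# precreate_parent /u01/app/19.0.0.0
```

If the directory already exists with the wrong mode, the chmod shown above remains the simplest fix.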
