DB2 pureScale Install Problem Determination

Jul 1, 2016

RSCT License Issue

$ db2start 128
SQL1677N  DB2START or DB2STOP processing failed due to a DB2 cluster services error

DATA #9 : SQLHA Remote Command Output, PD_TYPE_SQLHA_COMMAND_RESPONSE, 3508 bytes
commandResponse->callRC: 0x00000000
commandResponse->output: Error: Product license is invalid and needs to be upgraded.

2016-06-30-09.40.31.819767-240 I6015E554             LEVEL: Error
PID     : 18164                TID : 140258826409760 PROC : db2start
INSTANCE: db2psc               NODE : 000
HOSTNAME: purescale.zinox.com
FUNCTION: DB2 UDB, high avail services, sqlhaVerifyHostLicenses, probe:18163
MESSAGE : The cluster manager license for the host is not ok:
DATA #1 : String, 42 bytes
purescale.zinox.com
DATA #2 : SQLHA_LICENSE_STATUS, PD_TYPE_SQLHA_LICENSE_STATUS, 4 bytes
SQLHA_LICENSE_STATUS_EVALUATION_PERIOD_EXPIRED
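In a large db2diag.log it can help to pull out just the license status values. The following is a minimal sketch, not a DB2-supplied tool: the function name is hypothetical and the log path is whatever your instance uses, passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct SQLHA_LICENSE_STATUS_* values
# found in a db2diag.log file. The path to the log is an assumption and
# must be supplied by the caller.
license_statuses() {
    # -o prints only the matching token, not the whole line.
    grep -o 'SQLHA_LICENSE_STATUS_[A-Z_][A-Z_]*' "$1" | sort -u
}
```

Calling license_statuses with the path to your db2diag.log should print one line per distinct status found, such as the expired evaluation period shown above.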

You applied the RSCT license with samlicm -i <sam32.lic or sam41.lic>, yet the message above still appears in db2diag.log. Even when samlicm -i <license file> reports no error, the installed license may still be invalid. I do not know all the reasons this can happen, so it is always a good idea to verify the license after applying it.

# samlicm -t
# echo $?

The first command tests whether the installed license is valid. The second prints its return code: 0 means the license is valid and 1 means it is invalid. If you get 1, download the license file again from the IBM Passport Advantage site and apply it again.

For example:

# samlicm -t
# echo $?
1
# samlicm -i sam32.lic
# samlicm -t
# echo $?
0
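The check above can be wrapped in a small function for use in install scripts. This is only a sketch: check_rsct_license and the LICENSE_TEST_CMD override are hypothetical names introduced here so the logic can be dry-run on a machine without RSCT. By default it runs samlicm -t exactly as shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around "samlicm -t".
# LICENSE_TEST_CMD is an assumed override for dry runs; when it is unset,
# the real "samlicm -t" is executed.
check_rsct_license() {
    if ${LICENSE_TEST_CMD:-samlicm -t} >/dev/null 2>&1; then
        echo "license valid"
        return 0
    else
        # Also reached if samlicm is missing or fails for another reason.
        echo "license invalid: reapply with samlicm -i <license file>"
        return 1
    fi
}
```

A script can then stop early with something like check_rsct_license || exit 1 before attempting db2start.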

Reload License

Applying a license does not mean that the running processes know about it. Either reboot the machine so that the license is picked up, or kill the IBM.ConfigRMd process with a plain kill (not kill -9) so that it restarts. The second approach may or may not work, because the critical resource protection method may be invoked and RSCT may reboot the server.

# ps -ef | grep -i config
root      1704  6398  0 09:52 pts/0    00:00:00 grep -i config
root      2106   992  0 09:36 ?        00:00:00 /usr/sbin/rsct/bin/IBM.ConfigRMd
# kill 2106
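To avoid picking the grep process by mistake, the PID can be extracted from the ps -ef output with awk. This is a rough sketch under the assumption that the command is the eighth column, as in the listing above. The function name configrm_pids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -ef" output on stdin and print the PID
# (column 2) of each process whose command (column 8) is IBM.ConfigRMd.
# The "grep -i config" line is skipped because its command is "grep".
configrm_pids() {
    awk '$8 ~ /IBM\.ConfigRMd$/ { print $2 }'
}

# Example use as root, sending the default SIGTERM (never -9):
#   for pid in $(ps -ef | configrm_pids); do kill "$pid"; done
```

The same caution applies as above: the daemon should restart on its own, but RSCT may decide to reboot the server instead.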

netmon.cf

If netmon.cf contains entries with an outside IP address for the layer 2 network, make sure that you can ping each IP address through the listed interface. For example:

# cd /var/ct/cfg
# cat netmon.cf
!IBQPORTONLY !ALL
!REQD eth0 10.10.120.11
!REQD eth1 192.168.120.11

Ping each IP address through its interface. If ping returns no output, either the interface name or the IP address is wrong, or something has changed since the last good configuration. For example, a NIC was replaced and the interface name changed while the IP address stayed the same.

$ ping -I eth0 10.10.120.11
$ ping -I eth1 192.168.120.11
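With more than a couple of entries, the same check can be driven from netmon.cf itself. The sketch below assumes the !REQD <interface> <address> layout shown above and a Linux ping that accepts -c and -I. The function names and the PING_CMD override (for dry runs) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "interface address" for every !REQD line
# in the netmon.cf file given as $1.
netmon_targets() {
    awk '$1 == "!REQD" { print $2, $3 }' "$1"
}

# Hypothetical helper: ping each target through its interface.
# PING_CMD is an assumed override for testing; the default is ping.
check_netmon() {
    rc=0
    while read -r iface addr; do
        [ -n "$iface" ] || continue   # skip the empty line when no !REQD entries exist
        if ${PING_CMD:-ping} -c 3 -I "$iface" "$addr" >/dev/null 2>&1 </dev/null; then
            echo "OK   $iface $addr"
        else
            echo "FAIL $iface $addr"
            rc=1
        fi
    done <<EOF
$(netmon_targets "$1")
EOF
    return "$rc"
}
```

Running check_netmon /var/ct/cfg/netmon.cf prints one OK or FAIL line per entry and returns non-zero if any address is unreachable through its interface.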

If, after correcting the problem above, you still see the message “Error: A reachable IP address could not be automatically determined” and the error asks you to fix netmon.cf, chances are that a duplicate adapter name is assigned. For example, /etc/sysconfig/network-scripts may contain a redundant ifcfg file that is not mapped to any adapter but uses the same name. The simple fix is to remove the unwanted ifcfg file.
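One way to spot such a leftover file is to look for a device name that appears in more than one ifcfg file. This sketch assumes the duplicate shows up as a repeated DEVICE= value, which may not be the case on every system. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each DEVICE value that occurs in more than
# one ifcfg-* file. The directory defaults to the location named above.
duplicate_ifcfg_devices() {
    dir=${1:-/etc/sysconfig/network-scripts}
    grep -h '^DEVICE=' "$dir"/ifcfg-* 2>/dev/null |
        sed -e 's/^DEVICE=//' -e 's/"//g' |   # strip the key and any quotes
        sort | uniq -d                        # keep only repeated names
}
```

If a name is printed, grep -l "DEVICE=" on the same directory shows which files carry it, so the unwanted one can be reviewed before it is removed.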

SSH Key Has Changed

When a machine is rebuilt and a backup is restored, the SSH host key may change. That node then stops working, or you see messages like the following in db2diag.log. Fix the SSH keys on all hosts and make sure that ssh works using localhost, the IP address, the FQDN, and the short name.

2016-06-30-08.33.53.397828-240 E2122E2289            LEVEL: Severe
PID     : 32035                TID : 140342334629664 PROC : db2cluster
INSTANCE: db2psc               NODE : 000
HOSTNAME: purescale.zinox.com
FUNCTION: DB2 UDB, high avail services, sqlhaExecuteCommandLocal, probe:1264
DATA #1 : String, 25 bytes
/var/db2/db2ssh/db2locssh
DATA #2 : String, 21 bytes
root@vpdb202 hostname
DATA #3 : signed integer, 8 bytes
6
DATA #4 : unsigned integer, 4 bytes
32047
DATA #5 : Boolean, 1 bytes
true
DATA #6 : unsigned integer, 8 bytes
853
DATA #7 : SQLHA Remote Command Output, PD_TYPE_SQLHA_COMMAND_RESPONSE, 3508 bytes
commandResponse->callRC: 0x00000000
commandResponse->output: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
c9:96:96:d1:3e:f5:e1:96:0f:b9:9b:64:43:89:0e:63.
Please contact your system administrator.
Add correct host key in /home/db2psc/.ssh/known_hosts to get rid of this message.
Offending key in /home/db2psc/.ssh/known_hosts:8
RSA host key for purescale.zinox.com has changed and you have requested strict checking.
Host key verification failed.
failure - examine the system log on the remotehost for additional information
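After fixing the keys, the four name forms can be tested in one pass. The sketch below is an assumption-laden helper, not part of DB2: check_ssh_names and the SSH_CMD override (for dry runs) are hypothetical names. BatchMode=yes makes ssh fail instead of prompting, so a stale known_hosts entry shows up as FAIL.

```shell
#!/bin/sh
# Hypothetical helper: try a non-interactive ssh to every name given
# as an argument and report which ones work.
# SSH_CMD is an assumed override for testing; the default is ssh.
check_ssh_names() {
    rc=0
    for target in "$@"; do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
               "$target" hostname >/dev/null 2>&1; then
            echo "OK   $target"
        else
            echo "FAIL $target"
            rc=1
        fi
    done
    return "$rc"
}
```

Run it as the instance owner on each host, for example check_ssh_names localhost <IP address> purescale.zinox.com purescale. For a name that fails because of a changed host key, ssh-keygen -R <name> removes the offending entry from known_hosts so that the new key can be accepted.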