Useful commands for day-to-day work managing DB2 pureScale with RSCT and GPFS
GPFS:
Did you ever wonder why GPFS commands start with mm? GPFS started as an IBM research project in the early 90s to build a multimedia (music and video) networked file system for a university, which is why our Ph.D. researchers chose to prefix each file system command with mm (multimedia), and that legacy continues to this day.
Look for the latest log in /var/adm/ras/mmfs.log.latest
It contains entries such as when a node was evicted from the cluster, which explains why you do not see the mount points on a node even though it is online. Once a node has been evicted, look for clues in this file.
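For example, a quick way to pull the expulsion/eviction related messages out of the latest log (a rough sketch; the exact message wording varies by GPFS release):
# grep -iE 'expel|evict' /var/adm/ras/mmfs.log.latest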
Check the status of the GPFS (General Parallel File System) mounts on all nodes in the GPFS cluster and determine which node does not show a file system online.
# mmlsnsd -M
Disk name     NSD volume ID      Device     Node name                     Remarks
---------------------------------------------------------------------------------------
gpfs1nsd      C0A88E664FE7F0D7   /dev/sdc   node02.purescale.ibm.local
gpfs1nsd      C0A88E664FE7F0D7   /dev/sdc   node03.purescale.ibm.local
gpfs1nsd      C0A88E664FE7F0D7   /dev/sdc   node04.purescale.ibm.local
gpfs2nsd      C0A88E664FE7F468   /dev/sdb   node02.purescale.ibm.local
gpfs2nsd      C0A88E664FE7F468   /dev/sdb   node03.purescale.ibm.local
gpfs2nsd      C0A88E664FE7F468   /dev/sdb   node04.purescale.ibm.local
gpfs3nsd      C0A88E664FE7F485   /dev/sdd   node02.purescale.ibm.local
gpfs3nsd      C0A88E664FE7F485   /dev/sdd   node03.purescale.ibm.local
gpfs3nsd      C0A88E664FE7F485   /dev/sdd   node04.purescale.ibm.local
gpfs4nsd      C0A88E664FE7F4B2   /dev/sde   node02.purescale.ibm.local
gpfs4nsd      C0A88E664FE7F4B2   /dev/sde   node03.purescale.ibm.local
gpfs4nsd      C0A88E664FE7F4B2   /dev/sde   node04.purescale.ibm.local
gpfs5nsd      C0A88E664FE7F4DB   /dev/sdf   node02.purescale.ibm.local
gpfs5nsd      C0A88E664FE7F4DB   /dev/sdf   node03.purescale.ibm.local
gpfs5nsd      C0A88E664FE7F4DB   /dev/sdf   node04.purescale.ibm.local
gpfs6nsd      C0A88E664FE7F4FB   /dev/sdg   node02.purescale.ibm.local
gpfs6nsd      C0A88E664FE7F4FB   /dev/sdg   node03.purescale.ibm.local
gpfs6nsd      C0A88E664FE7F4FB   /dev/sdg   node04.purescale.ibm.local
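If you prefer to check the mounts directly rather than the NSDs, mmlsmount lists which nodes currently have each GPFS file system mounted; any node missing from that list is the one to investigate. A quick check:
# mmlsmount all -L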
Check the status of GPFS on the node where the command is run:
# mmlsnsd -X
Disk name     NSD volume ID      Device     Devtype   Node name                     Remarks
---------------------------------------------------------------------------------------------------
gpfs1nsd      C0A88E664FE7F0D7   /dev/sdc   generic   node02.purescale.ibm.local
gpfs2nsd      C0A88E664FE7F468   /dev/sdb   generic   node02.purescale.ibm.local
gpfs3nsd      C0A88E664FE7F485   /dev/sdd   generic   node02.purescale.ibm.local
gpfs4nsd      C0A88E664FE7F4B2   /dev/se    generic   node02.purescale.ibm.local
gpfs5nsd      C0A88E664FE7F4DB   /dev/sdf   generic   node02.purescale.ibm.local
gpfs6nsd      C0A88E664FE7F4FB   /dev/sdg   generic   node02.purescale.ibm.local
If SCSI-3 PR is enabled, you will see an additional entry, pr=yes, in the Remarks column. When you install DB2 pureScale for the first time, and depending upon your storage, GPFS can be made aware of the SCSI-3 PR capability, and you should then see pr=yes in the Remarks column. If you do not, but you know that your storage is SCSI-3 PR capable, use the manual procedure outlined in the Information Center to enable it.
The procedure is outlined below:
- $ db2stop force
- $ db2stop instance on <hostname> —> Repeat this for all hosts
- # db2cluster -cm -stop -domain <name> —> Find the domain name with the lsrpdomain command. This shuts down RSCT.
- # db2cluster -cfs -stop -all —> This brings the nodes down, but the GPFS domain is still active.
- # touch /var/mmfs/etc/prcapdevices
- # /usr/lpp/mmfs/bin/tsprinquiry >> /var/mmfs/etc/prcapdevices
- # /usr/lpp/mmfs/bin/mmchconfig usePersistentReserve=yes
- # scp /var/mmfs/etc/prcapdevices <nodename>:/var/mmfs/etc/prcapdevices —> Repeat this for all nodes
- # /usr/lpp/mmfs/bin/mmchconfig totalPingTimeout=45 —> Reduce this from the default of 75 seconds now that we have PR.
- # db2cluster -cfs -start -all
- # db2cluster -cm -start -domain <domainname>
- # /usr/lpp/mmfs/bin/mmlsnsd -X —> Check that pr=yes is set in the Remarks column of the output, and check on all nodes (a small loop to do this from one node is sketched after this list).
- # db2cluster -cm -list -hostfailuredetectiontime —> The default is 8 seconds.
- # db2cluster -cm -set -hostfailuredetectiontime -value 4 —> Since PR is enabled, 4 seconds is good.
- # db2cluster -cfs -verify -resources
- # db2cluster -cm -verify -resources
- $ db2start instance on <hostname> —> Repeat for all nodes
- $ db2start
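A minimal sketch of the pr=yes check across all hosts from a single node, assuming passwordless root ssh between hosts and using my hypothetical host names node02 through node04:

# count the NSDs that report pr=yes on each host; every host should report the same number
for h in node02 node03 node04; do
    echo "== $h =="
    ssh "$h" /usr/lpp/mmfs/bin/mmlsnsd -X | grep -c 'pr=yes'
done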
To test whether SCSI-3 PR is effective, do a node failover, pull the power cord, or send kill -11 to the PID of the GPFS daemon, and then do the following check.
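(If you go the kill -11 route, the GPFS daemon process is mmfsd; a quick sketch to find and signal it, run as root on the node you want to fail:
# kill -11 $(pgrep -x mmfsd)
)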
Run the mmfsadm command:
# mmfsadm dump sgmgr
Stripe groups managed by this node:
  (none)
If you see the above output, you need to determine the GPFS manager node. Run the mmlsmgr command.
# mmlsmgr
file system      manager node
---------------- ------------------
db2data2         192.168.142.103 (node03)
db2data4         192.168.142.103 (node03)
db2log           192.168.142.103 (node03)
db2data1         192.168.142.104 (node04)
db2data3         192.168.142.104 (node04)
db2fs1           192.168.142.104 (node04)

Cluster manager node: 192.168.142.103 (node03)
The GPFS cluster manager is node03. Run the mmfsadm dump sgmgr command from the GPFS cluster manager node. If you still do not see the output, run it from another node.
node02:~ # ssh node03 mmfsadm dump sgmgr
Stripe groups managed by this node:
  "db2data2" id C0A88E68:4FE7F48A: status recovered, fsck not active
    seq 1662705587 mgrTakeover noTakeover mgrRestricted 0
    asyncRecovery: needed 0 inProgress 0, onetimeRecoveryDone 1
    dmDoDeferredDeletions dmnoDefDel
    pending operations 0 [ ], quiesce level -1, blocked 0
    mgrOperationInProgress 0, logFileAssignmentInProgress 0
    initialLogRecoveryCompleted 0, logMigrateWhat 0x00
    FenceDone 0, aclGarbageCollectInProgress 0
    resetEFOptions 0
    mounts: 3 nodes:  :1  :9  :9
    multiTMWanted true multiTMCfgChange false
    panics: 0 nodes
    unfenced: 3 nodes:  :0  :0  :0
    log group 1, index 0, flags 0x00, replicas 1, status in use, user , migratePending 0
    log group 2, index 1, flags 0x00, replicas 1, status in use, user , migratePending 0
    ---- ---- ----
Node failure recovery statistics for last 2 failures:
 Completed at        |Total sec nodes|TM recov|AllocMgr| Fencing nodes disks|Log recov logs|
---------------------+--------- -----+--------+--------+-------- ----- -----+--------- ----|
 2012-06-25@18:54:59 |   62.016     1|   0.007|   0.000|  62.001     1     1|    0.006    1|
 2012-06-25@19:33:53 |   52.031     1|   0.014|   0.000|  52.000     1     1|    0.012    1|
 Totals for 2 since  |Total sec nodes|TM recov|AllocMgr| Fencing nodes disks|Log recov logs|
---------------------+--------- -----+--------+--------+-------- ----- -----+--------- ----|
 2012-06-25@18:54:59 |  114.047     2|   0.021|   0.000| 114.001     2     2|    0.019    2|
Notice the Fencing column: it shows values of 62 and 52 seconds when I did the failover test. This is because I am running pureScale in a VM environment on my laptop, and my storage (the laptop hard drive) does not have SCSI-3 PR. In a real cluster, you will see this value at roughly 0.5 to 2 seconds. Including DB2 recovery time, you will see applications start processing transactions again within 5-10 seconds. This is possible due to the fast disk I/O fencing provided by SCSI-3 PR capable storage.
Please see my previous articles on how to use the script that I attached to test if the storage supports SCSI-3 PR capability.
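If you do not have that script handy, a rough equivalent check can be done with sg_persist from the sg3_utils package (not part of GPFS or DB2; the device name /dev/sdc is just an example). It reports whether the disk supports persistent reservations and which types, so look for Write Exclusive, all registrants in the output:
# sg_persist --in --report-capabilities --device=/dev/sdc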
Just remember this:
You need Type 7 PR (Persistent Reserve) for GPFS (Fast I/O Fencing) and Type 5 PR (Tie Breaker) for RSCT or Tivoli System Automation (TSA) or cluster manager (db2cluster -cm switch).
Type 7 PR is “Write exclusive, all registrants” – GPFS
Type 5 PR is “Write exclusive, registrants only” – RSCT/TSA
SCSI-3 PR uses the concepts of registration and reservation. Each system (DB2 server) registers its key with a SCSI-3 device. Multiple DB2 servers registering keys form a membership and establish a reservation, typically set to “Write Exclusive All Registrants” (Type 7). This setting allows only registered systems to perform write operations. For a given disk, only one reservation can exist among numerous registrations. With SCSI-3 PR, blocking write access is as simple as removing a registration from a device, and this process takes 3-20 seconds. Only registered DB2 members can eject the registration of another DB2 member.
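To make the registration/reservation concept concrete, here is roughly how the same dance looks with the generic sg_persist tool (a sketch only; GPFS and RSCT manage their own keys internally, and the keys 0x1001/0x1002 and device /dev/sdc are made up for illustration):
# sg_persist --out --register --param-sark=0x1001 /dev/sdc —> register this host’s key with the disk
# sg_persist --out --reserve --param-rk=0x1001 --prout-type=7 /dev/sdc —> take a Write Exclusive All Registrants (Type 7) reservation
# sg_persist --in --read-keys --device=/dev/sdc —> list the registered keys
# sg_persist --in --read-reservation --device=/dev/sdc —> show the current reservation
# sg_persist --out --preempt --param-rk=0x1001 --param-sark=0x1002 --prout-type=7 /dev/sdc —> eject another member’s registration (its key is 0x1002), fencing it from writes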