In a zLinux environment, a DB2 High Availability solution using Active / Passive (with a shared database) is easy to implement, as described in the previous article.

The db2haicu command uses a network IP address as the tie-breaker, but a disk-based tie-breaker is the more reliable approach, for two reasons. 1. A disk is more reliable than an IP address, which is generally the gateway address. 2. If the gateway address changes and the system admin / network guys did not inform the DBAs, you could be in a situation where a failover will actually fail. You are better protected with a disk-based tie-breaker.

However, there are restrictions on using a disk-based tie-breaker. The SAN that you are using must support SCSI-3 persistent reservation (PR). You can consult my previous article on how to check whether your disk supports SCSI-3 PR.

If your SAN does not support SCSI-3 reservation, ask your vendor; they may provide a new driver or change something in their storage to support it. Ultimately you should be able to use the script (Linux only) given here.

If you are working in zLinux and have set up 2 active / passive machines with the shared database using db2haicu, you have a network-based tie-breaker. We will explore how to use a disk-based tie-breaker instead.

In a regular SAN, you will get SCSI-based disks, but in zLinux through z/VM, you will get ECKD disks (and also SCSI if your storage people configured it that way).

Assuming that you are going to use an ECKD disk for the tie-breaker, please help your z/OS system administrators so that it is set up properly, as it may not be obvious because of the active / passive setup that we have done.

Consult Chapter 11 of the IBM Tivoli SA MP guide.

The relevant section is reproduced below from the Chapter-11.

The System Automation domain spans two z/VM systems

If the nodes of the System Automation domain are guests of two different z/VM systems, the following definitions are required for the ECKD dasd:

  • The tie breaker disk needs to be defined as a DEVNO disk in a minidisk statement in the user profile (no minidisk, no fullpack minidisk, no dedicated or attached dasd)
  • The ECKD disk (DEVNO) is shared between both nodes
  • The ECKD dasd must not be system attached when the z/VM is IPL’ed

Logging on to the Linux guests shows the following device attachment, a virtual device (291 in our example) with the real address (4a82 in our example). The device becomes shared in our example with the command cp set shared on 4a82. Note that this needs to be done on both sides.

00:  CP  Q  4A82
00:  DASD  4A82  CP  SYSTEM  DEVNO     1         SHARED
00:
00:  CP  Q  V  291
00:  DASD  0291  3390  VM4A82  R/W 3339  CYL  ON  DASD  4A82  SUBCHANNEL  =  000F
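
The same check can be made from inside the Linux guest. The following is a minimal sketch, assuming the vmcp tool is installed and that its response matches the CP Q line shown above, with or without the 00: console prefix. The helper name is_shared is hypothetical and not part of any IBM tool.

```shell
#!/bin/sh
# Reads the text of a "CP Q <devno>" response on stdin and succeeds
# only if the line for the given real device ends in SHARED.
# is_shared is a hypothetical helper name used for illustration.
is_shared() {
    # CP prints device numbers in upper case, so normalise the argument.
    devno=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    # Strip an optional "00:" console prefix, then look for
    # "DASD <devno> ... SHARED" at the end of the line.
    sed 's/^[0-9][0-9]: *//' |
        grep -Eq "^DASD +${devno} .* SHARED *\$"
}

# Typical use on a guest (assumes vmcp is installed and permitted):
#   vmcp q 4a82 | is_shared 4a82 && echo "4A82 is shared"
```

A line that reports RESERVED BY USER instead of SHARED makes the function return a non-zero status.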

In case one of the z/VM systems is shut down, the ECKD dasd is reserved by the surviving Linux guest on the other z/VM system. On the surviving side, you can see the following:

00:  CP  Q  DA  RESERVE
00:  DASD  4A82  CP  SYSTEM  DEVNO       1         RESERVED  BY  USER  test1

After starting the z/VM again, the dasd 4A82 is still offline and cannot be set online because it is still reserved by the other system.

Note:  A vary on of this device will not be successful at this time; instead, a timeout of 20 – 30 minutes occurs!

The recommendation is to start Linux on the restarted z/VM without the tie breaker dasd. This will succeed, since the dasd is not needed for starting Linux. After Linux is started, System Automation will start automatically on the Linux guest, and then Linux will automatically join the System Automation domain again. At this time, the reservation of the ECKD dasd will be released. From now on, it is possible to vary on the device of the tie breaker disk (4a82 in our example), then to commit the share command and link the virtual address of the tie breaker disk (291 in our example) on the newly IPL'ed system. After a chccwdev -e 291 on the restarted Linux, everything should be up and running. No further interaction on the surviving Linux is necessary.

All the commands necessary are CP commands. Therefore, a script that issues those commands using VMCP can be written to automate the restoration of the failed Linux.

For the above example, the script might contain the following commands:

vmcp  vary  on  4a82
vmcp  set  shared  on  4a82
vmcp  link  *  291  291  mr ---> (change this mr to mwv)
chccwdev  -e  291

Important: The guide has mr in the link command and when we changed that to mwv, the reservation worked for us. This was determined by the z/OS system administrator.
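
The restore sequence can be wrapped in a small script. The following is only a sketch: the names DRY_RUN, run and restore_tiebreaker are hypothetical, the device numbers are the ones from the example, and the link mode is mwv as described above. It defaults to printing the commands so that it can be reviewed before anything is issued to CP.

```shell
#!/bin/sh
# Sketch of the restore sequence from the guide, using "mwv" instead of
# "mr" on the link command.
# DRY_RUN, run and restore_tiebreaker are hypothetical names.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = real device number (4a82 in the example)
# $2 = virtual device number (291 in the example)
restore_tiebreaker() {
    real=$1
    virt=$2
    # Stop at the first command that fails.
    run vmcp vary on "$real"               || return 1
    run vmcp set shared on "$real"         || return 1
    run vmcp link '*' "$virt" "$virt" mwv  || return 1
    run chccwdev -e "$virt"
}
```

Calling restore_tiebreaker 4a82 291 prints the four commands; setting DRY_RUN=0 before the call executes them instead.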

Once an ECKD disk is visible and attached to both systems, check whether the disk supports reservation before attempting to use it as a tie-breaker disk.

Try to lock the disk

# /usr/sbin/rsct/bin/tb_break -v -l -t ECKD "ID=291"
Initializing ECKD tie-breaker (ID=291)
Kernel release=2.6.18-164.el5 maj_ver=2.6
search_one_file_with_pattern -- returns /sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev, nmatch=1
ECKD devfile = (/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev)
read(/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/availability) - buffer:good
sys_eckd: in.devnum=0703, devno=0xffffffff
major=94 minor=12 statbuf=(good) status=1
make_rsct_eckd_tb_device(/dev/rsct-tb-test, 94,12, dev=0x00005e0c) is successful
search_one_file_with_pattern -- returns /sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev, nmatch=1
ECKD devfile = (/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev)
read(/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/availability) - buffer:good
sys_eckd: in.devnum=0703, devno=0xffffffff
major=94 minor=12 statbuf=(good) status=1
eckd_release is successful
Reserving tie-breaker (ID=703)
search_one_file_with_pattern -- returns /sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev, nmatch=1
ECKD devfile = (/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev)
read(/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/availability) - buffer:good
sys_eckd: in.devnum=0703, devno=0xffffffff
major=94 minor=12 statbuf=(good) status=1
eckd_reserve: find eckd devnum=0x0703, major=94 minor=12, status=1
make_rsct_eckd_tb_device(/dev/rsct-tb-test, 94,12, dev=0x00005e0c) is successful
ioctl(fd=3, BIODASDRSRV) returns errno=0
eckd_reserve(BIODASDRSRV) is granted
tb_reserve status GRANTED(0) (errno=0)

Try to unlock the disk

# /usr/sbin/rsct/bin/tb_break -v -u -t ECKD "ID=291"
Initializing ECKD tie-breaker (ID=291)
Kernel release=2.6.18-164.el5 maj_ver=2.6
search_one_file_with_pattern -- returns /sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev, nmatch=1
ECKD devfile = (/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev)
read(/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/availability) - buffer:good
sys_eckd: in.devnum=0703, devno=0xffffffff
major=94 minor=12 statbuf=(good) status=1
make_rsct_eckd_tb_device(/dev/rsct-tb-test, 94,12, dev=0x00005e0c) is successful
search_one_file_with_pattern -- returns /sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev, nmatch=1
ECKD devfile = (/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev)
read(/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/availability) - buffer:good
sys_eckd: in.devnum=0703, devno=0xffffffff
major=94 minor=12 statbuf=(good) status=1
eckd_release is successful
Releasing tie-breaker (ID=703)
search_one_file_with_pattern -- returns /sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev, nmatch=1
ECKD devfile = (/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/block:dasdd/dev)
read(/sys/bus/ccw/drivers/dasd-eckd/0.0.0703/availability) - buffer:good
sys_eckd: in.devnum=0703, devno=0xffffffff
major=94 minor=12 statbuf=(good) status=1
eckd_release is successful
tb_release status 0 (errno=0)
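
The verbose output is long, and only the last line matters for a pass / fail decision. The following is a minimal sketch of a filter that classifies that line. The name tb_status is hypothetical, and the patterns are taken from the sample output above, so they may differ on other RSCT levels.

```shell
#!/bin/sh
# Reads "tb_break -v" output on stdin and classifies the final status line.
# tb_status is a hypothetical helper; the patterns come from the sample
# output in this article and are an assumption for other RSCT levels.
tb_status() {
    awk '
        /^tb_reserve status GRANTED\(0\)/ { result = "reserve-ok" }
        /^tb_release status 0 /            { result = "release-ok" }
        END { print (result == "" ? "unknown" : result) }
    '
}

# Typical use (requires RSCT and the tie-breaker disk):
#   /usr/sbin/rsct/bin/tb_break -v -l -t ECKD "ID=291" 2>&1 | tb_status
```

Any output that contains neither status line is reported as unknown, which should be treated as a failure.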

Make sure that you have the following statement in your login profile as root before you attempt the following RSCT commands.

export CT_MANAGEMENT_SCOPE=2

It is a good idea to run # echo $CT_MANAGEMENT_SCOPE to confirm that it is indeed set.

Once you have determined that the ECKD disk supports the reserve / release commands, it is time to use this disk as a tie-breaker for our Active / Passive DB2 configuration with the shared database.

Use the lssam command to see the status of the TSA resources used in our clustering of the DB2 nodes.
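
The check can also be put in front of any script that calls RSCT commands. A minimal sketch, with check_scope as a hypothetical helper name:

```shell
#!/bin/sh
# Succeeds only when CT_MANAGEMENT_SCOPE is set to 2 (peer domain scope).
# check_scope is a hypothetical helper name used for illustration.
check_scope() {
    if [ "${CT_MANAGEMENT_SCOPE:-}" = "2" ]; then
        echo "CT_MANAGEMENT_SCOPE=2"
    else
        echo "CT_MANAGEMENT_SCOPE is '${CT_MANAGEMENT_SCOPE:-unset}', expected 2" >&2
        return 1
    fi
}

# Typical use at the top of a script:
#   check_scope || exit 1
```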

Use # lsrsrc -c IBM.PeerNode and note the name of the active tie-breaker used for the peer domain. Check the value of OpQuorumTieBreaker; it is set to the network tie-breaker that the db2haicu command created when setting up the active / passive cluster.

Use the command # mkrsrc IBM.TieBreaker Name="eckd" Type=ECKD "DEVICE=/dev/dasdd" HeartbeatPeriod=30 to create a tie-breaker resource.

Please note that the device name should be the one shown by the lsdasd command; use the device that matches the DEV NUM.
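
The tb_break verbose output shown earlier also contains the pairing of device number and device name, which can serve as a cross-check. The following sketch extracts it. The name dasd_from_tb_output is hypothetical, and the pattern assumes the sysfs path layout seen in the sample output of this article (kernel 2.6.18), which may differ on other kernels.

```shell
#!/bin/sh
# Reads "tb_break -v" output on stdin and prints "<bus-id> <dasd name>"
# once for every distinct device found in the "ECKD devfile" lines.
# dasd_from_tb_output is a hypothetical helper; the path layout
# ".../<bus-id>/block:<name>/dev" is taken from the sample output.
dasd_from_tb_output() {
    sed -n 's|^ECKD devfile = (.*/\(0\.[0-9]\.[0-9a-fA-F]\{4\}\)/block:\([a-z]*\)/dev)$|\1 \2|p' |
        sort -u
}

# Typical use (requires RSCT and the tie-breaker disk):
#   /usr/sbin/rsct/bin/tb_break -v -l -t ECKD "ID=291" 2>&1 | dasd_from_tb_output
```

For the sample output in this article, the name printed is dasdd, which is the device used in the mkrsrc command.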

Creating a tie-breaker resource has no effect until it is activated. Use the command # chrsrc -c IBM.PeerNode OpQuorumTieBreaker="eckd"

Then check that the change has taken effect by looking at the OpQuorumTieBreaker attribute of IBM.PeerNode. The command is # lsrsrc -c IBM.PeerNode

You cannot delete a tie-breaker resource while it is active; confirm which one is active with the # lsrsrc -c IBM.PeerNode command. To delete a tie-breaker, you must first switch to another tie-breaker and then delete it. There are 2 default tie-breakers, and one of them is Operator. Follow these commands.
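
To avoid reading through the whole attribute list, the value can be filtered out. A minimal sketch follows. The name current_tiebreaker is hypothetical, and the Attribute = "value" layout of the lsrsrc output is an assumption.

```shell
#!/bin/sh
# Prints the value of OpQuorumTieBreaker from "lsrsrc -c IBM.PeerNode"
# output read on stdin. current_tiebreaker is a hypothetical helper, and
# the 'Attribute = "value"' line layout is an assumption about lsrsrc.
current_tiebreaker() {
    sed -n 's/^[[:space:]]*OpQuorumTieBreaker[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p'
}

# Typical use (requires an active RSCT peer domain):
#   lsrsrc -c IBM.PeerNode | current_tiebreaker
```

After the chrsrc command has taken effect, the expected value is eckd.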

# chrsrc -c IBM.PeerNode OpQuorumTieBreaker="Operator"

# rmrsrc -s "Name == 'eckd'" IBM.TieBreaker

Do the above ONLY if you need to delete a tie-breaker for some reason, such as doing the exercise all over again.
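
The switch-then-delete order can be enforced in a small wrapper. This is only a sketch: DRY_RUN, run and remove_tiebreaker are hypothetical names, and the wrapper defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# Sketch of the switch-then-delete sequence described above.
# DRY_RUN, run and remove_tiebreaker are hypothetical names.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    # Note: the printed form drops the shell quoting around arguments.
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 = name of the tie-breaker resource to delete (for example eckd)
remove_tiebreaker() {
    name=$1
    if [ "$name" = "Operator" ]; then
        echo "refusing to delete the predefined Operator tie-breaker" >&2
        return 1
    fi
    # Switch the peer domain to Operator first, then delete the resource.
    run chrsrc -c IBM.PeerNode "OpQuorumTieBreaker=Operator" || return 1
    run rmrsrc -s "Name == '$name'" IBM.TieBreaker
}
```

Calling remove_tiebreaker eckd prints the two commands in the required order; setting DRY_RUN=0 before the call executes them.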