I had documented the same for zLinux here: http://www.zinox.com/archives/182

When you use a tie-breaker disk in pureScale or in HADR, the disk must support the Write Exclusive, Registrants Only reservation type. It is a good idea to test whether the tie-breaker disk works as expected.

The tie-breaker disk must be shared between all hosts in pureScale and between the 2 hosts in HADR (Primary and Principal Standby). Check the physical volume ID (PVID) returned by the lspv command to determine whether the disk is shared between nodes.
Be aware, however, that AIX remembers all disks that have been attached to the system, and the disks listed by the lspv command may no longer be attached. If such a disk was moved to another machine, it might appear as if the disk is shared, when in fact it is no longer attached to the original machine.

# lspv -> should show the PVID of the tie-breaker disk, and the PVID should be the same when you run the command on every host. The PVID is specific to AIX; on Linux, use the scsi_id command to find the disk's WWID.
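
The PVID comparison can be scripted once the lspv output from each host has been captured. The following is only a minimal sketch: the helper name pvid_for, the hdisk names, and the rootvg PVIDs are made up for illustration, and it assumes the usual lspv layout with the disk name in the first column and the PVID in the second.

```shell
#!/bin/sh
# pvid_for DISK: read lspv-style output on stdin and print the PVID
# (second column) of the named disk. Hypothetical helper, not an AIX tool.
pvid_for() {
    awk -v d="$1" '$1 == d { print $2 }'
}

# Sample lspv output captured on each host (illustrative values only).
node01_lspv='hdisk0 00c60bd7a1b2c3d4 rootvg active
hdisk2 00c60bd7f291abab None'
node02_lspv='hdisk0 00c60bd7deadbeef rootvg active
hdisk3 00c60bd7f291abab None'

p1=$(printf '%s\n' "$node01_lspv" | pvid_for hdisk2)
p2=$(printf '%s\n' "$node02_lspv" | pvid_for hdisk3)

# A missing entry or a PVID of "none" cannot prove that the disk is shared.
if [ -n "$p1" ] && [ "$p1" != "none" ] && [ "$p1" = "$p2" ]; then
    echo "shared: PVID $p1"
else
    echo "NOT shared: node01=$p1 node02=$p2"
fi
```

A matching PVID is only a first check, since lspv can still list a disk that is no longer attached; the reservation test with tb_break is the real validation.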

# lsdev -C -l hdiskX -> should return the type as SCSI disk drive. On Linux, use the lsscsi or sg_info -l command to check whether this is a SCSI disk.

# chdev -l <pv_name> -a PR_key_value=0x<unique_key> -a reserve_policy=PR_exclusive
<pv_name>
The name of the physical volume on the AIX system to be used for tie breaking.
<unique_key>
Is an arbitrary numeric key that is unique to each node in the cluster.
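
Because each node needs its own key, it can help to check the planned keys for duplicates before running chdev anywhere. This is a rough sketch under stated assumptions: keys_are_unique and print_chdev are hypothetical helper names, the keys 0x1 and 0x2 are examples, and print_chdev only prints the command from above instead of executing it.

```shell
#!/bin/sh
# Each argument to keys_are_unique has the form node=key, e.g. node01=0x1.

# keys_are_unique: succeed only if no key value appears more than once.
# Keys are compared as plain strings, so write them consistently.
keys_are_unique() {
    dups=$(for pair in "$@"; do printf '%s\n' "${pair#*=}"; done | sort | uniq -d)
    [ -z "$dups" ]
}

# print_chdev PV KEY: print (do not run) the chdev command for one node.
print_chdev() {
    printf 'chdev -l %s -a PR_key_value=%s -a reserve_policy=PR_exclusive\n' "$1" "$2"
}

if keys_are_unique node01=0x1 node02=0x2; then
    print_chdev hdisk2 0x1   # to be run on node01
    print_chdev hdisk2 0x2   # to be run on node02
else
    echo "duplicate PR key found" >&2
fi
```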
If you use pureScale, you can name the tie-breaker disk when you create the instance with the db2icrt command, and the instance creation process checks whether the disk is a valid tie-breaker disk.
However, if you set the tie-breaker with the db2cluster -cm -set -tiebreaker -disk /dev/hdiskX command, it just sets the disk without testing whether it is a valid reserve-capable disk.

To test whether a disk is reserve-capable, use these commands.

Reserve the Disk using -l option

From First Host:

root@node01> /usr/sbin/rsct/bin/tb_break -v -l -t DISK "DEVICE=/dev/hdiskX"

If you know the PVID (output from lspv), you can also use this syntax.

root@node01> /usr/sbin/rsct/bin/tb_break -v -l -t SCSIPR PVID=00c60bd7f291abab

On CentOS 6.7, a sample command to obtain the reservation:

root@node01> /usr/sbin/rsct/bin/tb_break -v -l -t SCSIPR "DEVICE=/dev/sdc"

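Sample output (this capture is from a release run; note the "Releasing tie-breaker" and "tb_release status 0" lines near the end)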
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), loading symbol(tbInitModule)
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), loading symbol(tblm_SCSIPR)
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), invoking the init module
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), registering the function
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), returns RC=0
Initializing SCSIPR tie-breaker (DEVICE=/dev/sdc)
SCSIPR_init Entered
SCSIPR_init: DEVICE=/dev/sdc
SCSIPR_parse_and_obtain_scsi_devinfo begin
  token key:DEVICE val:/dev/sdc
find_sd_scsi_id_info(/dev/sdc) -> code=0, hostno=3, chlunid=0x200
DEVICE=/dev/sdc host=3 id=0 chan=0 lun=2
DeviceName(/dev/sdc) found. Devno=0x820 (major=8, minor=32)
find_device_for_given_devid Try - (pat=/dev/dm-*)
find_device_for_given_devid (devid=820) ==> devicename(), rc=2
find_mpio_slave_devices(/dev/sdc), N=0
Slave Devices (N=0):
SCSIPR_parse_and_obtain_scsi_devinfo end, rc=0
parse_find(sg_name=/dev/sdc) ==> rc=0, host=3 chan=0 id=0 lun=2, wwn=, wwid=
SCSIPR_init: sgdev=/dev/sdc initializing
execAndWaitCommand '/usr/sbin/rsct/bin/lsnodeid' ...
waitForProc rc=1850, status=0
Command(/usr/sbin/rsct/bin/lsnodeid) successful
Command(/usr/sbin/rsct/bin/lsnodeid) returns exitCode=0
execAndWaitCommand '/usr/sbin/rsct/bin/lsnodeid' returns exitCode=0
PR Key obtained from lsnodeid : 0x654fe4ce7bbf6944
SCSIPR_PRKEY: lsnodeid(rc=0): 0x654fe4ce7bbf6944
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-keys' ...
waitForProc rc=1851, status=0
Command(/usr/bin/sg_persist -d /dev/sdc --read-keys) successful
Command(/usr/bin/sg_persist -d /dev/sdc --read-keys) returns exitCode=0
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-keys' returns exitCode=0
SCSIPR_register (disk=/dev/sdc) key: '0x654fe4ce7bbf6944' is already registered
 SCSIPR_init: Leaving registered /dev/sdc, prkey=0x654fe4ce7bbf6944
Releasing tie-breaker (DEVICE=/dev/sdc)
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-keys' ...
waitForProc rc=1852, status=0
Command(/usr/bin/sg_persist -d /dev/sdc --read-keys) successful
Command(/usr/bin/sg_persist -d /dev/sdc --read-keys) returns exitCode=0
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-keys' returns exitCode=0
SCSIPR_register (disk=/dev/sdc) key: '0x654fe4ce7bbf6944' is already registered
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --out --release 
       --param-rk=0x654fe4ce7bbf6944 --prout-type=5' ...
waitForProc rc=1854, status=0
Command(/usr/bin/sg_persist -d /dev/sdc --out --release --param-rk=0x654fe4ce7bbf6944 
       --prout-type=5) successful
Command(/usr/bin/sg_persist -d /dev/sdc --out --release --param-rk=0x654fe4ce7bbf6944 
       --prout-type=5) returns exitCode=0
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --out --release 
       --param-rk=0x654fe4ce7bbf6944 --prout-type=5' returns exitCode=0
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5' ...
waitForProc rc=1856, status=0
Command(/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5) successful
Command(/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5) returns exitCode=0
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-reservation 
        --prout-type=5' returns exitCode=0
check_prkey_reserved(/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5) output:
  IET       VIRTUAL-DISK  0001
  Peripheral device type: disk
  PR generation=0x3, there is NO reservation held
This node does not have the reservation, curkey=
 SCSIPR_release: device=/dev/sdc, prkey=0x654fe4ce7bbf6944, result=0
tb_release status 0 (errno=0)

 

From Second Host:

root@node02> /usr/sbin/rsct/bin/tb_break -v -l -t DISK "DEVICE=/dev/hdiskX"

Sample output
[root@node03 ~]# tb_break -v -l -t SCSIPR "DEVICE=/dev/sdc"
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), loading symbol(tbInitModule)
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), loading symbol(tblm_SCSIPR)
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), invoking the init module
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), registering the function
loadTieBreakerModule (/usr/sbin/rsct/modules/tblm_SCSIPR.so), returns RC=0
Initializing SCSIPR tie-breaker (DEVICE=/dev/sdc)
SCSIPR_init Entered
SCSIPR_init: DEVICE=/dev/sdc
SCSIPR_parse_and_obtain_scsi_devinfo begin
token key:DEVICE val:/dev/sdc
find_sd_scsi_id_info(/dev/sdc) -> code=0, hostno=3, chlunid=0x200
DEVICE=/dev/sdc host=3 id=0 chan=0 lun=2
DeviceName(/dev/sdc) found. Devno=0x820 (major=8, minor=32)
find_device_for_given_devid Try - (pat=/dev/dm-*)
find_device_for_given_devid (devid=820) ==> devicename(), rc=2
find_mpio_slave_devices(/dev/sdc), N=0
Slave Devices (N=0):
SCSIPR_parse_and_obtain_scsi_devinfo end, rc=0
parse_find(sg_name=/dev/sdc) ==> rc=0, host=3 chan=0 id=0 lun=2, wwn=, wwid=
SCSIPR_init: sgdev=/dev/sdc initializing
execAndWaitCommand '/usr/sbin/rsct/bin/lsnodeid' ...
waitForProc rc=28456, status=0
Command(/usr/sbin/rsct/bin/lsnodeid) successful
Command(/usr/sbin/rsct/bin/lsnodeid) returns exitCode=0
execAndWaitCommand '/usr/sbin/rsct/bin/lsnodeid' returns exitCode=0
PR Key obtained from lsnodeid : 0x37a1ed78b1691712
SCSIPR_PRKEY: lsnodeid(rc=0): 0x37a1ed78b1691712
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-keys' ...
waitForProc rc=28457, status=0
Command(/usr/bin/sg_persist -d /dev/sdc --read-keys) successful
Command(/usr/bin/sg_persist -d /dev/sdc --read-keys) returns exitCode=0
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-keys' returns exitCode=0
SCSIPR_register (disk=/dev/sdc) key: '0x37a1ed78b1691712' is already registered
SCSIPR_init: Leaving registered /dev/sdc, prkey=0x37a1ed78b1691712
Reserving tie-breaker (DEVICE=/dev/sdc)
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-keys' ...
waitForProc rc=28458, status=0
Command(/usr/bin/sg_persist -d /dev/sdc --read-keys) successful
Command(/usr/bin/sg_persist -d /dev/sdc --read-keys) returns exitCode=0
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-keys' returns exitCode=0
SCSIPR_register (disk=/dev/sdc) key: '0x37a1ed78b1691712' is already registered
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --out --reserve 
  --param-rk=0x37a1ed78b1691712 --prout-type=5' ...
waitForProc rc=28460, status=25344
Command (/usr/bin/sg_persist -d /dev/sdc --out --reserve --param-rk=0x37a1ed78b1691712 
   --prout-type=5) failed with exitcode=25344.
Stdout:  IET       VIRTUAL-DISK  0001
Peripheral device type: disk

StdErr:persistent reserve out: scsi status: Reservation Conflict
PR out: command failed
Command(/usr/bin/sg_persist -d /dev/sdc --out --reserve --param-rk=0x37a1ed78b1691712 
    --prout-type=5) returns exitCode=25344
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --out --reserve 
   --param-rk=0x37a1ed78b1691712 --prout-type=5' returns exitCode=25344
Show the currently reserved key:
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5' ...
waitForProc rc=28462, status=0
Command(/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5) successful
Command(/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5) returns exitCode=0
execAndWaitCommand '/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5' returns exitCode=0
check_prkey_reserved(/usr/bin/sg_persist -d /dev/sdc --read-reservation --prout-type=5) output:
IET       VIRTUAL-DISK  0001
Peripheral device type: disk
PR generation=0x2, Reservation follows:
Key=0x654fe4ce7bbf6944
scope: LU_SCOPE,  type: Write Exclusive, registrants only
check_prkey_reserved CURKEY=0x654fe4ce7bbf6944
DeviceName(/dev/sdc) found. Devno=0x820 (major=8, minor=32)
find_device_for_given_devid Try - (pat=/dev/dm-*)
find_device_for_given_devid (devid=820) ==> devicename(), rc=2
find_mpio_slave_devices(/dev/sdc), N=0
Slave Devices (N=0):
SCSIPR_reserve: dev=/dev/sdc, prkey=0x37a1ed78b1691712, result=25344, status=1
tb_reserve status DENIED(1) (errno=2)
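
The verbose trace is long, but the line that matters is the last one. As a small sketch (the function names are made up here, and the patterns are based only on the two status lines visible in the sample outputs in this post), the final status can be pulled out like this:

```shell
#!/bin/sh
# tb_final_status: read "tb_break -v" output on stdin and print the last
# "tb_reserve status ..." or "tb_release status ..." line.
tb_final_status() {
    grep -E '^tb_(reserve|release) status ' | tail -n 1
}

# tb_reserve_denied: succeed if the trace on stdin ends with a denied reserve.
tb_reserve_denied() {
    tb_final_status | grep -q '^tb_reserve status DENIED'
}

# Tail of the second host's trace, copied from the sample output.
second_host_tail='SCSIPR_reserve: dev=/dev/sdc, prkey=0x37a1ed78b1691712, result=25344, status=1
tb_reserve status DENIED(1) (errno=2)'

if printf '%s\n' "$second_host_tail" | tb_reserve_denied; then
    echo "reserve was denied on this host"
fi
```

In practice you would pipe the real command into the function, for example tb_break -v -l ... 2>&1 | tb_final_status.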

Note: If you are using Linux, use the following syntax.

/usr/sbin/rsct/bin/tb_break -v -l -t SCSIPR "WWID=6005076303ffc4d20000000000001172"

Find the WWID with the scsi_id command. You can also use the WWN if you know it; ask your SAN admin for it if you cannot find it. Alternatively, use “ID=%d LUN=%d HOST=%d CHAN=%d”, with the values taken from the lsscsi command.

You should receive an error on the second host because the first host has already obtained the reservation. That error validates that you have a good tie-breaker disk that will work for pureScale or HADR.
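
To be sure the denial comes from the first host's reservation and not from something else, compare the key that the first host registered with the key the second host reports as the current holder. The sketch below relies only on the "PR Key obtained from lsnodeid" and "check_prkey_reserved CURKEY=" lines seen in the sample traces; the helper names own_key and holder_key are invented for this example.

```shell
#!/bin/sh
# own_key: print the PR key this host obtained from lsnodeid.
own_key() {
    sed -n 's/^PR Key obtained from lsnodeid : \(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1
}

# holder_key: print the key that currently holds the reservation.
holder_key() {
    sed -n 's/^check_prkey_reserved CURKEY=\(0x[0-9a-fA-F]*\).*/\1/p' | tail -n 1
}

# Relevant lines copied from the two sample traces.
first_host_log='PR Key obtained from lsnodeid : 0x654fe4ce7bbf6944'
second_host_log='PR Key obtained from lsnodeid : 0x37a1ed78b1691712
check_prkey_reserved CURKEY=0x654fe4ce7bbf6944
tb_reserve status DENIED(1) (errno=2)'

k1=$(printf '%s\n' "$first_host_log" | own_key)
holder=$(printf '%s\n' "$second_host_log" | holder_key)

if [ -n "$k1" ] && [ "$k1" = "$holder" ]; then
    echo "reservation is held by the first host ($holder)"
else
    echo "unexpected holder: '$holder'"
fi
```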

Release the Disk using -u option

Now, you should be able to unlock (release) the disk from any node.

root@node01> /usr/sbin/rsct/bin/tb_break -v -u -t DISK "DEVICE=/dev/hdiskX"

Or, using PVID, you can also use:

root@node01> /usr/sbin/rsct/bin/tb_break -v -u -t SCSIPR PVID=00c60bd7f291abab

Conclusion: While the disk is reserved by one node, a reservation attempt from another node should fail, but either node can unlock the disk using the -u switch.

Please make sure that you use the proper hdiskX, where X is the number that you see on that host; it can differ from the number on another host. Keep a note of the PVID (AIX) or the WWN or WWID (Linux) of the tie-breaker disk.
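
If you have noted the PVID, you can look up the local hdisk name on each AIX host instead of guessing the number. A minimal sketch, assuming the usual lspv column order (name, PVID, volume group); disk_for_pvid is a made-up helper and the rootvg PVID is a sample value.

```shell
#!/bin/sh
# disk_for_pvid PVID: read lspv-style output on stdin and print the
# device path of the disk with that PVID on this host.
disk_for_pvid() {
    # Appending "" forces a string comparison, never a numeric one.
    awk -v p="$1" '($2 "") == (p "") { print "/dev/" $1 }'
}

# Sample lspv output from one host (illustrative values only).
sample_lspv='hdisk0 00c60bd7a1b2c3d4 rootvg active
hdisk3 00c60bd7f291abab None'

printf '%s\n' "$sample_lspv" | disk_for_pvid 00c60bd7f291abab
```

On a real host the input would come from the command itself: lspv | disk_for_pvid 00c60bd7f291abab.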

Use Tie-breaker disk in HADR

The db2haicu command does not support a disk-based tie-breaker, but you can use this simple command to use a disk-based tie-breaker in HADR.

On AIX – if the tie-breaker disk is shown as /dev/hdisk2 on both nodes, then use this command:

root@node01> mkrsrc IBM.TieBreaker Name="tb" Type=DISK DeviceInfo="DEVICE=/dev/hdisk2" \
HeartbeatPeriod=30

If the tie-breaker disk shows as /dev/hdisk2 on node01 and /dev/hdisk3 on node02, then use this command:

root@node01> mkrsrc IBM.TieBreaker Name="tb" Type=DISK HeartbeatPeriod=30 \
NodeInfo='{["node01", "DEVICE=/dev/hdisk2"], ["node02", "DEVICE=/dev/hdisk3"]}'
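
The NodeInfo value is easy to mistype by hand because of the nested quotes. As a sketch only, the hypothetical helper build_node_info below assembles the same string from node=device pairs so that it can be passed to mkrsrc inside single quotes or a variable.

```shell
#!/bin/sh
# build_node_info: each argument is node=device, e.g. node01=/dev/hdisk2.
# Prints a NodeInfo value in the form used by the mkrsrc example above.
build_node_info() {
    out=""
    for pair in "$@"; do
        node=${pair%%=*}   # text before the first "="
        dev=${pair#*=}     # text after the first "="
        entry="[\"$node\", \"DEVICE=$dev\"]"
        if [ -z "$out" ]; then
            out=$entry
        else
            out="$out, $entry"
        fi
    done
    printf '{%s}\n' "$out"
}

build_node_info node01=/dev/hdisk2 node02=/dev/hdisk3
```

One possible way to use it: NODEINFO=$(build_node_info node01=/dev/hdisk2 node02=/dev/hdisk3) and then pass NodeInfo="$NODEINFO" to mkrsrc.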

On Linux, use Type=SCSI and use WWN or WWID or ID,LUN,HOST,CHAN in DeviceInfo. For example:

root@node01> mkrsrc IBM.TieBreaker Name="tb" Type=SCSI DeviceInfo="ID=0 LUN=1 HOST=4,CHAN=0" \
         HeartbeatPeriod=30

You can find the ID, LUN, HOST and CHAN values in the lsscsi command output, or you can use scsi_id to find the WWID and ask your SAN admin for the WWN. There must be a way to find the WWN at the OS level, but I haven’t prayed enough to the Google god.
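
lsscsi prints each device with a [host:channel:target:lun] tuple in the first column, where target is the SCSI ID. The sketch below turns one such line into an ID/LUN/HOST/CHAN string; lsscsi_to_devinfo is a hypothetical helper, and the sample line is reconstructed from the values in the tb_break trace above (host=3 chan=0 id=0 lun=2 for /dev/sdc), not captured from a real system.

```shell
#!/bin/sh
# lsscsi_to_devinfo DEVICE: read lsscsi output on stdin and print the
# ID/LUN/HOST/CHAN values for the given device node.
lsscsi_to_devinfo() {
    awk -v dev="$1" '$NF == dev {
        hctl = $1
        sub(/^\[/, "", hctl)        # strip the leading "["
        sub(/\]$/, "", hctl)        # strip the trailing "]"
        split(hctl, a, ":")         # a[1]=host a[2]=chan a[3]=id a[4]=lun
        printf "ID=%d LUN=%d HOST=%d CHAN=%d\n", a[3], a[4], a[1], a[2]
    }'
}

# Reconstructed sample line (an assumption, see the note above).
sample_lsscsi='[3:0:0:2]    disk    IET      VIRTUAL-DISK     0001  /dev/sdc'

printf '%s\n' "$sample_lsscsi" | lsscsi_to_devinfo /dev/sdc
```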

Now, creating an RSCT resource for a tie-breaker is not enough. You have to enable it.

By default, there are two standard tie-breakers that come pre-built: Operator and Fail. Operator means that a human being in a coat and pants will come and decide the tie.

For example, if a human being decides to grant quorum to node01, that person types this command as root on node01.

root@node01> runact -c IBM.PeerDomain ResolveOpQuorumTie Ownership=1

If the same human being (the Operator) decides to deny quorum to node01, that person types this command as root on node01.

root@node01> runact -c IBM.PeerDomain ResolveOpQuorumTie Ownership=0

Now, since we do not want to involve human beings (they cost more), we use the following command to switch to the “tb” tie-breaker that we defined earlier.

root@node01> chrsrc -c IBM.PeerNode OpQuorumTieBreaker="tb"

How do you know whether “tb” is indeed being used?

root@node01> lsrsrc -c IBM.PeerNode OpQuorumTieBreaker

It should show “tb” as the tie-breaker.
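
If you want to check this from a script, the attribute value can be extracted from the lsrsrc output. This is a hedged sketch: current_tiebreaker is a made-up helper, and the sample text is my assumption of what the lsrsrc -c output looks like (an attribute line of the form OpQuorumTieBreaker = "tb"), so adjust the pattern if your output differs.

```shell
#!/bin/sh
# current_tiebreaker: read lsrsrc output on stdin and print the quoted
# value of the OpQuorumTieBreaker attribute.
current_tiebreaker() {
    sed -n 's/.*OpQuorumTieBreaker[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Assumed shape of the command output (not captured from a real system).
sample_lsrsrc='Resource Class Persistent Attributes for IBM.PeerNode
resource 1:
        OpQuorumTieBreaker = "tb"'

active=$(printf '%s\n' "$sample_lsrsrc" | current_tiebreaker)
if [ "$active" = "tb" ]; then
    echo "tie-breaker in use: $active"
else
    echo "unexpected tie-breaker: '$active'"
fi
```

On a live peer domain the input would be lsrsrc -c IBM.PeerNode OpQuorumTieBreaker | current_tiebreaker.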

Note: The world will not fall on your AIX servers / LPARs if hosts in the peer domain show different hdisk names on different LPARs for the tie-breaker disk. For example, if the tie-breaker disk shows as /dev/hdisk1 on LPAR1 and as /dev/hdisk2 on LPAR2, you as a DBA should twist the arm of your SA to do more work and give it a meaningful name instead of /dev/hdisk1 or /dev/hdisk2. Ask your knowledgeable SA to name it something like /dev/hdisk_TB on all hosts in the peer domain so that you know just by looking at the name that it is a tie-breaker disk. If your SA does not know how to do it, please lobby to get a new SA; I can refer you to some very smart SAs who are real experts in their fields, but know that they are expensive.