During the DB2 pureScale install, you must have supplied the tie-breaker disk device name. It is good practice to check whether that disk is in fact being used as the tie-breaker.

# db2cluster -cm -list -tiebreaker

If the output of the above command is "Operator", the disk is not being used as the tie-breaker. You will not notice the problem until some hosts fail: proper failover will then not happen, because with an "Operator" tie-breaker a human operator is expected to intervene.
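The check above can be scripted so that it fails loudly when no disk tie-breaker is in place. This is only a minimal sketch and not part of DB2: the function name tiebreaker_is_disk is hypothetical, and it assumes nothing about the output format beyond what is described above, namely that the word "Operator" appears when the disk is not in use.

```shell
# Hypothetical helper (not a DB2 tool). Reads the output of
# `db2cluster -cm -list -tiebreaker` on stdin.
# Returns 0 if "Operator" is absent, 1 if present, 2 if there is no input.
tiebreaker_is_disk() {
    _tb_out=$(cat)
    if [ -z "$_tb_out" ]; then
        echo "ERROR: no tie-breaker output to check" >&2
        return 2
    fi
    case "$_tb_out" in
        *Operator*)
            echo "WARNING: tie-breaker is Operator; no disk tie-breaker is set" >&2
            return 1
            ;;
    esac
    return 0
}

# Usage, as root on a pureScale host:
#   db2cluster -cm -list -tiebreaker | tiebreaker_is_disk
```

A non-zero return code makes it easy to hook this into a post-install checklist or a monitoring job.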

Run the command to set the tie-breaker again.

# db2cluster -cm -set -tiebreaker -disk /dev/dm-0

Here /dev/dm-0 is the tie-breaker disk; in my case, I used a 100MB LUN. If the command fails, the error message will probably indicate why the disk could not be used as a tie-breaker. Please refer to my earlier article for how to check whether your storage supports SCSI-3 PR.

If your disk is SCSI-3 PR capable and the above command still failed, the failure has something to do with the logic in the code: it could not determine the WWN of the disk. In that case, you have to use an alternate method to define the tie-breaker disk.

Find out the WWID of the disk (also known as the UUID of the disk) by running the scsi_id command.

On SLES, scsi_id is in /lib/udev. If your distribution is Red Hat, locate it first, as it may live in a different directory.

# WWID=`/lib/udev/scsi_id -g -u /dev/dm-0`
# echo $WWID
360080e500017cd0a00000bce501d3dc4
# db2cluster -cm -set -tiebreaker -disk WWID=$WWID
Configuring quorum device for domain 'db2domain_20130127144636' ...
Configuring quorum device for domain 'db2domain_20130127144636' was successful.
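Since the location of scsi_id differs between distributions, the lookup can be wrapped in a small helper. This is a sketch under assumptions: find_scsi_id is a hypothetical name, and the candidate directories other than /lib/udev are guesses at common layouts, so verify them on your own system. Any directories passed as arguments are searched first.

```shell
# Hypothetical helper: print the path of the first executable scsi_id found.
# Directories given as arguments are searched first; the built-in list
# (/lib/udev from this article, the others assumed) is searched afterwards.
find_scsi_id() {
    for _dir in "$@" /lib/udev /usr/lib/udev /sbin; do
        if [ -x "$_dir/scsi_id" ]; then
            echo "$_dir/scsi_id"
            return 0
        fi
    done
    echo "ERROR: scsi_id not found; locate it manually" >&2
    return 1
}

# Usage, as root, with the same flags and device as above:
#   SCSI_ID=$(find_scsi_id) && WWID=$("$SCSI_ID" -g -u /dev/dm-0)
#   echo $WWID
#   db2cluster -cm -set -tiebreaker -disk WWID=$WWID
```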

How do you know how many tie-breakers are defined?

# lsrsrc -Ab IBM.TieBreaker
Resource Persistent and Dynamic Attributes for IBM.TieBreaker
resource 1:
        Name                = "Fail"
        Type                = "Fail"
        DeviceInfo          = ""
        ReprobeData         = ""
        ReleaseRetryPeriod  = 0
        HeartbeatPeriod     = 0
        PreReserveWaitTime  = 0
        PostReserveWaitTime = 0
        NodeInfo            = {}
        ActivePeerDomain    = "db2domain_20130127144636"
        ConfigChanged       = 0
resource 2:
        Name                = "db2_Quorum_Disk:12_7_59"
        Type                = "SCSIPR"
        DeviceInfo          = "WWID=360080e500017cd0a00000bce501d3dc4"
        ReprobeData         = ""
        ReleaseRetryPeriod  = 0
        HeartbeatPeriod     = 5
        PreReserveWaitTime  = 0
        PostReserveWaitTime = 0
        NodeInfo            = {}
        ActivePeerDomain    = "db2domain_20130127144636"
        ConfigChanged       = 0
resource 3:
        Name                = "Operator"
        Type                = "Operator"
        DeviceInfo          = ""
        ReprobeData         = ""
        ReleaseRetryPeriod  = 0
        HeartbeatPeriod     = 0
        PreReserveWaitTime  = 0
        PostReserveWaitTime = 0
        NodeInfo            = {}
        ActivePeerDomain    = "db2domain_20130127144636"
        ConfigChanged       = 0
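The listing is long, so it can help to condense it to one line per tie-breaker. The following is a rough sketch, not an RSCT tool: list_tiebreakers is a hypothetical name, and the parsing assumes the attribute layout shown above, with Name, Type and DeviceInfo appearing in that order for each resource.

```shell
# Hypothetical helper: reads `lsrsrc -Ab IBM.TieBreaker` output on stdin and
# prints one tab-separated line per resource: Name, Type, DeviceInfo.
# Assumes each resource lists Name and Type before DeviceInfo, as shown above.
list_tiebreakers() {
    awk -F'= ' '
        /^[[:space:]]*Name[[:space:]]/ { name = $2; gsub(/"/, "", name) }
        /^[[:space:]]*Type[[:space:]]/ { type = $2; gsub(/"/, "", type) }
        /^[[:space:]]*DeviceInfo[[:space:]]/ {
            info = $2
            gsub(/"/, "", info)
            print name "\t" type "\t" info
        }
    '
}

# Usage, as root:
#   lsrsrc -Ab IBM.TieBreaker | list_tiebreakers
```

For the output above, this would give three lines, with only the SCSIPR entry carrying a WWID in the last column.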

How do you know which one is actually used?

# lsrsrc -c IBM.PeerNode
Resource Class Persistent Attributes for IBM.PeerNode
resource 1:
        CommittedRSCTVersion     = ""
        ActiveVersionChanging    = 0
        OpQuorumOverride         = 0
        CritRsrcProtMethod       = 1
        OpQuorumTieBreaker       = "db2_Quorum_Disk:12_7_59"
        QuorumType               = 0
        QuorumGroupName          = ""
        Fanout                   = 32
        OpFenceGroup             = "gpfs_grp"
        NodeCleanupCommand       = "/usr/sbin/rsct/sapolicies/db2/hostCleanupV10.ksh"
        NodeCleanupCriteria      = "Enable,RetryCount=10,RetryInterval=30000,Parms= 1 DB2 0 CLEANUP_ALL"
        QuorumLessStartupTimeout = 120

The OpQuorumTieBreaker attribute shows the name of the tie-breaker that is actually in use.
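To pull just that value out of the output, for example in a health-check script, a one-line filter is enough. This is a sketch only: active_tiebreaker is a hypothetical name, and the pattern assumes the `Attribute = "value"` layout shown above.

```shell
# Hypothetical helper: reads `lsrsrc -c IBM.PeerNode` output on stdin and
# prints the value of OpQuorumTieBreaker without the surrounding quotes.
# Prints nothing if the attribute is not present in the input.
active_tiebreaker() {
    sed -n 's/^[[:space:]]*OpQuorumTieBreaker[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p'
}

# Usage, as root:
#   lsrsrc -c IBM.PeerNode | active_tiebreaker
```

If the printed name is "Operator" rather than the name of the SCSIPR resource, the disk tie-breaker is defined but not active.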