12.5.1. Functionality of Heartbeat

In several cluster solutions (for example, in fail-over and multicast load-balancing clusters) the nodes in the cluster must monitor each other to detect if one of the nodes fails.

Heartbeat is a tool that monitors the status of the nodes in a cluster. The Heartbeat component on each node sends keep-alive messages to the other node(s). When a node stops sending heartbeat packets, it is assumed to be dead, and any services (resources) it was providing are taken over by the other node(s). To use this functionality, you need to define the master and slave nodes, set encryption for the communication between the nodes, and establish and configure a dedicated interface for the communication.

Figure 12.11. How Heartbeat works

Note

To use Heartbeat, the heartbeat package must be installed on all nodes of the cluster. Currently, the package has to be installed manually by issuing the apt-get install heartbeat-2 command as root from a command line.

It is recommended to encrypt the Heartbeat signals even if you are using a dedicated interface.

Heartbeat packets can be transferred through a serial null-modem cable, an Ethernet network, or both. Ethernet is the practical choice, for example, when the cluster nodes are geographically separated. If Ethernet is used, the heartbeat signals are UDP packets sent to a broadcast address.
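The following is a minimal sketch of how a keep-alive message could be protected and broadcast over UDP. It is not Heartbeat's actual wire format: the JSON payload, the shared key, the broadcast address and port, and all function names are assumptions made for illustration. Note also that the sketch authenticates messages with an HMAC (so forged or altered packets are rejected) rather than encrypting them.

```python
import hashlib
import hmac
import json
import socket

# Hypothetical values for illustration; not Heartbeat's real defaults or format.
SHARED_KEY = b"change-me"
BROADCAST_ADDR = ("192.168.100.255", 6940)


def build_packet(node: str, seq: int, key: bytes = SHARED_KEY) -> bytes:
    """Serialize a keep-alive message and append an HMAC-SHA256 tag."""
    payload = json.dumps({"node": node, "seq": seq}, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"|" + tag


def verify_packet(packet: bytes, key: bytes = SHARED_KEY):
    """Return the decoded message, or None if the tag does not match."""
    payload, _, tag = packet.rpartition(b"|")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(payload)


def send_heartbeat(node: str, seq: int, addr=BROADCAST_ADDR) -> None:
    """Broadcast one signed keep-alive message over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcast sends must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_packet(node, seq), addr)
```

A receiver would call verify_packet() on every datagram and ignore any packet for which it returns None, so that a host without the shared key cannot impersonate a cluster node.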

Heartbeat instances are installed on all nodes. Both the master (active) and the slave (passive) nodes send heartbeat packets as a kind of keep-alive message across a dedicated interface. These messages enable the nodes to monitor each other and, if needed, to take over each other's resources. When heartbeat packets are no longer received from a node, that node is assumed to be dead, and any services (resources) it was providing are failed over to the other node.
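The timeout-based detection described above can be sketched as follows. This is an illustration of the general idea, not Heartbeat's implementation: the FailureDetector class, its method names, and the deadtime parameter are hypothetical.

```python
import time


class FailureDetector:
    """Tracks the last heartbeat seen from each peer node."""

    def __init__(self, deadtime: float, clock=time.monotonic):
        self.deadtime = deadtime  # seconds of silence before a peer counts as dead
        self.clock = clock        # injectable so the logic can be tested
        self.last_seen = {}

    def heartbeat(self, node: str) -> None:
        """Record that a heartbeat packet arrived from `node`."""
        self.last_seen[node] = self.clock()

    def dead_nodes(self) -> list:
        """Return peers whose last heartbeat is older than `deadtime`."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.deadtime)
```

Each node would call heartbeat() whenever a packet arrives and poll dead_nodes() periodically. A peer that appears in the result is a candidate for failover.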

Note

Unfortunately, it is possible that the active node fails only partially: although it stops sending heartbeat messages, it still replies to ARP requests and retains the Service IP. Such a situation results in two hosts, both seemingly functioning, owning the same IP address on the LAN. To avoid this, the death of the node has to be ensured by integrating a STONITH (Shoot The Other Node In The Head) device, which in practice turns off the power of the master node when it is considered dead.
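The ordering that STONITH enforces (power off the silent peer first, take over its resources only afterwards) can be sketched as below. Both callables are hypothetical hooks introduced for illustration; how a real STONITH device is driven depends on the hardware.

```python
from typing import Callable


def take_over(fence_peer: Callable[[], bool],
              acquire_resources: Callable[[], None]) -> bool:
    """Take over a silent peer's resources only after fencing succeeds.

    `fence_peer` is assumed to drive the STONITH device and return True
    once the peer is confirmed powered off. `acquire_resources` is assumed
    to bring up the Service IP and the services on the local node.
    """
    if not fence_peer():
        # The peer might still hold the Service IP; do not risk a duplicate.
        return False
    acquire_resources()
    return True
```

If fencing cannot be confirmed, the sketch deliberately leaves the resources alone, because claiming the Service IP while the old owner still answers ARP requests is exactly the conflict described above.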

It is also possible to install a hardware watchdog into the nodes of a cluster. A hardware watchdog is a small device that periodically receives some kind of signal (for example, the heartbeat messages) from the computer, sent either by the kernel or by a specific application. If the computer stops sending these signals, it is assumed to be dead, and the watchdog reboots the node.
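The watchdog behavior can be modeled in software as a timer that must be reset regularly. The class below is only a simulation for illustration: the Watchdog name, its methods, and the reboot callback are assumptions, and a real device performs the reset in hardware.

```python
class Watchdog:
    """Software model of a hardware watchdog timer (illustration only)."""

    def __init__(self, timeout: float, reboot):
        self.timeout = timeout
        self.reboot = reboot   # hypothetical callback standing in for a hardware reset
        self.last_kick = 0.0
        self.fired = False

    def kick(self, now: float) -> None:
        """The monitored computer signals that it is still alive."""
        self.last_kick = now

    def tick(self, now: float) -> None:
        """Called periodically by the watchdog's own clock."""
        if not self.fired and now - self.last_kick > self.timeout:
            self.fired = True
            self.reboot()
```

As long as kick() is called more often than the timeout, tick() does nothing. Once the signals stop for longer than the timeout, the reboot callback runs exactly once.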

Tip

Create a dedicated network for heartbeat messages using two Network Interface Cards (NICs), one in each node, and a crossover Ethernet cable connecting them.