Network Problems and Clustering

The flexibility of a distributed cluster architecture requires you to have a stable network environment.

Vocera clustering provides a distributed architecture that allows you to locate nodes anywhere on your network, including on different subnets (as described in About Vocera Voice Server Clusters) and in different geographic locations (as described in Geographically Distributed Clusters). This flexibility is intended in part to provide disaster recovery from catastrophic events such as an earthquake or a WAN failure.

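In particular, either of the following network problems will cause unwanted cluster behavior: loss of the network connection between cluster nodes, or network latency so high that the active node cannot service a standby node's poll in time.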

Either of the network problems described above may result in the following cluster behavior:

The following illustration shows a simple cluster with an active node and a single standby node:

Figure 1. Simple cluster with one active and one standby server

If the network connection between the nodes is lost, the active node sends an email to indicate that it has lost contact with a standby node. The active node continues to run, and badges that have not lost a network route to it remain connected to it. Badges that cannot find this active node display "Searching for server" and begin to cycle through their list of IP addresses, looking for the active server.
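The badge behavior above amounts to a round-robin search over a configured address list. The following is a minimal sketch of that idea, not Vocera firmware logic: the function name `find_active_server`, the probe callback `answers_as_active`, the `max_passes` limit, and the example IP addresses are all hypothetical.

```python
from typing import Callable, Optional


def find_active_server(
    server_ips: list[str],
    answers_as_active: Callable[[str], bool],
    max_passes: int = 3,
) -> Optional[str]:
    """Cycle through a badge's list of server IP addresses.

    `answers_as_active` is a hypothetical probe: it returns True only if
    the address is reachable AND the node there is currently active.
    """
    for _ in range(max_passes):
        for ip in server_ips:
            if answers_as_active(ip):
                return ip  # connect to the first active node found
    return None  # no active node found: badge keeps "Searching for server"


# Hypothetical example: the route to 10.0.1.10 is down, but 10.0.2.10 is active.
reachable_actives = {"10.0.2.10"}
print(find_active_server(["10.0.1.10", "10.0.2.10"], lambda ip: ip in reachable_actives))
# prints: 10.0.2.10
```

A real badge keeps retrying rather than giving up after a fixed number of passes; the `max_passes` bound exists only so the sketch terminates.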

The standby node notices that it has lost contact with the active node, goes into discovery mode, fails to find the active node (because the network connection is down), and comes online as an active node. This new active node sends an email stating that it has become active, and any badges that were "Searching for server" may connect to it.
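The standby node's transitions can be modeled as a small state machine. This sketch assumes a poll either gets no response (connection lost) or a response time, and applies the 10-second poll window this section gives for the latency case. The class `StandbyNode`, its method names, and the email text are hypothetical; returning to standby when discovery *does* find an active node is also an assumption, because this section only describes the case where discovery fails.

```python
from dataclasses import dataclass, field
from typing import Optional

POLL_TIMEOUT_S = 10.0  # a poll must be serviced within 10 seconds


@dataclass
class StandbyNode:
    """Hypothetical model of one standby node's view of the active node."""
    name: str
    state: str = "standby"  # "standby", "discovery", or "active"
    emails: list[str] = field(default_factory=list)

    def on_poll(self, response_time_s: Optional[float]) -> None:
        """Record one poll of the active node.

        None means no response at all (network connection lost); a value
        above POLL_TIMEOUT_S models excessive latency. Either way the
        standby enters discovery mode.
        """
        if self.state != "standby":
            return
        if response_time_s is None or response_time_s > POLL_TIMEOUT_S:
            self.state = "discovery"

    def finish_discovery(self, found_active: bool) -> None:
        """Resolve discovery mode.

        If no active node is found, this node comes online as active and
        records an email notice. Going back to standby when an active
        node is found is an assumption of this sketch.
        """
        if self.state != "discovery":
            return
        if found_active:
            self.state = "standby"
        else:
            self.state = "active"
            self.emails.append(f"{self.name} has become active")


node = StandbyNode("node-b")
node.on_poll(0.2)    # normal poll: stays in standby
node.on_poll(None)   # connection lost: enters discovery mode
node.finish_discovery(found_active=False)
print(node.state, node.emails)  # prints: active ['node-b has become active']
```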

This situation is known as a split brain because multiple cluster nodes are active, and each node is unaware of other active nodes. This split brain state is shown in the following illustration:

Figure 2. Simple cluster with two active servers (a "split brain" state)
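The following sketch shows why a partition produces a split brain. It treats each group of nodes that can still reach one another as a partition. The group containing the original active node keeps it, and every other group, unable to find that node, brings one of its own nodes online as active. The function names are hypothetical, and picking the first node in a group is a simplification, not Vocera's actual selection rule.

```python
def active_nodes_after_split(partitions: list[list[str]], original_active: str) -> set[str]:
    """Return the nodes that end up active once the network splits.

    Each inner list is a hypothetical group of mutually reachable nodes.
    A group without the original active node promotes its first node
    (a simplification) because its standbys cannot find the active node.
    """
    actives: set[str] = set()
    for group in partitions:
        if not group:
            continue
        actives.add(original_active if original_active in group else group[0])
    return actives


def is_split_brain(actives: set[str]) -> bool:
    # More than one active node, each unaware of the others.
    return len(actives) > 1


healthy = active_nodes_after_split([["node-a", "node-b"]], "node-a")
split = active_nodes_after_split([["node-a"], ["node-b"]], "node-a")
print(is_split_brain(healthy), is_split_brain(split))  # prints: False True
```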

Similarly, if excessive latency results in the active node failing to service a poll from a standby node within 10 seconds, the standby node enters discovery mode, the active node sends an email message indicating that it has lost contact with a standby, and one of the following situations occurs: