WANdisco

3 Key concepts

The WANdisco Failover Agent uses a heartbeat mechanism to detect whether a replicator node has died. At each configurable heartbeat interval (default 1 second), the WANdisco Failover Agent sends a heartbeat to each replicator in the replication group over a DConeNet connection. Each replicator in turn sends an "I am alive" reply back to the WANdisco Failover Agent. If the WANdisco Failover Agent receives no reply to a configurable number of heartbeats, it marks the replicator node as dead. The actual failover happens lazily, when a request is next received from an SCM client. This reduces false alarms when a WANdisco Replicator node is restarted.
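The detection rule described above can be sketched in a few lines of Python. This is only an illustrative model, not WANdisco code: the class names, the `max_missed` value of 3, and the choice to count consecutive misses and to clear the dead mark when a reply arrives are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatorHealth:
    """Heartbeat bookkeeping for one replicator (hypothetical model)."""
    name: str
    missed: int = 0
    dead: bool = False


@dataclass
class HeartbeatMonitor:
    # Assumed threshold; the real number of missed heartbeats is configurable.
    max_missed: int = 3
    nodes: dict = field(default_factory=dict)

    def add(self, name):
        self.nodes[name] = ReplicatorHealth(name)

    def record(self, name, replied):
        """Call once per heartbeat interval with whether a reply arrived."""
        node = self.nodes[name]
        if replied:
            # Assumption: any reply resets the counter and revives the node.
            node.missed = 0
            node.dead = False
        else:
            node.missed += 1
            if node.missed >= self.max_missed:
                node.dead = True

    def is_dead(self, name):
        # Marking a node dead does not trigger failover by itself; in the
        # text above, failover happens lazily on the next client request.
        return self.nodes[name].dead
```

Because the dead mark is only consulted when a client request arrives, a replicator that restarts and replies again before the next request would never cause a failover in this model.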

The WANdisco Failover Agent simply relays data between the SCM clients and the current active primary. The current active primary is elected based on a priority assigned to each replicator. The replicator with priority 1 is also known as the designated primary. If the primary replicator is unavailable, the replicator with the next highest priority is elected as the current active primary.
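As a minimal sketch of the election rule, the function below picks the live replicator with the best priority. It assumes that a lower number means a higher priority (so priority 1 wins whenever it is alive); the function name and the data shapes are hypothetical.

```python
def elect_active_primary(priorities, alive):
    """Return the live replicator with the highest priority, or None.

    priorities: dict mapping replicator name -> priority
                (1 = designated primary).
    alive:      set of replicator names currently considered alive.
    Assumes a lower number means a higher priority.
    """
    candidates = [name for name in priorities if name in alive]
    if not candidates:
        return None
    return min(candidates, key=lambda name: priorities[name])
```

For example, with priorities `{"a": 1, "b": 2, "c": 3}`, the function returns `"a"` while `a` is alive and `"b"` once `a` is no longer in the `alive` set.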

The WANdisco HADR guarantees zero data loss when a site dies. This is achieved by using:

3.1 Two replicator based failover

The WANdisco HADR can support 2 or more replicators in the replication group. If there are only 2 replicators in the group, special considerations apply to the failover mechanism: some failure scenarios (documented below) require administrative action. The Web administration console displays an alert for the administrator. Email alerts can also be configured.

3.1.1 Failure of Primary

As noted above, once failover to the backup has happened, the backup cannot be excluded from the replication group automatically if it dies, unless an administrative action is taken. The required administrative action involves the following steps:

  1. Stop new SCM client connections to the Failover Agent
  2. Ensure both Replicator nodes are up
  3. Wait until all submitted transactions are executed at both nodes
  4. Reset the flag using the WANdisco HADR's Web console
  5. Re-enable new SCM client connections

Note: The above applies only if two WANdisco Replicators are configured with the WANdisco Failover Agent.
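Steps 1 to 3 of the procedure above act as preconditions for resetting the flag in step 4. The check below is a hedged sketch of that gating logic only; the function name, its parameters, and the way node status and pending transaction counts are represented are assumptions for illustration, not part of the WANdisco HADR interface.

```python
def can_reset_flag(new_connections_stopped, nodes_up, pending_transactions):
    """Return True when steps 1-3 hold, so the flag may be reset (step 4).

    new_connections_stopped: True once new SCM client connections are blocked.
    nodes_up:                dict mapping node name -> True if the node is up.
    pending_transactions:    dict mapping node name -> count of submitted
                             transactions not yet executed at that node.
    All three inputs are hypothetical; how they are obtained is not shown.
    """
    if not new_connections_stopped:                      # step 1
        return False
    if len(nodes_up) != 2 or not all(nodes_up.values()):  # step 2
        return False
    # Step 3: every node must report zero pending transactions.
    # A node with no reported count is treated as not ready.
    return all(pending_transactions.get(name) == 0 for name in nodes_up)
```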

3.1.2 Failure of Backup

As noted above, when the backup fails, the WANdisco Failover Agent will run with just the primary replicator by automatically excluding the backup. After the backup has been excluded, an administrative action is required to re-include the backup in the group. The required administrative action involves the following steps:

  1. Stop new SCM client connections to the Failover Agent
  2. When there are no remaining pending transactions at the Primary, run reset to clean up the system database at the Primary and Backup
  3. Rsync FROM the Primary TO the Backup
  4. Restart the Primary and Backup
  5. Reset the flag
  6. Enable new SCM client connections

Note: The above applies only if two WANdisco Replicators are configured with the WANdisco Failover Agent.
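The order of the re-inclusion steps matters, in particular waiting for the Primary to drain before the reset and the rsync. The sketch below shows one way to script that ordering. The `ops` object and every method name on it are hypothetical placeholders supplied by the caller; none of them correspond to a documented WANdisco command or API.

```python
import time


def reinclude_backup(ops, poll_seconds=1.0):
    """Run the six re-inclusion steps in order.

    `ops` is a hypothetical caller-supplied object; each method stands in
    for whatever manual or scripted action performs that step.
    """
    ops.stop_new_connections()            # step 1
    while ops.pending_at_primary() > 0:   # step 2: wait for the Primary to drain
        time.sleep(poll_seconds)
    ops.reset_system_database()           # step 2: clean up at Primary and Backup
    ops.rsync_primary_to_backup()         # step 3: copy FROM Primary TO Backup
    ops.restart_replicators()             # step 4
    ops.reset_flag()                      # step 5
    ops.enable_new_connections()          # step 6
```

If any step raises an exception, the function stops without re-enabling client connections, which keeps new traffic away from a half-restored group until an administrator intervenes.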