WANdisco HADR provides high availability (HA) and disaster recovery (DR) for a software configuration management (SCM)/source code repository such as CVS or Subversion. It allows an SCM user to fail over transparently to the next available replica if the designated primary SCM server fails. This is achieved using the WANdisco Failover Agent. The SCM user connects to the WANdisco Failover Agent on the standard SCM port (configurable), such as 2401 for CVS, 3690 for Subversion, or 80 for Subversion-HTTP. The Failover Agent in turn connects to one of the available WANdisco Replicators. WANdisco HADR guarantees a Recovery Point Objective (RPO) of 0, i.e. zero data loss, even if the failure happens in the middle of a transaction.
In this administration guide, you will learn how to set up WANdisco HADR as part of a normal WANdisco Replicator install.
This guide is intended for an SCM administrator or a user who is reasonably comfortable with:
inetd/xinetd service on Unix/Cygwin or Windows service
If you don't meet the above prerequisites, you may want to contact your SCM administrator or ask WANdisco to perform a professional install for you.
The diagram below illustrates a typical deployment architecture for a Subversion-based backend. A similar deployment architecture applies to CVS or Subversion-HTTP. As you can see in the diagram below, each replica uses the WANdisco replication technology to ensure the primary and backups stay in sync with each other. WANdisco replication technology supports active-active replication. For the WANdisco HADR, however, the license manager allows only one replica node to be active at any given time.

As shown in the above diagram, the WANdisco HADR exchanges heartbeats with the replicas over a configurable control port (defaults to 6444). The heartbeats are used to test the liveness of a replica and to mark a node as the active primary; they are not used for replication. The WANdisco Replicators communicate with each other on the DConeNet control port (defaults to 6444) for replication. The control ports are multi-protocol; in other words, they can speak HTTP, DConeNet, DFTP, and other protocols. This reduces the need for the administrator to open and manage multiple ports for the various protocols used by the WANdisco HADR or WANdisco Replicator.
Here is an explanation of various TCP ports used in the above deployment:
The WANdisco Failover Agent uses a heartbeat mechanism to detect if a replicator node has died. After a configurable heartbeat interval (default is 1 second), the WANdisco Failover Agent sends a heartbeat to each replicator in the replication group. This is transmitted over a DConeNet connection. The replicator in turn sends an "I am alive" reply back to the WANdisco Failover Agent. If the WANdisco Failover Agent does not receive a reply to a configurable number of heartbeats, it marks the replicator node as dead. The actual failover happens lazily, when a request is received from an SCM client. This reduces false alarms when a WANdisco Replicator node is restarted.
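The detection rule described above can be sketched in a few lines of Python. This is only an illustration, not WANdisco's implementation: the class `HeartbeatMonitor`, its method names, and the node names are hypothetical, and the sketch assumes that any reply resets the count of missed heartbeats. The threshold of 4 is the default quoted in the tuning section of this guide.

```python
class HeartbeatMonitor:
    """Hypothetical sketch: counts unanswered heartbeats per replicator."""

    def __init__(self, replicators, max_missed=4):
        self.max_missed = max_missed
        self.missed = {name: 0 for name in replicators}

    def record_reply(self, name):
        # An "I am alive" reply clears the counter (assumption: only
        # consecutive misses count towards the threshold).
        self.missed[name] = 0

    def record_timeout(self, name):
        # No reply arrived for the heartbeat sent in this interval.
        self.missed[name] += 1

    def is_dead(self, name):
        # The node is marked dead once the configured number of
        # heartbeats has gone unanswered.
        return self.missed[name] >= self.max_missed


monitor = HeartbeatMonitor(["primary", "backup"])
for _ in range(4):
    monitor.record_timeout("primary")  # primary never answers
    monitor.record_reply("backup")     # backup answers every time

print(monitor.is_dead("primary"))  # True
print(monitor.is_dead("backup"))   # False
```

Marking a node as dead in this sketch does not by itself redirect any client. That matches the lazy behavior described above, where failover only happens when the next SCM client request arrives.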
The WANdisco Failover Agent simply relays data between the SCM clients and the current active primary. The current active primary is elected based on a priority assigned to each replicator. The replicator with a priority equal to 1 is also known as the designated primary. If the primary replicator is unavailable, the live replicator with the next highest priority (that is, the smallest priority number) is elected as the current active primary.
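The election rule can be illustrated with a short Python sketch. The function `elect_active_primary` and the host names are hypothetical and are not part of the product; the sketch only assumes that each replicator has a numeric priority and that the Failover Agent knows which replicators are currently alive.

```python
def elect_active_primary(priorities, alive):
    """Return the alive replicator with the smallest priority number.

    priorities: dict mapping replicator name -> priority (1 = designated primary)
    alive: set of replicator names currently considered alive
    Returns None if no replicator is alive.
    """
    candidates = [name for name in priorities if name in alive]
    if not candidates:
        return None
    # Smaller numbers indicate higher priority.
    return min(candidates, key=lambda name: priorities[name])


priorities = {"tokyo": 1, "london": 2, "boston": 3}

# All nodes alive: the designated primary is chosen.
print(elect_active_primary(priorities, {"tokyo", "london", "boston"}))  # tokyo

# Designated primary down: the next highest priority wins.
print(elect_active_primary(priorities, {"london", "boston"}))  # london
```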
The WANdisco HADR guarantees zero data loss when a site dies. This is achieved by using:
The WANdisco HADR can support 2 or more replicators in the replication group. If there are only 2 replicators in the group, special consideration applies with respect to the failover mechanism:
If there are only 2 replicators in the group, some failure scenarios (documented below) require administrative action. The Web administration console will have an alert for the administrator. Email alerts can also be configured.
As noted above, once failover to the backup has happened, the backup cannot be excluded from the replication group automatically if it dies; an administrative action is required. The required administrative action involves the following steps:
Note: The above applies only if two WANdisco Replicators are configured with the WANdisco Failover Agent.
As noted above, when the backup fails, the WANdisco Failover Agent will run with just the primary replicator by automatically excluding the backup. After the backup has been excluded, an administrative action is required to re-include the backup in the group. The required administrative action involves the following steps:
reset to clean up the system database at Primary and Backup
Note: The above applies only if two WANdisco Replicators are configured with the WANdisco Failover Agent.
Before running the WANdisco HADR, please ensure:
C:/Program Files/java/jdk1.5.0_03. Choose an alternate installation
directory, for example C:/java with no spaces in the path.
license.key file into [replicator]/config.
Depending upon your license, certain features may be disabled. This document will highlight any features that require an Enterprise Edition license. In particular, security (access control) features require an Enterprise Edition license.
JAVA_HOME environment variable is defined.
JDK is installed, not just the JRE. This can be confirmed by running java -server -version. If it generates a not found error, uninstall the JRE Java package and reinstall the JDK Java package.
perl executable is on the system PATH.
logtimefix script requires Perl package Date::Manip.
- To install a Perl package on UNIX/Cygwin, using CPAN (ensure you are logged into the root account first), for example:
$ perl -MCPAN -eshell
cpan> install Term::ReadKey
perl executable is on
the system PATH. Please download the MSI installer for ActivePerl from
here: [http://activestate.com/Products/Download/Download.plex?id=ActivePerl]
logtimefix script requires Perl package Date::Manip.
- To install a Perl package on Windows, using ActivePerl package manager,
for example:
c:\>ppm
...
ppm>install DateManip
importauditdb script requires Perl package Perl::DBI.
XML::Parser package is required. It should be bundled as a standard module on a perl 5.8 installation.
Important: The WANdisco HADR is installed as part of a WANdisco Replicator install. You will want to consult the WANdisco Replicator install guide for further information on installing the WANdisco Replicator. This guide only highlights the steps relevant to installing and configuring the WANdisco Failover Agent.
Untar or unzip (using WinZip for example on Windows) the WANdisco Replicator package
(tar.gz) into the intended subdirectory.
You should see the following directory structure:
$ cd [replicator]
$ ls
config lib logs bin docs systemdb
failoveragent , shutdown
jar files and DLLs that are required to run the product.
FailoverAgent-prefs.log.0.
If the installation requirements as specified in the previous section have been met, the express setup should take 20 minutes or less to get a basic WANdisco HADR environment configured.
The express setup option can be used to quickly create the prefs*.xml
configuration files used by WANdisco HADR. This is accomplished by running the bundled
program, [replicator]/bin/setup. The text console based UI will guide you
through basic configuration options.
At the end of the setup program:
prefs-{hostname}.xml file for each replicator in the replication group and
prefs-failoveragent.xml file for the Failover Agent.
Copy each [replicator]/config/prefs-{hostname}.xml file to [replicator]/config/prefs.xml on the corresponding replicator host. Copy the prefs-failoveragent.xml file to [replicator]/config/prefs.xml on the corresponding HADR/WANdisco Failover Agent host.
It is recommended that you install the WANdisco Failover Agent on a separate machine from the WANdisco Replicators. This will ensure WANdisco Failover Agent is available even when a WANdisco Replicator machine fails.
We will now walk you, step by step, through the setup screens for the WANdisco Failover Agent. For the rest of the installation, please follow the WANdisco Replicator administration guide.
The setup screens below presume a Subversion deployment, but they are applicable (with different default ports) to CVS or Subversion-HTTP deployments as well.
setup program:
$ [replicator]/bin/setup
Please go through the initial WANdisco Replicator setup screens as documented in the WANdisco Replicator administration guide.
Now you will specify the number of replicators/replicas that are being setup. Each replicator will act as a proxy for the local repository replica. After specifying the number of replicas, you will be prompted for the network settings for each replica.
How many replicas do you want? [2] : 2
If you have a license for HADR product, you can choose to configure the failoveragent. The failoveragent will then act as a proxy for the SCM clients. If the failoveragent detects the primary replicator node is down, it can automatically failover svn clients to one of the configured backups. In this interview you will define the priority for each replicator. A replicator node with priority 1 will act as the primary. When failing over to a backup, an alive node with the smallest priority number will be chosen. So for example a replicator node with priority 2 will be chosen over a node with priority 3 if both are alive.
Is Failover Agent needed? Y/N [N] :
Setting up Failover Agent ....
_________________________________
Now you will specify the Ethernet MAC address of the host on which
Failover-Agent would be running. It is required that you specify a unique
MAC address for each host on which Failover-Agent would be running.
The MAC address on UNIX can be obtained via "ifconfig" command
and on Windows via "ipconfig /all" command. The MAC Address looks like
this - 00-02-A5-C1-7A-2F (Windows) or 00:02:A5:C1:7A:2F (UNIX). If you
don't have all the MAC addresses handy, now would be a good time to
get them before proceeding further.
Enter the MAC Address : 00:12:E5:C1:7A:2A
Setting up Failover Agent ....
_________________________________
Now you will specify the host:port used by Subversion clients to
connect with the Failover-Agent. Setting the port to 3690,
would be the most transparent option from the Subversion client
perspective. Note you can NOT specify 0.0.0.0 or localhost
as the host on which Failover-Agent would be running. The
hostname needs to be the DNS hostname or the valid IP address to
which remote Subversion clients as well as remote Subversion Failover-Agents can connect.
For example, let us say on a subnet 192.168.1 in Tokyo, the LAN address of
Failover-Agent machine is 192.168.1.29 and the external WAN address is
203.23.12.129 (DNS hostname is tokyo.svnrus.org). The Failover-Agent address
should be specified as 203.23.12.129 or tokyo.svnrus.org and NOT 192.168.1.29.
Enter the hostname or IP address of the Failover-Agent : tao
Enter the TCP port for the Failover-Agent [3690] :
Setting up Failover Agent ....
_________________________________
Now you will specify the DConeNet port used by the Failover-Agent to
communicate with other Replicators. This is not visible to Subversion
clients but used for actual data transfer between the Replicators
and/or Failover-Agents.
Enter the TCP port for DConeNet [6444] : 6444
Enter a nice name for the node, for e.g. "Tokyo Site" [tao:6444] :
Setting up replicator instance .... #1
______________________________________
Since you have elected to configure a Failover Agent, you will need to
specify the priority for each replicator. The priority order determines
which replicator instance is picked when failing over. Choose priority
of 1 for the Primary replicator. Smaller numbers indicate higher priority.
Enter a failover priority [1-2] [1] :
config directory on each host and rename them to prefs.xml. Now you are ready to run the WANdisco HADR.
The WANdisco Failover Agent by default uses a heartbeat frequency of 1 heartbeat every second. This is appropriate for a LAN deployment. If you are deploying the WANdisco HADR over a WAN, then you may want to change the heartbeat frequency based on WAN latencies. The heartbeat interval should be 2-3 times the expected WAN latency between WANdisco Failover Agent and the replicator site. This will ensure missing heartbeats can be distinguished from a slow network link.
You can also tune the number of missed heartbeats after which the WANdisco Failover Agent marks a replicator node as dead and triggers failover. The default is 4.
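The tuning guidance above comes down to simple arithmetic, sketched here in Python. The function names are hypothetical. The detection-time estimate assumes a node is marked dead after exactly the configured number of heartbeat intervals pass without a reply, which is an approximation and not a documented guarantee.

```python
def suggested_heartbeat_interval_ms(wan_latency_ms, factor=3):
    """Heartbeat interval of 2-3 times the expected WAN latency."""
    if not 2 <= factor <= 3:
        raise ValueError("factor should be between 2 and 3")
    return wan_latency_ms * factor


def approx_detection_time_ms(interval_ms, missed_heartbeats=4):
    """Rough time before a silent node is marked dead (assumption:
    one heartbeat per interval, dead after `missed_heartbeats` misses)."""
    return interval_ms * missed_heartbeats


# Example: an assumed 200 ms WAN latency between the Failover Agent
# and the replicator site.
interval = suggested_heartbeat_interval_ms(200)
print(interval)                            # 600
print(approx_detection_time_ms(interval))  # 2400

# LAN defaults from this guide: 1 second interval, 4 missed heartbeats.
print(approx_detection_time_ms(1000))      # 4000
```

A longer interval or a higher missed-heartbeat count makes detection slower but less sensitive to a slow link, so the two values are best tuned together.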
These values can be dynamically tuned from the WANdisco Failover Agent web console:

The WANdisco Failover Agent can start the WANdisco Replicator nodes directly from the web console, provided you enter the startup commands via the web console. The WANdisco Failover Agent can also use ssh to launch WANdisco Replicators on remote machines, as depicted in the following screenshot:

The WANdisco Failover Agent can generate email alerts, provided an email address is specified via the web console. An email alert is generated whenever the WANdisco Failover Agent detects an event related to failover:
These values can be dynamically set from the WANdisco Failover Agent web console:

The express setup tool supports -silent and -record options that allow an admin to perform a silent install without being prompted for input on the console.
An admin could start the setup program in record mode and later use the recorded answers file to replay the answers and perform a silent install. The admin could also modify the recorded answers in a text editor and then use -silent to create new configuration files. For example:
$ ./svn-replicator/bin/setup -record my-answers
$ vi my-answers
$ ./svn-replicator/bin/setup -silent my-answers
$ ./svn-replicator/bin/setup -silent old-ans -record new-ans
The answers are recorded continuously, so if you restart setup you can also use the recorded file to pick up from where you left off, without having to re-enter the answers.
For more information, look at the usage of the setup command:
$ ./replicator/bin/setup -h
setup [-silent recorded-setup-file] [-record file-to-record-to]
-silent recorded-setup-file :
Silent install will use the supplied "recorded-setup-file" to
automatically answer the setup interview questions. If all the
answers are not supplied, it will prompt on the console.
-record file-to-record-to
Will record all the valid interview answers to the
"file-to-record-to". Can latter be used for silent install.
Both options can also be used at the same time. For
example to continue an install from where you last
left off you could do:
setup -silent prev-silent-file -record new-silent-file
That's it, now you are ready to run the WANdisco HADR.
Note: Please start up all the WANdisco Replicators before starting the WANdisco Failover Agent. See the WANdisco Replicator administration guide for how to launch the WANdisco Replicator. The WANdisco Replicator must be started in watchdog mode using the -wdog option.
Use the startup script provided to run WANdisco HADR from the command line:
$ [replicator]/bin/failoveragent
$ tail -f [replicator]/logs/FailoverAgent-prefs.log.0
....
INFO: [main] Failover Agent listener is now turned ON at port : ....
When you see the last line, you know WANdisco HADR has started successfully. The WANdisco HADR will also start up the remote replicators if the SSH-based startup commands are provided. When you start it for the first time, the SSH startup commands do not exist, so the WANdisco Failover Agent will start up without any replicators running. In that situation you can either start up the remote replicators manually or use the web console to specify the SSH startup command and start them from the web console itself.
Alternatively, you can go to the web console and check the status.
To shut down WANdisco HADR, run:
$ [replicator]/bin/shutdown
This will also trigger shutdown of the replicators.
Caution: We recommend taking all possible precautions to prevent direct access to the SCM server that bypasses the WANdisco HADR. For example, you could set up the SCM server to allow connections only from the IP address of the host on which the WANdisco Failover Agent is running, and you could limit shell access to the replicator and SCM repository machines.
The WANdisco HADR has a built-in web server that can be used for monitoring and dynamic configuration of the WANdisco Failover Agent and WANdisco Replicator. You can connect to the web server on the WANdisco Failover Agent's DConeNet control port (defaults to 6444). Here is a screenshot of the WANdisco Failover Agent's web console:

Please ensure your license.key file specifies a valid number of sites and the IP addresses on which the WANdisco HADR is allowed to run. If you have an unlimited license, there are no restrictions on the number of sites or IP addresses.
Please follow the replicator administration guide to manage the WANdisco Replicator.
Copyright © 2005 WANdisco