
CVS MultiSite

Distributed Software Development for CVS

CVS MultiSite leverages WANdisco's unique replication technology to immediately synchronize CVS repositories connected over a wide area network (WAN). Users at every location experience local area network (LAN) speed performance for both read and write operations. CVS MultiSite also provides continuous hot backup and self-healing capabilities that automate disaster recovery, so that downtime is virtually eliminated.


Where to Next:

Where can I learn more? Download the product brochure, watch our video overview, see 10 reasons to use CVS MultiSite for distributed development, check out how CVS MultiSite stacks up against the competition, or get more information about pricing.


Features

  • Immediately synchronizes each development site's CVS repository with its peers at other sites on every commit, or other write operation. CVS repositories connected over a WAN become mirrors of each other.
  • Overcomes WAN latency between development sites. Users at every location experience LAN-speed performance for both read and write operations. At the same time, bandwidth usage and costs go down by as much as 80%.
  • Developers at different locations can simultaneously check out and check in the same files, and resolve conflicts and other problems when they occur instead of days or weeks later, so there's less QA and rework.
  • Requires no retraining. Developers and administrators use the CVS clients and tools they're familiar with, and CVS functionality doesn't change.
  • Continuous hot backup and self-healing capabilities automate disaster recovery without administrator involvement. Costly disk mirroring solutions are completely unnecessary.
     

What is CVS?

CVS is an open source version control system that stores and tracks changes made to any type of electronic data, including source code files, web pages, documents, or images. CVS has been available since 1989, and is still widely used today.

Foundation for a Complete Solution Stack

CVS MultiSite can be implemented standalone, or in combination with WANdisco's Clustering or Access Control solutions for CVS.

See How CVS MultiSite Compares to:

 

Central CVS Server

  • Many organizations using CVS simply go with a single master CVS server that remote sites access using VPN or a secure protocol such as SSH. While everyone is technically working from the same copy of CVS, there are a number of drawbacks to this approach.
  • The central CVS server can become a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the central server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
  • Every remote CVS request entails a WAN penalty. For example, a checkout of 500MB by a developer in India from a repository based in the US can take several hours. Each developer in India will end up doing independent updates that will bring the same revisions from the master server in the US multiple times. This wastes a significant amount of developer time, as well as network bandwidth.
  • If a long running commit/import operation is interrupted due to a transient WAN outage, the developer has to restart the commit again from the beginning. WANdisco addresses this by providing a resilient data channel that allows remote transfers to automatically resume from where they left off. In addition, CVS uses a very chatty protocol. This compounds network latencies and forces many companies to absorb the cost of additional E-1 lines from India. WANdisco makes this unnecessary. On this basis alone, WANdisco easily pays for itself.
  • A single CVS server will typically be hard to scale vertically as the number of CVS users grows. An offsite user will end up using more CPU/memory as the CVS server forked process lingers on much longer because of a slow network. This typically requires more expensive, larger machines. Many organizations rely on large dedicated SMP boxes to address scalability. In contrast, WANdisco's distributed architecture allows organizations to scale horizontally using inexpensive Linux boxes.
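
To make the WAN penalty described above concrete, here is a back-of-the-envelope sketch that estimates the raw transfer time for a checkout. The 256 kbit/s effective throughput is an assumed figure chosen purely for illustration, not a measurement; protocol round trips on a chatty connection would add to the result.

```python
def transfer_hours(size_mb: float, throughput_kbit_s: float) -> float:
    """Estimate the hours needed to move size_mb megabytes at a given effective throughput."""
    size_kbit = size_mb * 8 * 1000  # 1 MB = 8,000 kbit (decimal units)
    return size_kbit / throughput_kbit_s / 3600


# Assumption: 256 kbit/s of effective throughput on a congested, high-latency link.
hours = transfer_hours(500, 256)
print(f"{hours:.1f} hours")  # 4.3 hours
```

Under that assumption a single 500MB checkout takes over four hours, and the cost repeats for every developer who fetches the same revisions independently.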


Disconnected Repositories (Git / Bitkeeper / Mercurial / Bazaar)

  • Products such as Git, Bitkeeper and Bazaar are often referred to as 'Distributed Version Control Systems' (DVCS). They are called 'distributed' because they work offline, so a more accurate description might be 'Disconnected Repositories'.
  • DVCS manage changes to a tree of files with fast and easy merging and branching. The basic assumption is that developers work independently / in isolation on their own branches and copy of the source code.
  • These systems are used in open source projects where development teams are loose-knit and cohesion and trust are not of concern - everyone can have their own sandbox of the entire source code repository. At any point in time there is no 'current version'. There is no automatic replication. No golden copy of source code assets, except by unenforceable convention. Disaster recovery is questionable - what happens if your workstation crashes or is stolen? If the replication process fails due to network outages or server crashes, there are no built-in recovery capabilities.
  • Contrast this with WANdisco, where every replica is a golden copy of the repository. The assumption WANdisco makes is that development teams are working collaboratively, even across global teams, with continuous integration of their efforts. Here you get the best of both worlds: the performance of a local repository for the entire global team, with the manageability and continuous integration associated with central repositories.


Rsync

  • Rsync is an open source general purpose utility for one-way incremental file transfer. Like CVSup, rsync is based on a "master/slave" configuration that can be used to create "slave" replicas of master repositories.
  • Rsync computes strong and weak checksums for each and every file transferred. This creates a tremendous load on the CPU as well as the disk I/O subsystem and network. Because of this, most administrators typically do updates only once per day, so the slave CVS repositories are always out of sync with the master.
  • Since Rsync transfers are only done periodically, developers typically perform a fetch from the master server before committing a change set. This is done to avoid up-to-date check failures at the remote CVS server site. This effectively negates any benefits that should be derived from having a local read-only slave, as developers need to update to the master first.
  • Rsync suffers from an exploding tag problem. Since rsync doesn't understand the RCS format, it will treat each tagged file as changed and initiate file transfers for each of them. As a result, network bandwidth is consumed for no reason.
  • The master CVS repository can become a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
  • The operational model for CVS clients using Rsync is complex and error prone. As is the case with CVSup, reads and writes must be handled by manually switching CVSROOT in order to connect with different CVS servers. Many developers find that this causes inadvertent source code merging and checkin snafus that lead to build breakages. This negatively impacts both developer and CVS administrator productivity.
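
The exploding tag problem described above can be illustrated with a small sketch. CVS records each file's tags in the header of its RCS `,v` file, so tagging rewrites the file even though no revision content changes; a tool that compares whole files by checksum then sees every tagged file as modified. The RCS content below is a simplified, hypothetical stand-in for a real `,v` file, and SHA-256 is an assumed stand-in for whatever checksum a transfer tool uses.

```python
import hashlib


def rcs_file(tags: dict[str, str]) -> bytes:
    """Build a simplified, hypothetical RCS ',v' file with the given tag -> revision map."""
    symbols = "".join(f"\n\t{name}:{rev}" for name, rev in tags.items())
    text = (
        "head\t1.2;\n"
        "access;\n"
        f"symbols{symbols};\n"
        "locks; strict;\n\n"
        "1.2\nlog\n@fix typo@\ntext\n@hello world\n@\n"
    )
    return text.encode()


def checksum(data: bytes) -> str:
    """Whole-file checksum, as a format-unaware sync tool would compute it."""
    return hashlib.sha256(data).hexdigest()


before = rcs_file({"REL_1_0": "1.1"})
after = rcs_file({"REL_1_0": "1.1", "REL_1_1": "1.2"})  # same revisions, one new tag

# The revision text is identical, but the whole-file checksum differs,
# so a file-level sync tool treats the file as changed.
print(checksum(before) != checksum(after))  # True
```

A tool that parses the RCS format can tell that only the tag list changed, which is the distinction the comparison above draws.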


IBM ClearCase MultiSite

  • The master ClearCase MultiSite repository can become a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
  • Like many other multi-site solutions that lack WANdisco's active-active replication capabilities, ClearCase MultiSite compensates with source code branching and manual file merging. To prevent developers at one site from clobbering modules that developers at another site are working on, each development site is assigned a code branch. That code branch is writable by that development site only, and read-only to the other sites. This effectively forces development work to be partitioned based on time zones, rather than where talent is located. Although branch mastership (ownership) can be exchanged between sites, the process can be cumbersome and can make collaboration between distributed teams difficult.
  • When work from multiple development sites is integrated to complete a build, administrators must manually merge the files from each development site's code branches.
  • ClearCase's architecture makes it complex to implement and administer. It is not uncommon for organizations to allocate a full-time administrator to manage the ClearCase server and software at each development site. Synchronization is not automatic with each write operation and must be done on a scheduled basis with administrator monitoring. ClearCase provides no acknowledgement back to the sender to indicate that the updates have been transferred successfully. This means that ClearCase administrators at each site (sender and receiver) must manually confirm that files have been transferred completely.
  • Recovery from network or server failures requires significant manual intervention by an administrator. ClearCase MultiSite does not provide the self healing capabilities offered by WANdisco that eliminate the risk of administrator error during disaster recovery.
  • Developers and administrators who have been working with other SCMs will face a significant learning curve when they switch to ClearCase.


Perforce

  • Perforce provides a central server architecture with a stateless caching proxy (P4P) to enhance WAN performance. P4P stores file revisions only when a client requests them. The performance gain offered by Perforce only comes into play after file revisions are cached.
  • The master Perforce server can become a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
  • Perforce does allow multiple P4P proxies to be created, but there is no replication capability to keep all of the caches in sync. Unless the client asks for a specific version number, the proxy has to check with the master server whether the revision has changed. This can create WAN performance issues and renders P4P ineffective for many commonly used read operations, such as a full directory update.
  • The proxy caches are not garbage collected automatically. Stale revisions have to be cleaned out manually. If manual clean up is not done, P4P can run out of disk space.


CVSup

  • CVSup is a one-way file transfer tool that is used after changes have been made in CVS repository files. CVSup is an open source solution based on a "master/slave" configuration that was implemented to allow the FreeBSD community to pull FreeBSD sources on demand from a master server. CVSup can parse RCS files and transfer only files whose versions have changed.
  • The master CVS server can become a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
  • CVSup computes a strong logical checksum for the RCS files that need to be transferred. Before the checksum can be computed, each RCS file to be transferred must be canonicalized to remove white space. This creates a tremendous load on the CPU as well as the disk I/O subsystem and network. Because of this, most administrators do infrequent updates, typically once per day. As a result, the slave CVS repositories are always out of sync with the master. Since CVSup transfers are only done periodically, a fetch from the master server is still the norm before a CVS user commits a change set. This is done to avoid up-to-date check failures at the remote CVS server site. This negates any benefits that should be derived from having a local read-only slave, as developers need to update to the master first.
  • In addition, the operational model for CVS clients using CVSup is somewhat complex and error prone. Reads and writes must be handled by manually switching CVSROOT in order to connect with different CVS servers. Many developers find that this causes inadvertent source code merging and checkin snafus that lead to build breakages. This negatively impacts both developer and CVS administrator productivity.


CVSProxy

  • CVSProxy is an enhancement added to the CVS code base that addresses the issue of forcing developers to switch CVSROOT between read-only slave replicas and the master repository. CVSProxy transparently redirects write commands to the master node, while reads are directed to the local read-only repository. This overcomes problems that result from inadvertent source code merging and check-in snafus that lead to build breakages when developers have to manually change CVSROOT.
  • CVSProxy is currently supported only with the :ext protocol, and not with the more widely used :pserver protocol.
  • CVSProxy does nothing to replicate changes to slave CVS repositories. Some organizations set up CVS scripting hooks such as loginfo, postadmin, posttag and postwatch on the primary or "master" server to replicate changes to the secondary slaves. The main problem with these approaches is that the clients are blocked until the scripts complete replication of any changes from the master to the slaves.
  • As another alternative, many CVS shops use CVSProxy in conjunction with CVSup or rsync to handle file transfer to the slave repositories. These solutions in conjunction with CVSProxy can address the client blocking problem described above.
  • CVSProxy used with or without CVSup or rsync does nothing to address the problem of the master CVS repository becoming a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
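
The read/write redirection described above can be sketched roughly as follows. This is a minimal illustration of the routing idea only: the command classification is an assumption rather than the table CVSProxy actually uses, and both CVSROOT strings are hypothetical.

```python
# Illustrative classification only; not the command table CVSProxy actually uses.
READ_COMMANDS = {"checkout", "update", "diff", "log", "status", "annotate"}

LOCAL_ROOT = ":ext:dev@cvs-local.example.com:/cvsroot"    # hypothetical read-only slave
MASTER_ROOT = ":ext:dev@cvs-master.example.com:/cvsroot"  # hypothetical master


def choose_cvsroot(command: str) -> str:
    """Send known read commands to the local replica; everything else goes to the master."""
    return LOCAL_ROOT if command in READ_COMMANDS else MASTER_ROOT


for cmd in ("update", "commit", "tag"):
    print(cmd, "->", choose_cvsroot(cmd))
```

Routing unrecognized commands to the master is the conservative choice in this sketch, since sending a write to a read-only replica is the failure the redirection exists to prevent. Note that the master remains a single point of failure for every write.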


Visual SourceSafe

  • Visual SourceSafe (VSS) is an SCM solution from Microsoft that is implemented as a shared file system. VSS by itself is not a client-server solution like CVS. Instead, it is a client filesystem solution. Within a LAN, file sharing is used to access the source code repository. However, this is not feasible for remote sites accessing the repository over a WAN. Microsoft recommends purchasing SourceGear's SourceOffSite to address this problem and effectively turn VSS into a client-server solution that can support remote access over a WAN. This brings VSS to the same level as CVS, at an additional cost.
  • The master VSS database acts as a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
  • A single VSS server will typically be hard to scale vertically as the number of users grows. An offsite user will end up using more CPU/memory as the server process lingers on much longer because of a slow network. This typically requires more expensive, larger machines. Many organizations rely on large dedicated SMP boxes to address scalability. In contrast, WANdisco's distributed architecture allows organizations to scale horizontally using inexpensive Linux boxes.
  • If a long running commit or import operation is interrupted due to a loss of WAN connectivity, the developer has to restart the commit from the beginning. There is no capability similar to WANdisco's resilient transfer channel, which allows file transfers to continue from where they left off.
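
The difference between restarting a transfer and resuming it can be shown with a generic sketch. This illustrates the general resume-from-offset technique only; it is not WANdisco's implementation, and the `resume_copy` function and its `fail_after` test hook are hypothetical names introduced for this example.

```python
import io
from typing import Optional


def resume_copy(src: io.BytesIO, dst: io.BytesIO, chunk_size: int = 4,
                fail_after: Optional[int] = None) -> None:
    """Copy src to dst, starting at however many bytes dst already holds.

    fail_after is a hypothetical test hook: raise ConnectionError once that
    many chunks have been written, to simulate a WAN outage.
    """
    offset = dst.seek(0, io.SEEK_END)  # bytes already received
    src.seek(offset)                   # skip what the receiver already has
    sent = 0
    while chunk := src.read(chunk_size):
        if fail_after is not None and sent == fail_after:
            raise ConnectionError("simulated WAN outage")
        dst.write(chunk)
        sent += 1


source = io.BytesIO(b"a long running commit payload")
received = io.BytesIO()

try:
    resume_copy(source, received, fail_after=2)  # outage after two chunks
except ConnectionError:
    pass

resume_copy(source, received)  # second attempt continues from byte 8, not byte 0
print(received.getvalue() == source.getvalue())  # True
```

Without the offset check, the second attempt would resend the first eight bytes; over a slow WAN link and a large import, that repeated work is the cost the bullet above describes.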


WAN Accelerators

  • WAN accelerators essentially function as routers that accelerate traffic over a wide area network between remote clients and a central server using compression, protocol acceleration, and local data caching. While some reductions in WAN latency may be achieved, WAN accelerators are typically not a suitable solution for distributed development.
  • Even though data caching can avoid some WAN traffic for files that are frequently accessed by users at each site, there is no mechanism for ensuring consistency across the caches at every location, or for ensuring that each site's cache automatically reflects the latest changes on the central server for the subset of files that are locally cached. This is significantly different from what CVS MultiSite provides: complete, fully readable and writeable CVS repository replicas at each site that are always in sync.
  • Files that are already compressed, such as zip files, tar files, and most image file types cannot be compressed any further, so no performance advantage is achievable with these file types.
  • The end result is still a central server implementation, with all of the shortcomings that these implementations entail.


Ten Reasons to Use CVS MultiSite

1 - Unique capabilities shorten development cycles by 50% or more.
  • Peer-to-peer architecture with no single point of failure eliminates downtime.
  • Not a master-slave, or multi-master solution. CVS repositories at every site are writeable as well as readable for the entire code base.
  • Immediate active-active replication of write transactions eliminates WAN latency and ensures that distributed CVS repositories are always in sync.
  • Developers at remote sites don't hold back their commits until the end of the day, or end of the week as they may have in the past due to poor network performance.
  • Update conflicts and other problems are found and fixed as they occur, so less time is spent on QA and rework.
2 - Developers collaborate instead of working in silos.
  • Development tasks no longer have to be partitioned on the basis of geography, or time zone constraints because of technical limitations.
  • The most talented resources for a given project or task can work together. This in itself leads to shorter development cycles, reduced costs, and higher quality.
3 - Designed to deliver unbeatable performance.
  • CVS MultiSite combines a smart commit strategy with network optimization features that deliver LAN-speed performance over the WAN for write operations at every location, while keeping all of the repositories in sync.
  • Checkouts and other read operations are always local, so no WAN traffic is generated.
  • CVS MultiSite provides a follow-the-sun option that allows performance to be further optimized based on each development location's normal working hours.
  • Network traffic and bandwidth usage costs go down dramatically.
4 - Peace of mind from zero downtime and zero data loss.
  • WANdisco's unique replication technology turns distributed CVS repositories into mirrors of each other, providing continuous hot-backup by default.
  • Recovery is automatic after a network outage or server crash.
  • The risk of human error from manual recovery procedures is completely eliminated.
  • Expensive third party disk mirroring solutions for backup and recovery are not required.
5 - No retraining is required.
  • CVS' functionality doesn't change after CVS MultiSite is implemented.
  • Developers and administrators continue to use the clients and tools they're familiar with.
  • CVS MultiSite is implemented as a transparent network proxy for the local CVS server at each development location.
  • CVS MultiSite's transparent implementation means that client configurations don't change, and there are no changes to the CVS server's filesystem at each location.
6 - Easy to install and configure.
  • Intuitive browser interface makes installation and configuration a snap.
  • Multiple sites can be up and running within an hour.
  • Normally installed on the same server as CVS, so no additional hardware is required.
  • Works with all CVS clients that use pserver, gserver or SSH, including CVS command line.
  • Written in Java, and runs under Unix, Linux and Windows operating systems.
7 - Easy to administer.
  • All sites can be administered from a single location.
  • Replication is automatic with each write operation and transactional consistency is guaranteed across all distributed CVS repositories.
  • Setup and maintenance of site specific branches to avoid conflicts is completely unnecessary.
  • Built-in self-healing capabilities make disaster recovery automatic without any administrator involvement.
8 - Makes full 24x7 global operation possible without any other software or hardware.
  • WANdisco's unique replication technology turns CVS repositories distributed over a WAN into mirrors of each other, providing continuous hot-backup by default as part of normal operation.
  • Hot deploy features make it possible to add new CVS servers to a multi-site implementation, or take existing servers offline for maintenance without interrupting usage for other sites.
  • When new servers are added, or existing servers are brought back online they automatically sync up with the CVS servers at the other sites.
9 - Proven technology used by Fortune Global 1000 companies and startups outsourcing for the first time.
  • Firms such as AT&T, Motorola, Avaya, Verisign, Lockheed Martin, Bally Technologies, Honda, Cirrus Logic and others rely on WANdisco's technology.
  • Even smaller organizations outsourcing for the first time find that CVS MultiSite pays for itself many times over during just the first year of use.
10 - Low cost, high return.
  • CVS MultiSite's dramatic cost savings result from four main factors:
    • Reduced development cycle time due to reductions in QA and rework.
    • Reduced downtime.
    • Reduced administrative overhead.
    • Reduced network bandwidth usage costs.
  • CVS MultiSite even delivers dramatic cost savings over operating with no multi-site solution and using a central CVS server with remote developers connecting over the WAN.
  • WANdisco provides an ROI calculator to determine what cost savings can be expected in each customer's environment in comparison to other multi-site solutions.