See How Subversion MultiSite Compares to:
Central Subversion Server
- A central Subversion server can become a single point of failure. This can have a significant impact on remote sites that are separated from the central server by large time zone differences, as it can take the administrator at the central server site until the next business day (from the remote site's perspective) to restore access, whether the cause is a server failure or a network outage.
- Every remote request entails a WAN penalty. Even though Subversion clients only send changes to the central server when modifications to existing source code files are committed, when a new file is committed, or an existing file is checked out, the entire file is sent over the WAN.
- With a central server implementation, many unnecessary read operations take place over the WAN, as users at remote sites may repeatedly perform checkouts, updates and other read operations to access the same files. This degrades the performance of the central Subversion server as well as the network. Subversion MultiSite eliminates these read operations by providing a local copy of the Subversion repository at every site that is always consistent with every other. Subversion MultiSite generates no read traffic over the WAN.
- When Subversion is implemented with an Apache Web Server as a front-end, and the WebDAV HTTP protocol is used, the WAN penalty can be significant. This is especially true for commits that consist of large numbers of files. For example, a commit of a directory consisting of 100 files would require 100 separate HTTP PUTs (one for each file), each requiring its own connection to be established between the remote Subversion client and the central server over the WAN. WANdisco will send data over a single connection, and strip out unnecessary read subcommands that make up a Subversion commit command when transmitting the data over the WAN. This results in a significant boost in performance and reduction in WAN latency.
- WANdisco also provides a resilient data channel that allows transactions from remote sites to automatically continue from where they left off. With a central server implementation, if there is a transient WAN connection failure between a remote client and the server, the entire commit has to be resubmitted.
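The connection overhead described above can be sketched as a latency-only model. This is a rough illustration under stated assumptions, not a measurement: the function name, the 200 ms round-trip time, and the figure of one round trip per connection setup and per file are all hypothetical, and bandwidth and server processing time are ignored.

```python
def latency_cost_ms(num_files, rtt_ms, setup_round_trips=1,
                    per_file_round_trips=1, connection_per_file=True):
    """Estimate the latency-only cost of a commit in milliseconds.

    All parameters are illustrative assumptions, not measured values.
    """
    # One connection per file, or a single shared connection.
    connections = num_files if connection_per_file else 1
    round_trips = (connections * setup_round_trips
                   + num_files * per_file_round_trips)
    return round_trips * rtt_ms


# 100 files over a WAN with an assumed 200 ms round-trip time.
per_file_connections = latency_cost_ms(100, 200, connection_per_file=True)
single_connection = latency_cost_ms(100, 200, connection_per_file=False)
print(per_file_connections, single_connection)
```

Under these assumptions, the per-file connection setup accounts for roughly half of the total latency cost; the real ratio depends on the protocol and network in use.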
Svnsync
- Svnsync is an asynchronous unidirectional synchronization tool for Subversion. It is an open source solution based on a master-slave configuration. This means that all checkins must be done to a master Subversion repository. Only the master repository is writeable. The master is then replicated back to each of the read-only slaves on a periodic basis. Only one slave at a time can be updated. If there are multiple slave repositories, then multiple svnsync invocations will be required. Replication to one slave may succeed while replication to another fails. This can get out of hand from an administrative perspective as the number of slave repositories increases.
- As a result, the read-only slaves that the remote sites do their checkouts from are frequently out of sync with the master. In order to avoid check-in failures, remote developers will need to do updates against the master Subversion repository before doing their commits. This can negate the network performance benefits and reduced bandwidth usage that should be derived from having remote repository replicas.
- In contrast, WANdisco's architecture provides bi-directional real-time synchronization capabilities across a set of globally distributed Subversion repositories. With WANdisco, all of the repositories are writeable, and changes made to one are automatically replicated to all of the others in real-time. The repositories are in more of a peer-to-peer relationship than a master-slave one.
- With svnsync, the master Subversion server can become a single point of failure for write transactions. Even if the master server is up, any loss of network connectivity means that the remote locations will be unable to commit changes.
- There are no self-healing capabilities provided with svnsync as there are with Subversion MultiSite.
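The one-invocation-per-slave pattern described above might look like the following minimal sketch. The mirror URLs and function names are hypothetical; the only real command used is `svnsync synchronize` with its `--non-interactive` option. The `run` parameter exists so the loop can be exercised without a real repository.

```python
import subprocess


def build_sync_commands(mirror_urls):
    """Build one `svnsync synchronize` command per mirror repository."""
    return [["svnsync", "synchronize", "--non-interactive", url]
            for url in mirror_urls]


def sync_all(mirror_urls, run=subprocess.run):
    """Run each sync in turn and return the mirrors whose sync failed."""
    failed = []
    for cmd in build_sync_commands(mirror_urls):
        result = run(cmd)
        if result.returncode != 0:
            # This mirror is now behind the master until the next attempt.
            failed.append(cmd[-1])
    return failed
```

A scheduler would call `sync_all([...])` with the list of slave URLs and has to decide for itself what to do with the failures it returns, which is the administrative burden the section describes.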
SVN 1.5 Write-Thru Proxy
- With Subversion 1.5, along with other notable new features such as built-in merge tracking, the WebDAV write-thru proxy was introduced to simplify use of svnsync for Subversion deployments based around Apache 2.2.x.
- Prior to Subversion 1.5, users had to manually redirect their client to the master server whenever they executed a commit or other write transaction using the "svn switch --relocate" command. The WebDAV write-thru proxy will detect when a commit or other write command has been issued by a client connected to a read-only slave repository. It will then automatically redirect the client to the master server. This should make life somewhat easier for end users, and help prevent unintended writes against slave repositories from leading to split-brain scenarios that can be difficult to recover from.
- However, the WebDAV write-thru proxy leaves svnsync's master-slave architecture unchanged. While svnsync does offer the advantage of local reads, eliminating WAN traffic that would otherwise take place between a remote client and a central Subversion server, writes only happen on the master. As a result, the master repository can become a single point of failure for write transactions. In addition, the lag time between each instance of master repository replication can result in users at remote sites checking out stale copies of source code files from their local slave. This in turn can lead to update conflicts when changes are committed against the master. If the replication process fails due to network outages or server crashes, there are no built-in recovery capabilities.
- In contrast, Subversion MultiSite turns every Subversion repository into a peer of every other, and every repository is readable as well as writeable for the entire code base. Replication is triggered automatically when a write operation is done at any location, and transactional consistency is guaranteed across all of the repositories. Subversion MultiSite's self-healing capabilities automate the recovery process after a network outage or server crash, and prevent any data loss.
- Although both svnsync and Subversion MultiSite support the WebDAV HTTP protocol, Subversion MultiSite only uses this protocol over a LAN. WANdisco's own optimized protocol is used over a WAN on top of TCP/IP. This results in significant performance improvements over a WAN, particularly when commits consist of a large number of files. For example, a commit of a directory containing 100 files would require 100 separate HTTP PUTs (one for each file), each requiring its own connection to be established between the remote Subversion client and the central server over a WAN. Subversion MultiSite will send the data over a single connection, and strip out the unnecessary read subcommands that normally make up a Subversion commit command. This results in a significant boost in performance and reduction in WAN latency.
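The read-local, write-to-master split of the write-thru proxy can be modeled with a simple routing rule. This is a simplified sketch and not how Subversion or Apache implement it: the function name and the exact set of methods treated as reads are assumptions made for illustration.

```python
# WebDAV/HTTP methods treated as read-only in this sketch (an assumption).
READ_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PROPFIND", "REPORT"})


def route_request(method, local_slave_url, master_url):
    """Return the server that should handle a request with this method."""
    if method.upper() in READ_METHODS:
        return local_slave_url  # served from the local read-only replica
    return master_url           # every write still crosses the WAN


print(route_request("PROPFIND", "http://slave.example/svn", "http://master.example/svn"))
print(route_request("PUT", "http://slave.example/svn", "http://master.example/svn"))
```

The rule makes the limitation visible: whatever the client connects to, any write ends up at the single master.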
Disconnected Repositories (Git / Bitkeeper / Mercurial / Bazaar)
- Products such as Git, Bitkeeper and Bazaar are often referred to as 'Distributed Version Control Systems' (DVCS). They are called 'distributed' because they work offline, so a more accurate description might be 'Disconnected Repositories'.
- DVCS tools manage changes to a tree of files with fast and easy merging and branching. The basic assumption is that developers work independently, in isolation, on their own branches and their own copy of the source code.
- These systems are used in open source projects where development teams are loose-knit and cohesion and trust are not of concern - everyone can have their own sandbox of the entire source code repository. At any point in time there is no 'current version'. There is no automatic replication. No golden copy of source code assets, except by unenforceable convention. Disaster recovery is questionable - what happens if your workstation crashes or is stolen? If the replication process fails due to network outages or server crashes, there are no built-in recovery capabilities.
- Contrast this with WANdisco, where every replica is a golden copy of the repository. The assumption WANdisco makes is that development teams are working collaboratively even across global teams, with continuous integration of their efforts. Here you get the best of both worlds: performance of a local repository for the entire global team, with the manageability and continuous integration associated with central repositories.
Rsync
- Rsync is an open source general purpose utility for one-way incremental file transfer. Rsync is based on a "master/slave" configuration that can be used to create "slave" replicas of master repositories.
- Rsync computes strong and weak checksums for each and every file transferred. This creates a tremendous load on the CPU as well as the disk I/O subsystem and network. Because of this, most administrators typically do updates only once per day, so the slave Subversion repositories are always out of sync with the master.
- Since Rsync transfers are only done periodically, developers typically perform a fetch from the master server before committing a change set. This is done to avoid up-to-date check failures at the remote Subversion server site. This effectively negates any benefits that should be derived from having a local read-only slave, as developers need to update to the master first.
- Rsync suffers from an exploding tag problem. Since rsync doesn't understand the RCS format, it will treat each tagged file as changed and initiate file transfers for each of them. As a result, network bandwidth is consumed for no reason.
- The master Subversion repository can become a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
- The operational model for Subversion clients using Rsync is complex and error prone. Reads and writes must be handled by manually using the "svn switch --relocate" command in order to connect with different Subversion servers. Many developers find that this causes inadvertent source code merging and checkin snafus that lead to build breakages. This negatively impacts both developer and Subversion administrator productivity.
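The weak checksum mentioned above can be illustrated with a rolling checksum in the style of the published rsync algorithm description. This is a minimal sketch: the function names are hypothetical, the real rsync implementation differs in detail, and the strong checksum is not shown.

```python
M = 1 << 16  # both checksum components are kept modulo 2**16


def weak_checksum(block):
    """Compute the two components (a, b) of the weak checksum of a block."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the block.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right without rescanning the block."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - block_len * out_byte + a_new) % M
    return a_new, b_new


data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_checksum(data[:n])
for k in range(len(data) - n):
    a, b = roll(a, b, data[k], data[k + n], n)
    # The rolled value matches a full recomputation at every offset.
    assert (a, b) == weak_checksum(data[k + 1:k + 1 + n])
```

The rolling update is cheap per byte, but it still has to visit every byte of every file that is compared this way, which is where the CPU and disk I/O load comes from.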
IBM ClearCase MultiSite
- The master ClearCase MultiSite repository can become a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
- Like many other multi-site solutions that lack WANdisco's active-active replication capabilities, ClearCase MultiSite compensates with source code branching and manual file merging. To prevent developers at one site from clobbering modules that developers at another site are working on, each development site is assigned a code branch. That code branch is writable by that development site only, and read-only to the other sites. This effectively forces development work to be partitioned based on time zones, rather than where talent is located. Although branch mastership (ownership) can be exchanged between sites, the process can be cumbersome and make collaboration between distributed teams difficult.
- When work from multiple development sites is integrated to complete a build, administrators must manually merge the files from each development site's code branches.
- ClearCase's architecture makes it complex to implement and administer. It is not uncommon for organizations to allocate a full time administrator to manage the ClearCase server and software at each development site. Synchronization is not automatic with each write operation and must be done on a scheduled basis with administrator monitoring. ClearCase provides no acknowledgement back to the sender to indicate that the updates have been transferred successfully. This means that ClearCase administrators at each site (sender and receiver) must manually confirm that files have been transferred completely.
- Recovery from network or server failures requires significant manual intervention by an administrator. ClearCase MultiSite does not provide the self-healing capabilities offered by WANdisco that eliminate the risk of administrator error during disaster recovery.
- Developers and administrators who have been working with other SCMs will face a significant learning curve when they switch to ClearCase.
Perforce
- Perforce provides a central server architecture with a stateless caching proxy (P4P) to enhance WAN performance. P4P stores file revisions only when a client requests them. The performance gain offered by Perforce only comes into play after file revisions are cached.
- The master Perforce server can become a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
- Perforce does allow multiple P4P proxies to be created, but there is no replication capability to keep all of the caches in sync. P4P stores file revisions only when a client requests them. Unless the client asks for a specific version number, the proxy will have to verify with the master server if the revision changed. This can create WAN performance issues and renders P4P ineffective for many commonly used read operations such as full directory update.
- The proxy caches are not garbage collected automatically. Stale revisions have to be cleaned out manually. If manual clean up is not done, P4P can run out of disk space.
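The on-demand caching behavior described above can be modeled as follows. This is a simplified sketch of a revision cache in general, not Perforce's implementation: the class, its methods, and the two callbacks standing in for the master server are all hypothetical.

```python
class RevisionCache:
    """Cache file revisions on demand and count WAN calls to the master."""

    def __init__(self, fetch_from_master, head_revision_of):
        self._fetch = fetch_from_master  # (path, revision) -> content
        self._head = head_revision_of    # path -> latest revision number
        self._store = {}                 # never pruned automatically
        self.wan_calls = 0

    def get(self, path, revision=None):
        if revision is None:
            # Without an explicit revision, the master must be asked
            # which revision is current, even if the content is cached.
            self.wan_calls += 1
            revision = self._head(path)
        key = (path, revision)
        if key not in self._store:
            self.wan_calls += 1
            self._store[key] = self._fetch(path, revision)
        return self._store[key]


master = {("main.c", 1): b"v1", ("main.c", 2): b"v2"}
cache = RevisionCache(lambda p, r: master[(p, r)], lambda p: 2)
cache.get("main.c", 2)  # first request: fetched over the WAN
cache.get("main.c", 2)  # explicit revision: served from the cache
cache.get("main.c")     # no revision given: master is consulted again
print(cache.wan_calls)
```

In this model only requests that name an explicit, already cached revision avoid the WAN entirely, and `_store` grows until something outside the cache cleans it up.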
SVNreplicate
- SVNreplicate provides a suite of Subversion pre- and post-commit hook scripts. SVNreplicate pre-commit hook scripts allow a user at a remote Subversion read-only slave repository site to issue a commit and have that commit redirected to the master by pre-commit hooks installed on the slave. Once the commit is written to the master, post-commit hook scripts installed on the master are triggered to replicate the commit back out to the slave repositories. This is not a true write-anywhere solution.
- Only the master repository is writeable. In addition, the master is a single point of failure for write transactions. If the master is down, no write transactions can happen. One-copy-equivalence is not guaranteed in the presence of concurrent writes at different sites, as it is with WANdisco.
- SVNreplicate doesn't provide any fault-tolerance or self-healing automated disaster recovery capabilities.
SVNbackup
- SVNbackup is not a replication solution. This tool is used to create live backups. Since the backups are live, users can still write transactions to the SVN repository while the backup is being taken. The backup is then packaged as a tarball that can be loaded onto other Subversion repositories.
Pushmi
- Pushmi is a master-slave solution. All write transactions are written to the central master server, and are then copied to the remote slave repositories by cron jobs.
- The master server is a single point of failure. If the master is unavailable, no write transactions can take place.
- In the event of an update conflict on the master, Pushmi triggers an immediate replication of the master to the slave. When the replication of the master is complete, the conflict is reported to the client at the slave repository site, so the client can update the sandbox, resolve the conflict and reattempt the commit on the master.
- It is also important to note that with Pushmi, all write transactions to the slaves involve two WAN operations - the initial transfer of the write to the master, and then the rsync of the write back to the slave(s).
SVK
- SVK uses Subversion's FSFS as a backend, but beyond that, it is its own SCM.
- SVK does allow multiple repositories to be readable as well as writeable, but there is no real-time enforcement of consistency across the repositories as there is with WANdisco.
- A commit can succeed on a developer's local repository where there are no conflicts, and fail when it's copied to other sites' repositories due to update conflicts. To avoid conflicts, each development site typically works on a separate branch.
- Conflicts will not be caught until later in the development cycle when the branches are merged to create a build. This can lead to longer QA cycles and unplanned rework when the conflicts are discovered later in the development cycle.
- With WANdisco's real-time active-active replication capability, update conflicts and other problems are found and fixed as they occur. As a result, with WANdisco, there's less QA and rework.
Note: Best Practical, LLC recently announced that they have ceased all new development on SVK. Best Practical, LLC acquired SVK in June 2006.
Visual SourceSafe
- Visual SourceSafe (VSS) is an SCM solution from Microsoft that is implemented as a shared file system. VSS by itself is not a client-server solution like CVS. Instead it is a client filesystem solution. Within a LAN, file sharing is used to access the source code repository. However, this is not feasible for remote sites accessing the repository over a WAN. Microsoft recommends purchasing SourceGear's SourceOffSite to address this problem and effectively turn VSS into a client-server solution that can support remote access over a WAN. This effectively brings VSS to the same level as CVS at an additional cost.
- The master VSS database acts as a single point of failure. This can be a significant issue for remote sites that are separated by large time zone differences when the master server goes down for any length of time, as it might take the administrator at the master server site until the next day (from the remote site's perspective) to get back up and running. Even if the master server is up, any loss of network connectivity will mean that the remote locations will be unable to commit changes.
- A single VSS server will typically be hard to scale vertically as the number of users grows. An offsite user will end up using more CPU and memory, as the server process handling that user lingers much longer because of a slow network. This typically requires more expensive, larger machines. Many organizations rely on large dedicated SMP boxes to address scalability. In contrast, WANdisco's distributed architecture allows organizations to scale horizontally using inexpensive Linux boxes.
- If a long running commit or import operation is interrupted due to a loss of WAN connectivity, the developer has to restart the commit from the beginning. There is no capability similar to WANdisco's resilient transfer channel, which allows file transfers to continue from where they left off.
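The difference between restarting a transfer and continuing from where it left off can be sketched as a sender that remembers the last acknowledged offset. This is a generic illustration of resumable transfer, not WANdisco's protocol: the function, the chunk size, and the retry limit are all hypothetical.

```python
def send_with_resume(data, send_chunk, chunk_size=4, max_retries=5):
    """Send data in chunks, resuming at the last acknowledged offset.

    send_chunk(offset, chunk) is assumed to raise ConnectionError on a
    transient failure and to return normally once the chunk is stored.
    """
    offset = 0
    retries = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        try:
            send_chunk(offset, chunk)
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            continue  # retry the same offset, not the whole transfer
        offset += len(chunk)
    return retries


received = {}
dropped = []

def flaky_link(offset, chunk):
    # Simulate a single WAN drop partway through the transfer.
    if offset == 4 and not dropped:
        dropped.append(offset)
        raise ConnectionError("simulated WAN drop")
    received[offset] = chunk

retries = send_with_resume(b"0123456789", flaky_link)
```

Only the chunk that failed is sent again; the chunks acknowledged before the drop are not retransmitted.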
WAN Accelerators
- WAN accelerators essentially function as routers that accelerate traffic over a wide area network between remote clients and a central server using compression, protocol acceleration, and local data caching. While some reductions in WAN latency may be achieved, WAN accelerators are typically not a suitable solution for distributed development.
- Even though data caching can avoid some WAN traffic for files that are frequently accessed by users at each site, there is no mechanism for ensuring consistency across the caches at every location, or ensuring that each site's cache automatically reflects the latest changes on the central server for the subset of files that are locally cached. This is significantly different than what Subversion MultiSite provides with complete, fully readable and writeable Subversion repository replicas at each site that are always in sync.
- Files that are already compressed, such as zip files, tar files, and most image file types, cannot be compressed any further, so no performance advantage is achievable with these file types.
- The end result is still a central server implementation, with all of the shortcomings that these implementations entail.
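The point about already-compressed files can be checked with a small experiment. This is a sketch under stated assumptions: pseudo-random bytes stand in for an already-compressed payload, the sample source text is invented, and `zlib` stands in for whatever compression a given WAN accelerator actually uses.

```python
import random
import zlib

# Repetitive source-like text compresses well.
source_text = b"for (i = 0; i < n; i++) { total += values[i]; }\n" * 500

# Pseudo-random bytes stand in for an already-compressed payload.
rng = random.Random(0)
compressed_like = bytes(rng.getrandbits(8) for _ in range(len(source_text)))

text_ratio = len(zlib.compress(source_text)) / len(source_text)
payload_ratio = len(zlib.compress(compressed_like)) / len(compressed_like)

print(f"source text: {text_ratio:.3f} of original size")
print(f"compressed-like payload: {payload_ratio:.3f} of original size")
```

The first ratio should be a small fraction, while the second should be at or slightly above 1.0, since compression adds framing overhead without finding any redundancy to remove.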
