Simplifying multi-cloud migration and operations with a LiveData platform
By WANdisco, Oct 04, 2018 in Tech & Trends
Migrating to - and operating - multi-cloud environments presents significant data challenges. Mark Bartlo and Niraj Jaiswal discuss industry perceptions and trends, a LiveData use case, and how enabling a LiveData environment can reduce risk, cut costs, and transform business capabilities.
DISCOtecher: Let’s start today’s conversation with typical assumptions about data movement to the cloud. What is one of the most common perceptions about achieving data consistency during data migrations to and between clouds?
Mark: I think people believe that existing technologies meet only a specific set of business requirements, generally for one-way replication. Typically, this would be a production system in Region A, with backup, disaster recovery or load sharing to Region B. If they make changes on the target region, in this example Region B, there is no ability to replicate them back to Region A, with the result that the data becomes inconsistent. From my conversations with data management professionals, there’s a widely held belief in this community that keeping data absolutely consistent isn’t possible.
There are replication tools that use batch processes, and given enough time, the data across the two regions can be made consistent – this is called “eventual consistency.” In practice this means that data is generally inconsistent for most of the time. But for some business applications, this kind of eventual consistency is not good enough.
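To make the one-way, batch-based behavior described above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two regions are modeled as in-memory dictionaries, and the names `batch_replicate`, `region_a`, and `region_b` are hypothetical, not part of any product mentioned in this interview.

```python
# Illustrative sketch only: two "regions" modeled as plain dictionaries.
# All names here are hypothetical and do not come from any real tool.

def batch_replicate(source: dict, target: dict) -> None:
    """One-way batch copy: make the target match the source's current state."""
    target.clear()
    target.update(source)


def is_consistent(a: dict, b: dict) -> bool:
    """Two regions are consistent when they hold exactly the same data."""
    return a == b


region_a = {"order-1": "pending"}   # production system (Region A)
region_b = {}                       # backup / DR copy (Region B)

# Right after a batch run, the two regions agree.
batch_replicate(region_a, region_b)
consistent_after_batch = is_consistent(region_a, region_b)

# A write lands on Region A between batch runs: the regions now disagree
# until the next batch completes ("eventual consistency").
region_a["order-2"] = "pending"
consistent_between_batches = is_consistent(region_a, region_b)

# A change made on the target is never sent back to Region A, and the next
# one-way batch simply overwrites it.
region_b["order-1"] = "shipped"
batch_replicate(region_a, region_b)
target_change_survived = region_b["order-1"] == "shipped"

print(consistent_after_batch, consistent_between_batches, target_change_survived)
```

The sketch shows both limitations raised in the conversation: data is only consistent immediately after a batch run, and a change made on the target side is lost rather than replicated back.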
Niraj: I might add that existing technologies often require application downtime to ensure replication. In the age of always-on operations, that seems to be unrealistic. If customers attempt to use existing tools to replicate data across multiple regions or multiple clouds, the impact of system downtime might be simply too great. So perhaps hidden inside the assumption ‘it’s not possible’ is really the idea ‘it’s too complicated or risky.’
DISCOtecher: What is behind the complexity, and why is the complexity increasing?
Mark: You may have heard about the 3 Vs – velocity, volume, and variety? This is the big data era and data sources continue to proliferate. When the data was all stored in internal systems, it was so much simpler. Now with hybrid cloud, so many business, operational and economic factors have changed - from network speeds to cloud provider access to costs. The traditional method has been to run migrations over weekends or during maintenance windows. The first challenge is that for 24/7 businesses, those windows simply no longer exist because the data is in constant use. Whether the migration is from on-premises to the cloud, or from one cloud to another, the same principle applies.
Additionally, the migration tends to be of data at a given point in time. For many enterprises, because operations re-start on the source data as soon as possible, a subsequent challenge is synchronizing the source (which has immediately changed) and migrated data sets. This applies for almost any source and target pair. There are replication tools that work exceptionally well within an on-premises landscape, and cloud vendors also provide abilities to synchronize data within their environments, but they usually struggle to handle a mix of on-premises and cloud, or a mix of multi-cloud landscapes. In essence, the challenge is to migrate data even as it changes, without interrupting business operations, and minimize risk, in a hybrid and multi-cloud world.
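The gap between a point-in-time copy and a source that keeps changing can be sketched in a few lines of Python. This is a simplified illustration, not a description of how any particular product works: each data set is assumed to be a dictionary mapping a file path to a content version, and `compute_delta` and `apply_delta` are hypothetical helper names.

```python
# Illustrative sketch only: data sets are dicts of {path: content_version}.
# compute_delta and apply_delta are hypothetical helpers, not a real API.

def compute_delta(snapshot: dict, live: dict) -> dict:
    """Return what changed on the source since the snapshot was taken."""
    upserts = {
        path: version
        for path, version in live.items()
        if path not in snapshot or snapshot[path] != version
    }
    deletes = sorted(path for path in snapshot if path not in live)
    return {"upserts": upserts, "deletes": deletes}


def apply_delta(target: dict, delta: dict) -> None:
    """Bring a migrated copy up to date by applying the computed changes."""
    target.update(delta["upserts"])
    for path in delta["deletes"]:
        target.pop(path, None)


# State of the source when the migration copy was taken.
snapshot = {"/data/a.csv": "v1", "/data/b.csv": "v1", "/data/c.csv": "v1"}

# The migrated copy starts out identical to the snapshot.
migrated = dict(snapshot)

# Operations restart on the source: one file changes, one is removed,
# and one new file appears.
live = {"/data/a.csv": "v2", "/data/b.csv": "v1", "/data/d.csv": "v1"}

delta = compute_delta(snapshot, live)
apply_delta(migrated, delta)

print(delta)
print(migrated == live)
```

In this toy version the source is frozen while the delta is computed. In a live environment the source keeps changing during that step as well, which is why a single catch-up pass is not enough and synchronization has to run continuously.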
DISCOtecher: Got it. So that’s a conundrum for anyone who needs to execute a hybrid or multi-cloud strategy if they start from the viewpoint that multi-cloud, or multi-region / multi-cloud replication is not possible.
Mark: Yes. But we do know of businesses that are indeed attempting multi- and poly-cloud replication. As an example, adventurous enterprises with large internal IT resources are attempting to solve the problems with in-house solutions. We know of several cases where this has been successful, but only for a limited time. Before long, their solution has become outdated and new investment and development are required, leading to significant costs. Sometimes reluctantly, these enterprises have switched to commercial products with less functionality, principally for cost reasons.
Niraj: The operational goal these enterprises have is to ensure that all of their data is consistent and accessible at any point in hybrid and multiple cloud environments. WANdisco refers to this always-consistent, always-available data environment as ‘LiveData.’ For all data-dependent enterprises – and that is pretty much every business in some manner – creating a LiveData environment is critical.
WANdisco Fusion enables a LiveData capability, and this means that all of your application data stays accurate and consistent even when data is moving. A LiveData capability removes much of the complexity. WANdisco Fusion works on-premises and in the cloud, hybrid cloud and cloud-to-cloud - in any environment - and addresses this challenge head-on. It was designed to remove the risk!
DISCOtecher: Sounds easy enough - ‘buy cloud’ is step 1, and ‘get LiveData’ is step 2. How do some of our WANdisco Fusion customers take their first steps towards multi-cloud, multi-region data consistency and enable LiveData?
Mark: The first step is to understand and define the business problem, and how or why current technology is unable to help. We’re looking for the impact of data inconsistency, or the risks the business is facing. It could be as simple as having only one copy of critical business data, and the desire to replicate it to a second location, perhaps on a different cloud vendor’s platform. It might be a more complex need, with processes that are inhibited or stopped until source data is replicated, and more of a data agility challenge.
Niraj: Organizations that are aware of the risks, and are looking for solutions, tend to find that existing solutions do not meet their needs. They understand that a LiveData environment is the target, but they tend to be searching within known on-premises or cloud toolsets. Many assume that the cloud service providers’ data management SLAs are sufficient. But when you really look under the hood, even a great cloud provider will struggle to restore petabyte-scale data sets within a business day, and there have already been examples of entire cloud regions suddenly becoming unavailable. For almost any organization, that’s a huge risk.
DISCOtecher: Indeed! Could you please talk me through a LiveData use case?
Mark: Absolutely. We recently worked with a large manufacturer who wanted to massively accelerate its design innovation. Rather than build a second on-premises Hortonworks cluster, the customer chose AWS as its preferred platform, looking to exploit its machine learning capabilities.
The basic challenge was to migrate and manage more than 100 TB of critical data to take advantage of cloud-based analytics, and, of course, bring down their operational costs. How could the company move to the cloud without disrupting ongoing, critical business operations? How could it account for the delta between the data captured at a point in time and the subsequent source data changes made during the migration? How could it keep data replicated successfully – with limited connection capacity and significant latency, not to mention 100 TB of data?
The company could not accept any downtime whatsoever during the migration, and could not risk systems outage or data loss if the migration failed.
The answer was to deploy WANdisco Fusion to copy its Hadoop data onto AWS Snowball, maintaining data consistency until the point of physical transfer to AWS S3. Once the data was imported to AWS S3, WANdisco Fusion then created a LiveData environment, continuously synchronizing ongoing changes of the on-premises Hadoop data to AWS S3. Now, WANdisco Fusion seamlessly manages updated data and replicates to AWS S3. Soon after this, the manufacturer wanted to also use the Azure cloud platform to enable their multi-cloud strategy and was able to replicate their data again from S3 to Azure Blob Storage. This is a LiveData platform!
It is very common for companies with multiple locations to have data residing on S3-compatible storage in different geographic regions. In any environment, enterprises want their data to be consistent for all users, regardless of the underlying topology. Similarly, they want the business to be resilient, which means a backup and disaster recovery solution across these regions, which implies using different cloud vendors for those services. The LiveData environment created by WANdisco Fusion meets all those needs for resilience, disaster recovery and consistent data for applications. Being able to guarantee consistent data across on-premises and multi-cloud environments is now a critical business requirement.
DISCOtecher: So LiveData enables hyperscale infrastructure and a polycloud strategy?
Mark: Yes. WANdisco Fusion makes it possible to replicate to a very broad range of S3-compatible cloud storage providers and delivers freedom of vendor choice as well as remarkable business resilience.
Niraj: For organizations seeking to reduce risk while increasing capabilities, LiveData has emerged as a core concept. By making data consistent across all endpoints, even as it changes and even at petabyte scale, regardless of location or cloud service, WANdisco Fusion enterprise software provides the capability that businesses need to evolve their digital operations.
DISCOtecher: Niraj and Mark, thanks much for your time today! And to our readers, you can learn more about data resiliency in a multi-cloud environment in Wikibon’s recent research paper on this topic.