How to accelerate cloud adoption
By WANdisco, Sep 28, 2018, in Tech & Trends
In a new blog series focused on WANdisco Fusion and its LiveData capability, we explore LiveData use cases shared by subject matter experts. The first post in the series focuses on cloud migration. While outage after outage grabs international headlines, many enterprises feel blocked from executing their hybrid cloud and multi-cloud strategies because their IT teams lack the tools and resources to tackle the data accessibility and consistency challenges of cloud data movement. Bringing both a WANdisco Fusion product and an IT architect perspective, Tom Luckenbach, a WANdisco Solutions Architect, shares how a LiveData platform accelerates cloud adoption.
DISCOtecher: Why is enabling a LiveData environment so critical for success in shifting to a cloud strategy?
Tom: There are really two angles to think about this from: your initial migration to cloud, and then your subsequent cloud operations.
For the initial stages of cloud adoption, the most likely route is migrating data to build a so-called hybrid cloud: a subset of the data for the target application is moved from on-premises to the cloud. But as soon as you split off your data, the data management challenges arise. The chief concern is how to keep these data sets in sync if both are in use. WANdisco LiveData eliminates this “split-brain” problem.
As cloud adoption progresses and now becomes a larger operational component of the IT landscape, very few enterprises will rely on just a single cloud vendor. A multi-vendor cloud strategy makes sense for the same reason today’s companies operate multi-vendor on-premises infrastructures: to spread risk and encourage competition. However, moving the data between multiple cloud vendors creates an enormous challenge. Again, WANdisco with LiveData provides a common solution, bridging cloud platforms.
“A multi-vendor cloud strategy makes sense for the same reason today’s companies operate multi-vendor on-premises infrastructures: to spread risk and encourage competition.”
DISCOtecher: Yes, makes sense. But a lot of enterprises are stuck at the cloud transition divide. Why don’t cloud providers address enterprise data consistency requirements?
Tom: Yes, that’s right. The cloud vendors are not very motivated to provide data sharing tools. Cloud providers are great at one-way data upload and, of course, excellent at showing you the advantages of cloud-based apps. In fact, switching to cloud services has drawn a much clearer boundary between data and applications. Even databases, which traditionally obscure the underlying data, can leverage these new data platforms – cloud object storage. This encourages enterprises to look for the best application offerings, and then move the data to that cloud provider.
As a result, enterprises want their data to be mobile and accessible – which also puts emphasis on availability. Traditionally, the migration process meant taking a point-in-time copy and moving that data set into production. However, once the copy is in use, any newly changed data is out of sync with the original. At the data volumes we are talking about now, this can no longer be handled by wholesale replacement of the underlying data.
Again, this is where WANdisco Fusion with LiveData comes to the fore and completely changes the status quo. With LiveData, you replace point-in-time copies with always-synchronized data at all endpoints. When you ‘migrate’ data to the cloud or to a second cloud vendor, a LiveData environment ensures that the data on all platforms – new and old – remains completely synchronized.
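The contrast Tom describes can be sketched in a few lines of Python. This is an illustration of the concept only – the dictionaries and the `apply_everywhere` helper are hypothetical stand-ins, not WANdisco Fusion APIs:

```python
# Illustrative sketch: why a point-in-time copy drifts, while
# continuous change propagation keeps endpoints identical.
# All names here are hypothetical, not WANdisco Fusion APIs.

on_prem = {"orders.csv": "v1", "users.csv": "v1"}

# Point-in-time migration: snapshot the data set once.
cloud_snapshot = dict(on_prem)

# The on-premises copy keeps changing after the snapshot...
on_prem["orders.csv"] = "v2"

# ...so the snapshot is immediately stale.
assert cloud_snapshot != on_prem  # split-brain: two diverging copies

# Continuous synchronization instead propagates every change
# to all endpoints as it happens.
endpoints = [dict(on_prem), dict(on_prem)]  # e.g. on-prem + cloud

def apply_everywhere(key, value):
    """Apply a single change to every endpoint."""
    for store in endpoints:
        store[key] = value

apply_everywhere("orders.csv", "v3")
assert endpoints[0] == endpoints[1]  # always-synchronized data
```

The snapshot diverges the moment the source changes; propagating each change to every endpoint is what keeps “new and old” platforms identical.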
For companies such as financial institutions, retail, and healthcare organizations – really any enterprise with hybrid or multi-vendor cloud operations – the LiveData capability simplifies their data model by instituting a ‘same-data-everywhere’ principle. This principle harks back to the “single source of truth” mantra we have held dear in the enterprise landscape for so many years.
DISCOtecher: If migrating to the cloud is a data management challenge, surely there are sufficient tools and methodologies already. What LiveData capabilities do cloud providers support currently?
Tom: There’s only one LiveData platform, and that’s from WANdisco. The other tools used for managing enterprise data on-premises are not necessarily suitable or adaptable to handle the volumes of unstructured data found in cloud use cases. And as I pointed out already, the data management tools within each cloud vendor’s ecosystem are generally not designed to handle hybrid and multi-cloud scenarios.
In any event, functionality tends to be focused on uni-directional data flow – moving data from one entity to another. Enterprises will move to a mix of on-premises, private, and public cloud – a veritable mesh of cloud data centers and providers. For this very common scenario we need ways to unite data, as basic uni-directional tools are not sufficient.
For this kind of mesh network, multi-directional data flows are normal and expected. Embracing a LiveData platform, where data is continuously in sync, requires new thinking and new processes. For many organizations this is a significant gap, mostly borne of a lack of awareness or experience.
“For companies such as financial institutions, retail, and healthcare organizations – really any enterprise with hybrid or multi-vendor cloud operations – the LiveData capability simplifies their data model by instituting a ‘same-data-everywhere’ principle.”
DISCOtecher: When we think of what LiveData means, is it right to think this is essentially shared pools of data, available to any application?
Tom: You’re on the right track. LiveData borrows an attribute from shared file technology called the ‘global namespace,’ which provides low-latency access via caching at any of several localized data stores. With the rise of distributed computing across the internet, the cloud purported to be the global culmination of these approaches. However, important aspects of these constructs fell by the wayside. We are left with impractical data silos, manifesting for example as the departmental data mart or as the discrete, separate cloud regional instance.
The haste of getting to the cloud has left many organizations with compromises that are no longer tenable. What we’re seeing now is the need to go back to a unified model, to eliminate the discrepancies that operating multiple data stores produces. But you can’t expect everybody around the globe to have equal access to a single point; network latency alone makes that impractical, so you need to store synchronized copies close to user locations – this is the LiveData landscape.
The technical challenge is to make data available to local users in a way that keeps it fully consistent and coordinated with all the other locations. So if there are changes, modifications, or additions, they are propagated to all the other locations, keeping all data up to date.
DISCOtecher: Can you share a common enterprise LiveData use case with me?
Tom: Sure thing. A classic, current example is a large graphics-chip maker that wanted to massively accelerate its design innovation. With around 100 TB of critical data, how could it take advantage of cloud-based analytics? Moving that quantity of data as point-in-time snapshots was not practical, as it would cause big internal delays, and the cloud data would be out of date as soon as it arrived. In this particular case, the company could not afford extended – or even any – downtime during the migration, and could not risk a systems outage or data loss if the migration failed.
The answer was to create a LiveData environment with a few specific wrinkles. The company selected Microsoft Azure ADLS. With limited corporate bandwidth, the first step was to use WANdisco Fusion to synchronize the data with an Azure Data Box – a special-purpose physical server that is trucked to Azure. Once Microsoft had uploaded the data from the Box, WANdisco Fusion continuously synchronized ongoing changes of the on-premises Hadoop data to Hadoop on Azure, ensuring that the delta between the original transfer and the Azure load was covered.
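The two-phase pattern here – bulk-seed a snapshot via the Data Box, then replay the changes that accumulated while it was in transit – can be sketched as follows. File names and the `change` helper are hypothetical, purely to make the catch-up mechanics concrete:

```python
# Sketch of the two-phase migration described above:
# (1) bulk-seed a snapshot (the Data Box shipment),
# (2) replay the changes captured during transit so the
#     cloud copy catches up with the live source.
# File names and helpers are hypothetical.

source = {"part-0001": "v1", "part-0002": "v1"}

# Phase 1: snapshot onto the transfer appliance.
data_box = dict(source)
changes_in_transit = []

# While the appliance is shipped and loaded, the source keeps
# changing; each change is recorded for later replay.
def change(key, value):
    source[key] = value
    changes_in_transit.append((key, value))

change("part-0002", "v2")   # modified while the Box was in transit
change("part-0003", "v1")   # a brand-new file

# Microsoft loads the Box: the cloud now holds the (stale) snapshot.
cloud = dict(data_box)

# Phase 2: replay the delta captured since the snapshot.
for key, value in changes_in_transit:
    cloud[key] = value

assert cloud == source  # cloud has caught up with the live source
```

The key point is that the bulk transfer and the delta replay together leave the cloud copy identical to the live on-premises data, without ever pausing the source.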
The company now uses this LiveData environment for its cloud analytics. Any local changes are immediately and continuously synchronized with ADLS, removing a whole slew of bandwidth, data consistency and operational impact issues – and, most importantly, giving them cloud-scale analytics on current data.
“Enabling a LiveData environment resolves those latency and location challenges by removing the need to move or copy data, and providing a continuously synchronized resource for all application endpoints.”
DISCOtecher: Beyond migration, what other drivers push enterprises toward a LiveData environment?
Tom: Data migration is the most frequent use case, which then leads to the more general LiveData environment. Data is often structured in a way that makes copying from point A to point B unreliable and slow. Most tools are designed to solve these single-migration situations. But they do not address the larger picture – or literally the larger datasets, composed of unstructured files of various sizes and types. Migrating such large datasets requires a lot of time, which often means prolonged system downtime. You can no longer just say, “OK, I’m going to make a copy, move it over, and two hours from now we’ll start back up.”
Similarly, the ability to bring up a new application in the cloud isn’t just predicated on having data present. The data must be tested and validated, and the solution architecture must be similarly tested and validated, and for both of those processes to make sense, they require consistent data. In this use case, the LiveData concept provides the environment that ensures continuous data consistency, ensuring that new applications run against valid data and can produce results that provide meaningful comparisons.
Another driver is geographic location. Even if data is stored ‘in the cloud,’ it exists on physical infrastructure in a data center. This means that if you bring up a service in the eastern part of the US for users in Europe, the data does not move, creating issues around locality and latency. Resolving those issues by moving or copying data immediately introduces new data consistency challenges. Enabling a LiveData environment resolves those latency and location challenges by removing the need to move or copy data, and providing a continuously synchronized resource for all application endpoints.
DISCOtecher: To wrap things up, just one final question to check my understanding. Is it correct to say that WANdisco Fusion was developed specifically to solve the challenges around creating a LiveData environment for distributed environments?
Tom: That’s exactly right. Ensuring that your application data stays consistent, available, and accessible across all your application environments is complex, especially in the cloud – but it doesn’t have to be. WANdisco Fusion removes complexity and risk by enabling data consistency and accessibility in any environment. With LiveData, whether over wide area networks, the internet, or multi-cloud networks, data is guaranteed to be available, accurate, and protected – anytime, anywhere, spanning platforms and locations, even for data changing at petabyte scale.
DISCOtecher: Thanks very much for the super interview, Tom.
Tom: You’re most welcome!