Are Old Lessons on Data Dependency Still Relevant?
Posted in Industry on Aug 01, 2018
The cloud has been incredibly effective at simplifying how applications are built and deployed by eliminating physical constraints. The vision of the software-defined datacenter has been fully realized. By turning the challenge of acquiring and provisioning hardware into a button press, the cloud has eliminated many barriers to new application deployment.
But the benefits that come with this ease of deploying and scaling systems also come with a cost. The “hidden gotcha” described by IDC’s Research Director, Phil Goodwin, refers precisely to a challenge that may not be accounted for by many enterprises.
"Distributed applications that the cloud makes simple to construct become interdependent because they may work only on pieces of a larger business process, introducing a need to share information."
That “data dependency” emerges directly from the ease of using the cloud to build systems that need to work on the same information.
Here’s a simple example of an application data dependency. A supply chain application depends on product data that describes components. Changes to that master data need to be made in a product database, and the data will likely be queried by applications that use it for advance shipping notifications. For example, a modification to the weight of a product must be picked up by the applications that drive receipt in distribution centers, packing, and shipping calculations. Each of those applications is dependent on that product data, and will access it as a service.
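To make the product-weight example concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration: the `Product` record, the `shipping_cost` function, the SKU, and the rate are assumptions for this post, not part of any real supply chain system. It shows two applications computing different shipping costs because one of them reads a copy of the product data that has not yet received the weight change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical master-data record for a product."""
    sku: str
    weight_kg: float


def shipping_cost(product: Product, quantity: int, rate_per_kg: float) -> float:
    """Illustrative shipping calculation that depends on product weight."""
    return product.weight_kg * quantity * rate_per_kg


# The product database (master data) and a copy held by another application.
master = {"WIDGET-1": Product("WIDGET-1", 2.0)}
stale_copy = dict(master)

# The weight is modified in the master data, but the copy is not updated.
master["WIDGET-1"] = Product("WIDGET-1", 2.5)

# Two dependent applications now disagree about the same shipment.
cost_from_master = shipping_cost(master["WIDGET-1"], quantity=100, rate_per_kg=1.5)
cost_from_stale = shipping_cost(stale_copy["WIDGET-1"], quantity=100, rate_per_kg=1.5)

print(cost_from_master, cost_from_stale)
```

The two results differ only because the applications see different versions of the same record, which is the kind of inconsistency a data dependency exposes.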
"Updates to a single piece of product information can have significant follow-on effects for supply chain planning, scheduling and costing, and the applications that control each of those aspects will be impacted by any lack of availability of the data, or inconsistencies in it."
"What this example shows us is that organizations must ensure their business goals are not ignored when working with the cloud. Achieving these goals should always be front of mind, and business processes should assume that the technology foundations on which they can be built will change, fail and maybe even actively conspire to subvert those goals."
But processes can become dependent on the health and utility of the platforms on which they operate. Because of this dependency, organizations cannot ignore the unique needs of systems that rely on one another.
In a sense, everything that was old is new again. The “fallacies of distributed computing” (https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing) describe eight misconceptions that still affect cloud-native systems.
"IDC’s notion of data dependency is a clear reminder that the same issues remain with modern infrastructure, and still need answers when designing and deploying systems in the cloud. Networks are unreliable, bandwidth is limited, there will be more than one administrator, and moving data is not free."
Distributed systems (which include all cloud applications) are more complex than non-distributed ones, and delegating the response to that complexity to applications alone introduces further cost and complexity. As we get better at building distributed systems at scale, we can forget these principles. Just remembering them isn’t enough: you need infrastructure that simplifies the challenges of data dependency.
The foundations for a response to these challenges, including data dependency, are already available, and can draw on provably optimal approaches to the coordination of distributed processes. But while distributed systems theory is one thing, its practice is another, and applying these approaches can itself be complex. This means that trusted, productized implementations are required.
A LiveData platform is WANdisco’s answer to the inherent complexities of using the cloud.
"A LiveData platform makes it possible for distributed applications to access, share and modify local replicas of the same data while guaranteeing it remains consistent."
Doing this at scale is difficult, but it is a problem that WANdisco solves. While there are other approaches to the challenge of data dependency, the benefits of a LiveData platform are being proven at scale, with forward-thinking organizations using it in ways that simplify rather than complicate. Having ready access to all data at any scale at local speed can be a competitive advantage.
"LiveData is used today by major financial institutions, cloud providers, automotive manufacturers and technology companies as an answer to their applications’ interdependency on data. By separating how data can be made available to applications from the way those applications are distributed, it solves problems that were previously insurmountable at scale."
- By allowing a multi-petabyte data lake to span continents, a global bank has made it possible to service itinerant customers without the risk of site failure affecting their access to services.
- An automotive manufacturer can process hundreds of terabytes of data a day in multiple locations, all while providing aggregated statistical information across what were separate data sets.
- An online retailer can maintain accurate inventory information in real-time across their entire supply chain.
A LiveData platform combines distributed consensus with data replication, supporting selective replication of massive data sets at scale, without the need to disrupt application operation while data consistency is maintained. Changes made anywhere affect the logical copies of data held in each required location, and applications can interact with their local data without adverse impact from the latency between locations. This makes the approach uniquely well-suited for use in the cloud because it can overcome the loss of direct control over physical network and datacenter resources.
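The general idea of pairing an agreed ordering of changes with local replicas can be sketched in a few lines of Python. This is a toy under stated assumptions and not WANdisco’s implementation: the `AgreedLog`, `Replica`, and `Change` names are hypothetical, and a single in-memory list stands in for the ordering that a real system would reach through a distributed consensus protocol (Paxos is one well-known example) running across sites.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Change:
    """A single write to a key (hypothetical shape)."""
    key: str
    value: float


@dataclass
class AgreedLog:
    """Stand-in for consensus: one globally agreed order of changes.

    Assumption: a real deployment would agree on this order across
    sites with a consensus protocol instead of a shared list.
    """
    entries: list = field(default_factory=list)

    def propose(self, change: Change) -> int:
        """Append a change and return its position in the agreed order."""
        self.entries.append(change)
        return len(self.entries) - 1


@dataclass
class Replica:
    """A local copy of the data that applications read at local speed."""
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def catch_up(self, log: AgreedLog) -> None:
        """Apply any agreed changes this replica has not yet seen, in order."""
        for change in log.entries[self.applied:]:
            self.data[change.key] = change.value
        self.applied = len(log.entries)

    def read(self, key: str) -> Optional[float]:
        """Local read: no round trip to another site."""
        return self.data.get(key)


log = AgreedLog()
london = Replica("london")
sydney = Replica("sydney")

# Changes can originate anywhere; each gets a place in the agreed order.
log.propose(Change("WIDGET-1/weight_kg", 2.0))
log.propose(Change("WIDGET-1/weight_kg", 2.5))

# Each replica applies the same changes in the same order.
london.catch_up(log)
sydney.catch_up(log)

print(london.read("WIDGET-1/weight_kg"), sydney.read("WIDGET-1/weight_kg"))
```

Because every replica applies the same changes in the same order, the copies converge on identical contents, while reads are served from local data. The sketch leaves out failure handling, selective replication, and everything else that makes this hard at scale.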
Used for cloud-native, ground-to-cloud, and wholly-isolated solutions on-premises, a LiveData platform provides a compelling foundation to take advantage of the cloud without falling prey to the hidden gotcha of data dependency.