Rethinking the Future of Hadoop
By Van Diamandakis, SVP Marketing, WANdisco
Nov 12, 2020
Earlier this year, in an article in Toolbox Tech, we offered our take on the subject of Hadoop, concluding that “Hadoop will have its rightful place in the big data ecosystem. But for dynamic and fast-moving business landscapes, data management is going to be cloud-dominated, and organizations need to be planning their transitions today.”
Market research and analysis firm Gartner recently published a research report titled “Choosing the Right Path When Exploring Hadoop’s Future,” in which it weighs in on many of the issues we discussed.
On the subject of Hadoop’s place in the enterprise of the future, for example, Gartner concluded that “Hadoop is still a viable choice for a broad set of use cases. These include data ingest, data lake construction, data science and exploration, data hubs, and combining new data with subsets of data from a data warehouse where multiple sources, often unstructured and semistructured data, are used.”
Yet we have long observed deep disappointment with Hadoop projects in many organizations. On this subject, Gartner notes that:
“…with more deployment experience, users have found that there are many use cases that Hadoop does not effectively address or require a large amount of management and implementation resources. Many organizations interviewed by Gartner in use-case analysis and in inquiries have reported expectations that Hadoop-based systems would deliver more functionality and value than the eventual implementation produced — perhaps due to unrealistic expectations and overhyped vendor marketing.”
Here’s what else the report had to say, and our take on it.
On Moving to the Cloud
We’ve maintained that for dynamic and fast-moving businesses, the future of data management is in the cloud, and that planning for migration needs to start the day before yesterday. The shift away from Hadoop is part of an ongoing transition away from outmoded tech paradigms: away from on-prem storage and billions of batch-based queries, and toward real-time analytics over massive cloud-based datasets.
In the report, Gartner notes that “Cloud is clearly driving much of the market growth for DBMS, and cloud deployment for Hadoop is an example of extending these new choices into additional data management uses. Moving to the cloud creates opportunities to reduce some of the complexities and skills demands described above. Moving also provides opportunities to more precisely align technologies to requirements, since the entire point of the cloud is to offer capabilities delivered as a service, while hiding the complexities of implementation.”
The cloud’s inherent elasticity comes into play, too. Hadoop in the cloud offers unprecedented resource elasticity, making it a natural fit for unpredictable resource usage, including the rapid growth that generally follows an initial deployment. Separating storage from compute also gives data consumers greater flexibility and power: they can now run multiple compute clusters against the same data.
The Cloud for Efficiency and Lower Costs
The move to the cloud is strategic. In today’s challenging business climate, it reflects a growing corporate focus on agility, cost optimization, and the synergies inherent in application and platform consolidation.
Gartner advises that “…moving to the cloud is often associated with corporate initiatives around agility, cost optimization, and the synergies possible in moving multiple applications and platforms to a new environment. Using cloud versions of the Hadoop stack — sometimes including alternative components offered by the CSP — permits more cost-effective usage, streamlined operations and better targeting of existing skill sets.”
We’ve also seen storage costs go down in the move from on-prem Hadoop to the cloud. The reason? The cost of managing storage, which organizations carry themselves on-prem, is built into Cloud Service Provider (CSP) pricing. Gartner says that:
“The cloud provides some inherent advantages over on-premises software, such as resource elasticity. This makes the cloud ideal for scenarios where resource usage is constantly variable in unpredictable ways. It is also helpful in the initial development and deployment of any system, since rapid growth in this initial phase is standard. Separating storage from compute facilitates this benefit. Instead of coupling the two together, as was done in Hadoop on-premises deployments, cloud users can easily spin up multiple different compute clusters to run against the same data. Storage costs are more predictable, while the expense of managing on-premises storage is moved to the CSP and embedded in the price.”
The Migration Consideration
Most interesting for us, of course, was the Gartner report’s take on migrating Hadoop data lakes to the cloud. The cost of migration, and not just the direct monetary cost, is crucial:
“The feasibility of any migration effort is always the result of a simple calculation — comparing the cost of the migration to the improved value produced by the end target of the migration. It may make sense to move a Hadoop-based system simply to gain the general advantages of the cloud: its elasticity, flexible usage models and the potential for price savings. The overhead of managing the system will also be less than managing the system on premises, although mature on-premises systems may be fairly stable in their management requirements, reducing the potential for an improvement in this area.”
“Consider the costs of the migration itself. Performing the migration may create disruption in operations and challenges in ensuring the latest data is in use, and it may require significant skills in planning and executions.”
The Bottom Line
Consolidation in the Hadoop vendor market and disappointment with Hadoop deployments are driving a tectonic shift in the role of the on-prem data lake.
We are seeing the management flexibility and inherent elasticity of the cloud enable entirely new use cases that were previously out of reach with on-prem Hadoop.
Gartner recommends that companies:
“Evaluate existing and proposed uses for Hadoop technology in light of deployment experience, and set realistic expectations. Ensure that current, appropriately configured components are mapped to the use cases, including a discussion with your vendor about expectations that are not being met.”
“Compare benefits offered by cloud migration to migration costs including data transfer, app migration, and update requirements to qualify opportunities.”
“Use cloud object stores whenever possible for new use cases due to their lower management overhead and flexible expandability.”
We believe that data and analytics stakeholders need to realistically rethink the place Hadoop holds in their data ecosystem. The move to cloud-based data management and analytics is already upon us – and it is no longer a question of “if” but rather “how.”