AltoStor is a small software company based in Silicon Valley, California, with unique skills in the field of Big Data and specifically Apache Hadoop. The acquisition of AltoStor and the subsequent hiring of co-founders Dr. Konstantin Shvachko and Jagane Sundar will significantly enhance WANdisco's ability to enter the Big Data market with the development of new products that incorporate WANdisco's patented replication technology, DConE.

Executive Summary

I believe that the combination of AltoStor's expertise and WANdisco's patented active-active replication technology is the proverbial 'marriage made in heaven'. The AltoStor acquisition will enable us to launch products into the highly lucrative Big Data / Hadoop market early next year.

So how lucrative is this market? Well, I recently read an interesting article on Wikibon, "Big Data: Hadoop, Business Analytics and Beyond", that reiterated what we already knew. Big Data isn't a 'might happen next year' thing. No, it's here today. To steal a quote from the excellent article: "Make no mistake: Big Data is the new definitive source of competitive advantage across all industries. Enterprises and technology vendors that dismiss Big Data as a passing fad do so at their peril and, in our opinion, will soon find themselves struggling to keep up with more forward-thinking rivals... For those organizations that understand and embrace the new reality of Big Data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless."

So why did we acquire AltoStor?

First off, the founders (Dr. Konstantin Shvachko and Jagane Sundar) are really good guys. This was an 'old-school' acquisition. An initial deal was struck very quickly with a handshake. Both sides could see very clear value, so doing the deal was incredibly simple. I love the fact that they wanted stock as consideration. That's real proof that they see significant long-term value creation rather than short-term gain.

For WANdisco, Big Data is a Big Market. We can see clear synergy between our unique, patented active-active replication technology and the creation of Hadoop high availability (HA) solutions. This is one of the reasons why AltoStor was so attractive to us. They have unique knowledge in the space:

  • The AltoStor founders have been working on Hadoop since its inception in 2006 at Yahoo. Konstantin was part of Doug Cutting's team that created and implemented Hadoop. His focus was the massive scale, performance and availability of Hadoop, and the development of the Hadoop Distributed File System (HDFS). He then went on to eBay, where he implemented Hadoop.
  • The founders are intimately aware of the problem WANdisco is planning to solve around Hadoop HA and hence understand the value of the solution in large-scale Big Data replication over a Wide Area Network.
  • Finally, AltoStor is developing a product, slated for release in Q1 2013, that will significantly simplify deployment of Hadoop / Big Data for enterprises.

Following the acquisition we now expect to have products available in the first quarter of 2013. That's very good news.

There's going to be a lot of noise in this space over the coming months and years. Many will jump on the 'bandwagon', making all sorts of lavish claims to be 'the big data this' and 'the big data that'. It always happens in hype-cycles like this. In reality, most are just companies repurposing existing legacy products and slapping a new label on them. This is NOT one of those. We are building from the ground up with unique knowledge and information that only a few in the world have (the amount of brain-power in the room during some of the early design meetings was frightening!).

In 2005, when we founded WANdisco, my peers would tell me that active-active replication over a Wide Area Network was impossible. Well, we've got hundreds of thousands of users using the technology for core development every day. Applying this technology to Hadoop is groundbreaking and I think it will change the way the industry views network storage. We like making the impossible possible at WANdisco.

What is the rationale for this acquisition?

The AltoStor acquisition will allow WANdisco to apply its patented active-active replication technology and expertise in open source software to deliver products and services for the rapidly growing big data market. Even the most conservative estimates forecast big data to grow at over 50% per year between 2012 and 2017, from a market size of $5 billion to over $50 billion annually (Source: Wikibon 2012). This represents an enormous opportunity for WANdisco, as well as enterprise users of Apache Hadoop (Hadoop), who will soon be able to take advantage of new products that will enable them to experience zero downtime and zero data loss.
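As a quick sanity check on those figures, the annual growth rate implied by the two market sizes can be worked out directly. This is only a back-of-the-envelope sketch: it assumes the $5 billion figure refers to 2012 and the $50 billion figure to 2017 (five years of compounding), and the function name is purely illustrative.

```python
def implied_annual_growth(start_size, end_size, years):
    """Compound annual growth rate implied by a start and end market size."""
    return (end_size / start_size) ** (1 / years) - 1


if __name__ == "__main__":
    # Assumption: $5bn in 2012 growing to $50bn in 2017, i.e. 5 years.
    rate = implied_annual_growth(5.0, 50.0, 5)
    print(f"Implied annual growth: {rate:.1%}")
```

Under those assumptions the implied rate comes out a little under 60% per year, which is consistent with the "over 50% per year" forecast quoted above.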

What is Big Data?

Big data is often defined as data sets so large that commonly used database management tools and systems are unable to capture, manage, and process them within an acceptable timeframe.

Google originally developed the principles underlying big data to enable indexing of the massive amounts of information available on the Internet to provide users with meaningful search results with quick response times.

The ability to manage and process the massive amounts of data generated every day from a variety of sources, such as social media sites, digital images, videos, sales transactions, medical data, climate data and cell phone GPS signals, is rapidly becoming the basis of competitive advantage across all industries.

What is Hadoop?

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is an open source project under the umbrella of the Apache Software Foundation, released under the Apache License, Version 2.0.

Hadoop was originally conceived on the basis of Google's MapReduce, in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node (server) in the cluster. Hadoop makes it possible to run applications on systems with thousands of servers involving thousands of terabytes.
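To make the idea concrete, here is a minimal sketch of the MapReduce pattern in plain Python. It is not Hadoop's API and runs in a single process; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for illustration. The point is simply that each block can be mapped independently, which is what allows a real cluster to hand blocks to different nodes.

```python
from collections import defaultdict


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for a single key."""
    return key, sum(values)


def word_count(blocks):
    """Run map, shuffle and reduce over a list of text blocks."""
    pairs = []
    for block in blocks:
        # Each block is independent, so in a cluster this call could
        # run on any node; here it simply runs in a loop.
        pairs.extend(map_phase(block))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    blocks = ["big data is here", "big data is big"]
    print(word_count(blocks))
```

In this toy version the blocks are processed one after another, but nothing in `map_phase` depends on any other block, so the same work could be spread across many servers.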

Hadoop is used by major organizations including Facebook, Yahoo, eBay, Google, and IBM, largely for applications involving search engines and advertising. Hadoop was originally the name of a stuffed toy elephant belonging to a child of the framework's creator, Doug Cutting.

Why did WANdisco choose AltoStor?

WANdisco chose AltoStor because it believes the company is the best choice to enable the rapid launch of products and services into the hyper-growth, highly lucrative big data market as early as the first quarter of 2013. The unique skills the founders of AltoStor bring, the products they currently have under development, and the application of WANdisco's patented replication technology are all factors that combine to make this possible.

In addition, hiring founders of the Hadoop project, who played leading roles in the implementation of Hadoop at eBay and Yahoo, is considered a 'coup' from the market's perspective. The other companies involved in the core development of Hadoop and the delivery of related products and services include Facebook, Twitter, Yahoo and Microsoft, as well as well-funded start-ups such as Jive, Cloudera and Hortonworks.

Finally, as Hadoop moves from primarily batch-based implementations (complex analytics, recommendation engines, and sentiment analysis) to high-volume transactional systems such as those used in the financial services industry, the 24-by-7 availability and real-time performance that WANdisco's HA solutions deliver will be seen as critical.

What products and services can we expect to see from WANdisco?

In the big data arena specifically, WANdisco aims to be the best solution for enterprise Apache Hadoop deployment. WANdisco will be the only choice for Hadoop HA that truly has no single point of failure, with products that work over a WAN across data centers as well as over a LAN within a single data center.

We will also introduce the WANdisco AltoStor Appliance for easy plug-and-play Hadoop deployment and administration. WANdisco's AltoStor Appliance will be the only appliance available for Hadoop that supports the Amazon EC2 S3 API. This will enable easy migration from public clouds where new big data applications are often developed and tested, to private clouds behind corporate firewalls where they are frequently deployed in production to protect sensitive data.

In addition, WANdisco is considering offering its own Hadoop binaries for free download, as well as enterprise-class support, training and consulting services as it has successfully done in the Subversion marketplace.

The company will also be offering free, one-hour Hadoop training webinars starting in mid-January 2013.

When will these products and services be available?

WANdisco is planning to roll out its big data products and services during the first quarter of 2013.

Who are the target customers?

Enterprises in virtually any industry will benefit from these solutions, and the number of use cases is almost unlimited. Examples of how big data is used today in various industries include:

  • Financial services - Banks analyze patterns within data and documents to determine the likelihood of fraud and take action before it occurs.
  • Healthcare - Doctors determine patterns of treatment that provide the most desirable outcomes using years of historical patient data from multiple sources.
  • Manufacturing - Automakers analyze data from the factory floor to see the cause of production delays, and modify processes to overcome them.
  • Utilities - Utilities analyze massive amounts of consumer data to determine usage patterns and deploy smart grid technologies to address peak usage and limit power outages.

Apache and Hadoop are trademarks of the Apache Software Foundation.