What is Apache Hive?
Apache Hive is a data warehouse system built on top of Apache Hadoop. It supports querying, analysis and reporting on massive datasets distributed across various systems, file stores and databases that integrate with Hadoop.
It provides an abstraction for applications that need structured access to data residing in a Hadoop cluster, so that ad-hoc querying, summarization and other data analysis tasks can be performed with high-level constructs, including SQL-like queries written in HiveQL.
What is WANdisco Live Hive?
Consistent Hive metadata
The WANdisco Fusion Plugin for Live Hive extends the capabilities of WANdisco Fusion to allow your Hive infrastructure to participate fully in a LiveData platform. Give your Hadoop clusters a shared Hive metastore without the cost of single points of failure, degraded performance or administrative headaches. Replicate Hive metadata as it changes in any cluster, with strong consistency among all environments, and selective replication based on matching databases, tables and file system locations.
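To make the idea of selective replication concrete, here is a minimal sketch of how pattern-based rules over databases, tables and file system locations could be evaluated. It is an illustration only and does not reflect the actual Live Hive rule format or API: the `ReplicationRule` class, the `should_replicate` function, the glob-style patterns and the example paths are all assumptions made for this example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ReplicationRule:
    """Hypothetical rule: glob patterns for database, table and location.

    A field left at "*" matches anything.
    """
    database: str = "*"
    table: str = "*"
    location: str = "*"

    def matches(self, database: str, table: str, location: str) -> bool:
        # A rule selects an object only if all three patterns match.
        return (
            fnmatchcase(database, self.database)
            and fnmatchcase(table, self.table)
            and fnmatchcase(location, self.location)
        )


def should_replicate(rules, database, table, location):
    """Return True if any rule selects this Hive object for replication."""
    return any(rule.matches(database, table, location) for rule in rules)


# Hypothetical rule set: replicate the sales order tables, plus anything
# stored under a shared warehouse directory.
rules = [
    ReplicationRule(database="sales", table="orders_*"),
    ReplicationRule(location="/warehouse/shared/*"),
]

print(should_replicate(rules, "sales", "orders_2024", "/warehouse/sales/orders_2024"))
print(should_replicate(rules, "hr", "payroll", "/warehouse/hr/payroll"))
```

In this sketch an object is replicated as soon as one rule matches it, so adding a rule can only widen the replicated set. A real deployment would define its rules through the product's own configuration rather than code like this.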
Always consistent queries
Share the same Hive definitions across multiple environments, regardless of where and when changes are made. Dramatically simplify the configuration of metadata replication with a LiveData platform, so that all applications have access to the same Hive tables wherever they are required.
Guaranteed data consistency
Query your Hive data from any cluster and get the same results every time, everywhere. Ingest data, alter tables, create new Hive representations and maintain consistent results at all times.
Never worry about periods when Hive representations differ among clusters because replication runs only periodically. Replicate your changes as they occur, without conflict among environments.
Recover from network or system outages automatically without the risk of introducing metadata inconsistencies. Accommodate your planned and unplanned outages with ease, and reduce administration costs.
Simple administration and integration
Extend an existing WANdisco Fusion deployment with the WANdisco Fusion Plugin for Live Hive without downtime or disruption. Take advantage of LiveData replication for Hive metadata without changing Hive applications or each cluster’s Hive metastore. Use simple replication rules to define which Hive databases, tables and file system locations are replicated with strong consistency.
Supported platforms
- CDH 5.9+
- HDP 2.6+
- RHEL 6.1+ (x86-64)
- CentOS 6, 7 (x86-64)
- Ubuntu 12.04, 14.04 (x86-64)
- SLES 11+ (x86-64)
Apache Hive Replication Across Cloud Environments
The Live Hive Proxy is a WANdisco service that is deployed with Live Hive and acts as a proxy for applications that use a standalone Hive Metastore. The service coordinates actions performed against the Metastore with actions in the other clusters to which the associated Hive metadata is replicated.
Below is a video demonstration of the WANdisco Fusion Plugin for Live Hive in action.