Automate migration of Hadoop data and Hive metadata to AWS without disruption or downtime
WANdisco strengthens AWS engineering partnership with integration between Data Migrator and AWS Glue Data Catalog
Move to AWS with ease and enable a hybrid AWS environment
WANdisco's partnership with AWS helps you to migrate data and metadata to the cloud rapidly and easily and exploit the power and capabilities of AWS services, including Amazon S3, Amazon EMR and AWS Glue Data Catalog. WANdisco is an AWS Advanced Tier ISV partner and one of the first ISVs to achieve AWS Migration Competency in Workload Mobility: Data Migration.
AWS PARTNER NETWORK
Advanced Technology Partner
- Advanced tier ISV Migration competency
- Migration Acceleration Program (MAP) for Storage
- Amazon EMR Migration Program (EMP)
“We found WANdisco’s Data Migrator to be the optimal approach to deliver the best time to value, rather than running a more time-consuming and costly manual migration project internally.”
Data Migrator with AWS
Migrate on-premises HDFS to Amazon S3
WANdisco Data Migrator is a safe and reliable cloud migration solution that provides complete and continuous migration of HDFS data to the AWS cloud.
Data Migrator is fully self-service and requires no WANdisco expertise or services. It is entirely non-intrusive and requires zero changes to applications, cluster or node configuration, or operation. AWS customers looking to rapidly and successfully migrate their large-scale on-premises Hadoop data lake into the cloud can turn to WANdisco for an automated data migration and replication solution with zero business downtime. WANdisco Data Migrator is the only platform that allows on-premises production applications to continue to operate while data is migrating and under active change.
Migrate Apache Hive to AWS Glue Data Catalog
An important requirement when modernizing legacy analytics workloads for the cloud is to keep the business operating as normal by taking advantage of the metadata stored on-premises. Replicating HDFS data to Amazon S3 using Data Migrator is only the first step in moving to the cloud. You must also replicate the metadata to enable users to discover, understand and query the data.
To provide customers with a complete migration solution, Data Migrator migrates metadata from Apache Hive directly to the AWS Glue Data Catalog. Data Migrator eliminates complex and error-prone workarounds that require one-off scripts and configuration in the Hive metastore, and it integrates with a wide range of databases used by the Hive metastore, making migration simple and painless.
What is the AWS Glue Data Catalog?
The AWS Glue Data Catalog is a persistent, Apache Hive compatible metadata store that can be used for storing information about different types of data assets, regardless of where they are physically stored. The AWS Glue Data Catalog holds table definitions, schemas, partitions, properties and more. It automatically registers and updates partitions to make queries run efficiently. It also maintains a comprehensive schema version history that provides a record for schema evolution.
The AWS Glue Data Catalog is a cloud-native, managed metadata catalog that is flexible, reliable, and usable from a broad range of AWS native analytics services, third-party tools and open-source engines. AWS maintains and manages the service so that you do not need to spend time scaling as demands grow, responding to outages, ensuring data resilience or updating infrastructure.
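To make the idea of a catalog entry concrete, the sketch below shows roughly what a Hive table definition looks like once expressed in the shape AWS Glue uses for a table (name, columns, partition keys, storage location and SerDe). It is an illustration only and does not represent how Data Migrator works internally. The helper names `hdfs_to_s3` and `glue_table_input`, the input dictionary layout, the bucket name and the assumption that the HDFS path is preserved under the bucket are all hypothetical.

```python
from urllib.parse import urlparse


def hdfs_to_s3(location: str, bucket: str) -> str:
    """Map an hdfs:// location to s3://, keeping the path (assumed layout)."""
    path = urlparse(location).path.lstrip("/")
    return f"s3://{bucket}/{path}"


def glue_table_input(hive_table: dict, bucket: str) -> dict:
    """Build a Glue-style table definition from a simplified Hive description.

    The hive_table dictionary layout is a hypothetical stand-in for what
    would be read from a Hive metastore.
    """
    return {
        "Name": hive_table["name"],
        "TableType": "EXTERNAL_TABLE",
        # Partition columns are listed separately from data columns.
        "PartitionKeys": [
            {"Name": name, "Type": col_type}
            for name, col_type in hive_table["partition_keys"]
        ],
        "StorageDescriptor": {
            "Columns": [
                {"Name": name, "Type": col_type}
                for name, col_type in hive_table["columns"]
            ],
            # Point the table at the migrated data in Amazon S3.
            "Location": hdfs_to_s3(hive_table["location"], bucket),
            "InputFormat": hive_table["input_format"],
            "OutputFormat": hive_table["output_format"],
            "SerdeInfo": {"SerializationLibrary": hive_table["serde"]},
        },
    }


# Example Hive table (illustrative values only).
orders = {
    "name": "orders",
    "location": "hdfs://namenode:8020/warehouse/sales.db/orders",
    "columns": [("order_id", "bigint"), ("amount", "double")],
    "partition_keys": [("order_date", "string")],
    "input_format": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    "output_format": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    "serde": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
}

table_input = glue_table_input(orders, bucket="example-bucket")
print(table_input["StorageDescriptor"]["Location"])
```

A dictionary of this shape is what the AWS SDK for Python accepts as the `TableInput` argument of the Glue client's `create_table` call. Writing and maintaining this kind of mapping by hand for every table is the sort of one-off scripting that an automated metadata migration is meant to avoid.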
Migration to Databricks
Data Migrator provides a comprehensive solution for migrating Hadoop data and Hive metadata, as well as the last-mile migration to the format required by Delta Lake on Databricks. This enables users to manage the complete migration (HDFS to Databricks) using a single solution. Migrated data is immediately available for advanced Spark-based cloud analytics by Databricks on AWS.
Databricks enables companies to accelerate data-driven innovation with a unified approach to data analytics and AI. Using Data Migrator to automate Hadoop data and Hive metadata migration directly to Databricks lets organizations focus resources on developing new AI innovations rather than on migration complexities, so they can introduce new AI and ML capabilities much more quickly.