Apache Hive & Pig for Developers

Course Details
Duration: 2 days
Audience: Developers
Class size: Up to 16 students
Price: $14,995
Location: WANdisco training center, or customer site
This class can also be taught via GoToMeeting, split into two sessions of 3.5 to 4 hours each.

This course is designed for developers who have attended the Hadoop Overview course.

Course Pre-Requisites:

Familiarity with SQL or a scripting language is required. Some understanding of Hadoop is required.

Course Outline:

Day One - Hive:

  • Overview of Hadoop
    • Big Data and the Distributed File System
    • MapReduce
  • Hive Introduction
    • Why Hive?
    • Comparison with SQL
    • Use Cases
  • Hive Architecture – Building Blocks
  • Hive CLI and Language (Exercise)
    • HDFS Shell
    • Hive CLI
    • Data Types
    • Hive Cheat-Sheet
    • Data Definition Statements
    • Data Manipulation Statements
    • Select, Views, GroupBy, SortBy/DistributeBy/ClusterBy/OrderBy, Joins
    • Built-in Functions
    • Union, Sub Queries, Sampling, Explain
  • Hive Use Case Implementation (Exercise)
    • Use Case 1
    • Use Case 2
  • Best Practices
  • Advanced Features
    • Transform and Map-Reduce Scripts
    • Custom UDF
    • UDTF
    • SerDe
  • Recap and Q&A

Day Two - Pig:

  • Pig Introduction
    • Position Pig in Hadoop ecosystem
    • Why Pig and not MapReduce
    • Simple example (slides) comparing Pig and MapReduce
    • Who is using Pig now and what are the main use cases
  • Pig Architecture
    • Discuss high level components of Pig
  • Pig Grunt - How to Start and Use
  • Pig Latin Programming
    • Data Types
    • Cheat sheet
    • Schema
    • Expressions
    • Commands and Exercise
    • Load, Store, Dump, Relational Operations, Foreach, Filter, Group, Order By, Distinct, Join, Cogroup, Union, Cross, Limit, Sample, Parallel
  • Use Cases (working exercise)
    • Use Case 1
    • Use Case 2
    • Use Case 3 (compare Pig and Hive)
  • Advanced Features, UDFs
  • Best Practices and common pitfalls
  • Recap and Q&A
