Cloudera Developer Training for Apache Hadoop

SS Course: 9000351

Course Overview

In this course, you will learn to build powerful data processing applications. You will learn about MapReduce and the Hadoop Distributed File System (HDFS), how to write MapReduce code, and best practices for Hadoop development, debugging, and workflow implementation.

What You'll Learn

  • The internals of MapReduce and HDFS and how to write MapReduce code
  • Best practices for Hadoop development, debugging, and implementation of workflows and common algorithms
  • How to leverage Hive, Pig, Sqoop, Flume, Oozie, and other Hadoop ecosystem projects
  • How to create custom components such as WritableComparables and InputFormats to manage complex data types
  • How to write and execute joins to link data sets in MapReduce
  • Advanced Hadoop API topics required for real-world data analysis

Outline

  1. Motivation for Hadoop
    1. Problems with Traditional Large-Scale Systems
    2. Requirements for a New Approach
  2. Hadoop: Basic Concepts
    1. Hadoop Distributed File System (HDFS)
    2. MapReduce
    3. Anatomy of a Hadoop Cluster
    4. Other Hadoop Ecosystem Components
  3. Writing a MapReduce Program
    1. MapReduce Flow
    2. Examining a Sample MapReduce Program
    3. Basic MapReduce API Concepts
    4. Driver Code
    5. Mapper
    6. Reducer
    7. Streaming API
    8. Using Eclipse for Rapid Development
    9. New MapReduce API
  4. Integrating Hadoop into the Workflow
    1. Relational Database Management Systems
    2. Storage Systems
    3. Importing Data from a Relational Database Management System with Sqoop
    4. Importing Real-Time Data with Flume
    5. Accessing HDFS Using FuseDFS and Hoop
  5. Delving Deeper into the Hadoop API
    1. ToolRunner
    2. Testing with MRUnit
    3. Reducing Intermediate Data with Combiners
    4. Configuration and Close Methods for Map/Reduce Setup and Teardown
    5. Writing Partitioners for Better Load Balancing
    6. Directly Accessing HDFS
    7. Using the Distributed Cache
  6. Common MapReduce Algorithms
    1. Sorting and Searching
    2. Indexing
    3. Machine Learning with Mahout
    4. Term Frequency
    5. Inverse Document Frequency
    6. Word Co-Occurrence
  7. Using Hive and Pig
    1. Hive Basics
    2. Pig Basics
  8. Practical Development Tips and Techniques
    1. Debugging MapReduce Code
    2. Using LocalJobRunner Mode for Easier Debugging
    3. Retrieving Job Information with Counters
    4. Logging
    5. Splittable File Formats
    6. Determining the Optimal Number of Reducers
    7. Map-Only MapReduce Jobs
  9. Advanced MapReduce Programming
    1. Custom Writables and WritableComparables
    2. Saving Binary Data Using SequenceFiles and Avro Files
    3. Creating InputFormats and OutputFormats
  10. Joining Data Sets in MapReduce
    1. Map-Side Joins
    2. Secondary Sort
    3. Reduce-Side Joins
  11. Graph Manipulation in Hadoop
    1. Graph Techniques
    2. Representing Graphs in Hadoop
    3. Implementing a Sample Algorithm: Single Source Shortest Path
  12. Creating Workflows with Oozie
    1. Motivation for Oozie
    2. Workflow Definition Format

Prerequisites

  • Recommended: Java OCA & OCP Accelerated
  • No prior knowledge of Hadoop is required
  • Recommended: Some programming experience (preferably Java)

Who Should Attend

Developers who need to write and maintain Apache Hadoop applications.
