Course Overview
This four-day administrator training for Apache Hadoop provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster. From installation and configuration through load balancing and tuning, Cloudera's training course is the best preparation for the real-world challenges faced by Hadoop administrators.
What You'll Learn
- The internals of YARN, MapReduce, and HDFS
- Determining the correct hardware and infrastructure for your cluster
- Proper cluster configuration and deployment to integrate with the data center
- How to load data into the cluster from dynamically generated files using Flume and from relational databases (RDBMS) using Sqoop
- Configuring the FairScheduler to support service-level agreements for multiple users of a cluster
- Best practices for preparing and maintaining Apache Hadoop in production
- Troubleshooting, diagnosing, tuning, and solving Hadoop issues
Outline
- The Case for Apache Hadoop
  - Why Hadoop?
  - Core Hadoop Components
  - Fundamental Concepts
- HDFS
  - HDFS Features
  - Writing and Reading Files
  - NameNode Memory Considerations
  - Overview of HDFS Security
  - Using the NameNode Web UI
  - Using the Hadoop File Shell
- Getting Data into HDFS
  - Ingesting Data from External Sources with Flume
  - Ingesting Data from Relational Databases with Sqoop
  - REST Interfaces
  - Best Practices for Importing Data
- YARN and MapReduce
  - What Is MapReduce?
  - Basic MapReduce Concepts
  - YARN Cluster Architecture
  - Resource Allocation
  - Failure Recovery
  - Using the YARN Web UI
  - MapReduce Version 1
- Planning Your Hadoop Cluster
  - General Planning Considerations
  - Choosing the Right Hardware
  - Network Considerations
  - Configuring Nodes
  - Planning for Cluster Management
- Hadoop Installation and Initial Configuration
  - Deployment Types
  - Installing Hadoop
  - Specifying the Hadoop Configuration
  - Performing Initial HDFS Configuration
  - Performing Initial YARN and MapReduce Configuration
  - Hadoop Logging
- Installing and Configuring Hive, Impala, and Pig
  - Hive
  - Impala
  - Pig
- Hadoop Clients
  - What Is a Hadoop Client?
  - Installing and Configuring Hadoop Clients
  - Installing and Configuring Hue
  - Hue Authentication and Authorization
- Cloudera Manager
  - The Motivation for Cloudera Manager
  - Cloudera Manager Features
  - Express and Enterprise Versions
  - Cloudera Manager Topology
  - Installing Cloudera Manager
  - Installing Hadoop Using Cloudera Manager
  - Performing Basic Administration Tasks Using Cloudera Manager
- Advanced Cluster Configuration
  - Advanced Configuration Parameters
  - Configuring Hadoop Ports
  - Explicitly Including and Excluding Hosts
  - Configuring HDFS for Rack Awareness
  - Configuring HDFS High Availability
- Hadoop Security
  - Why Hadoop Security Is Important
  - Hadoop's Security System Concepts
  - What Kerberos Is and How It Works
  - Securing a Hadoop Cluster with Kerberos
- Managing and Scheduling Jobs
  - Managing Running Jobs
  - Scheduling Hadoop Jobs
  - Configuring the FairScheduler
  - Impala Query Scheduling
- Cluster Maintenance
  - Checking HDFS Status
  - Copying Data Between Clusters
  - Adding and Removing Cluster Nodes
  - Rebalancing the Cluster
  - Cluster Upgrading
- Cluster Monitoring and Troubleshooting
  - General System Monitoring
  - Monitoring Hadoop Clusters
  - Troubleshooting Hadoop Clusters
  - Common Misconfigurations
Prerequisites
- Required: This course is designed for system administrators and IT managers who have basic Linux systems administration experience. Prior knowledge of Hadoop is not required.
- Recommended: CompTIA Linux+ Certification Prep Powered by LPI
Who Should Attend
System administrators and others responsible for managing Apache Hadoop clusters in production or development environments.