Hadoop Internals

(Duration: 3 Days)

In the Hadoop Internals training course, participants will gain a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster, from installation and configuration through load balancing and tuning. The course prepares participants for the real-world challenges faced by Hadoop administrators.

The Hadoop Internals course covers concepts addressed on the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam.

By attending the Hadoop Internals workshop, participants will learn:

  • The internals of MapReduce and HDFS and how to build a Hadoop architecture
  • How to configure and deploy a cluster so that it integrates with systems and hardware in the data center
  • How to load data into the cluster from dynamically generated files using Flume and from an RDBMS using Sqoop
  • How to configure the Fair Scheduler to provide service-level agreements for multiple users of a cluster
  • How to install and implement Kerberos-based security for a cluster
  • Best practices for preparing and maintaining Apache Hadoop in production
  • How to troubleshoot, diagnose, tune, and solve Hadoop issues

The Hadoop Internals class is designed for system administrators and IT managers who have basic Linux systems administration experience. Prior knowledge of Hadoop is not required.

The course is intended for system administrators and others responsible for managing Apache Hadoop clusters in production or development environments.



Hadoop Introduction

  • Move computation, not data
  • Hadoop performance and data scale facts
  • Hadoop in the context of other data stores
  • The Apache Hadoop Project
  • Hadoop - an inside view: MapReduce and HDFS
  • The Hadoop Ecosystem
  • What about NoSQL?
  • Comparison with Other Systems
  • Grid Computing
  • Volunteer Computing
  • A Brief History of Hadoop
  • Apache Hadoop and the Hadoop Ecosystem
  • Hadoop Releases



Writing MapReduce Applications

  • The Configuration API
  • Configuring the Development Environment
  • Running Locally on Test Data
  • Cluster Specs
  • Cluster Setup and Installation
  • Hadoop Configuration
  • YARN Configuration
  • Benchmarking a Hadoop Cluster
  • Hadoop in the Cloud
  • Tuning
  • MapReduce Workflows
  • Monitoring and debugging on a production cluster
  • Tuning for performance

Managing Hadoop

  • Setting up parameter values for practical use
  • Checking the system’s health
  • Setting permissions
  • Managing quotas
  • Enabling trash
  • Removing DataNodes
  • Adding DataNodes
  • Managing NameNode and Secondary NameNode
  • Recovering from a failed NameNode
  • Designing network layout and rack awareness
  • MapReduce Features

Encarta Labs Advantage

  • One-stop corporate training solution provider with over 4,000 modules on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in a matter of a few days
  • Trained more than 50,000 corporate executives across the globe
  • All our training is conducted in workshop mode, with a focus on hands-on sessions

View our other course offerings by visiting http://encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public/open-house workshop or as online training for a group of 10+ candidates.