EncartaLabs

Hadoop Administration

The Hadoop Administration - Essentials training course, meant for administrators, provides the fundamentals required to successfully implement and maintain Hadoop clusters. The course consists of an effective mix of interactive lectures and hands-on lab exercises.

Apache Hadoop is a framework that allows for the distributed processing of massive data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop has established itself as an industry-leading platform for deploying cloud-based applications and services. The Hadoop ecosystem is large, and it includes such popular products as HDFS, MapReduce, HBase, ZooKeeper, Oozie, Pig, and Hive. However, with such versatility comes complexity and difficulty in deciding on appropriate use cases.

The Hadoop Administration - Comprehensive training course presents all the small building blocks, with thorough coverage of each component in the Hadoop stack. We begin by looking at Hadoop’s architecture and its underlying parts, with top-down identification of component interactions within the Hadoop ecosystem. The course then provides in-depth coverage of the Hadoop Distributed File System (HDFS), HBase, MapReduce, Oozie, Pig and Hive.

In Hadoop Administration - Essentials workshop, delegates will learn to:
  • Utilize best practices for deploying Hadoop clusters
  • Determine hardware needs
  • Monitor Hadoop clusters
  • Recover from NameNode failure
  • Handle DataNode failures
  • Manage hardware upgrade processes including node removal, configuration changes, node installation and rebalancing clusters
  • Manage log files
  • Install, configure, deploy, verify and maintain Hadoop clusters

In Hadoop Administration - Comprehensive workshop, delegates will learn:
  • Hadoop, HDFS and its ecosystem
  • Data loading techniques using Sqoop and Flume
  • How to plan, implement, manage, monitor and secure a Hadoop cluster
  • How to configure backup options, and diagnose and recover from node failures in a Hadoop cluster
  • The ZooKeeper service
  • How to secure a deployment, and how backup and recovery work
  • HBase, Oozie, Hive and Hue

Hadoop Administration - Essentials
  • Basic level of Linux system administration experience
  • Prior knowledge of Apache Hadoop is not required
Hadoop Administration - Comprehensive
  • Good knowledge of Linux is required.
  • Fundamental Linux system administration skills are preferable: Linux scripting (Perl/Bash), good troubleshooting skills, an understanding of system capacity and bottlenecks, and the basics of memory, CPU, OS, storage and networks.
  • No prior knowledge of Apache Hadoop and Hadoop Clusters is required.

This course is intended for administrators who are interested in learning how to deploy and manage a Hadoop cluster.

Production support Database Administrators, Development Database Administrators, System Administrators, Software Architects, Data Warehouse Professionals, IT Managers, Software Developers and anyone else interested in learning Hadoop Cluster Administration should attend this course.

COURSE AGENDA

Hadoop Administration - Essentials
(Duration: 2 Days)

1. Overview of Hadoop
2. Cluster Hardware and Installation of HDFS and MapReduce
3. Rack Topology
4. Setting up a Multi-user Environment
5. Using Schedulers
6. Hadoop Security with Kerberos
7. Logs and Log Rotation
8. Monitor, Maintain and Troubleshoot HDFS and MapReduce
9. NameNode Failure and Recovery
10. Restarting the JobTracker
11. Hardware Upgrade Process
12. Rebalancing
13. Data Management
14. Install, Configure, Deploy and Verify Pig
15. Install, Configure, Deploy and Verify Hive
16. Install, Configure, Deploy and Verify MySQL
17. Install, Configure, Deploy and Verify HBase and ZooKeeper
18. Install, Configure, Deploy and Verify Other Hadoop Ecosystem Components (HCatalog, Oozie, Mahout)
19. Install, Configure, Deploy and Verify Nagios and Ganglia


Hadoop Administration - Comprehensive
(Duration: 2 Days)

1. Hadoop Cluster and Administration

  • Apache Hadoop
  • HDFS
  • Getting Data into HDFS
  • MapReduce
  • Hadoop Cluster

2. Hadoop Architecture and Ecosystem

  • Hadoop server roles and their usage
  • Rack Awareness
  • Anatomy of Write and Read
  • Replication Pipeline
  • Data Processing
  • Hadoop Installation and Initial Configuration
  • Deploying Hadoop in pseudo-distributed mode
  • Deploying a multi-node Hadoop cluster
  • Installing Hadoop Clients
  • Hive
  • Pig
  • Hue

3. Hadoop 2.0 and High Availability

  • Configuring Secondary NameNode
  • Hadoop 2.0
  • YARN framework
  • MRv2
  • Hadoop 2.0 Cluster setup
  • Deploying Hadoop 2.0 in pseudo-distributed mode
  • Deploying a multi-node Hadoop 2.0 cluster

4. Hadoop Cluster - Planning and Implementation

  • Planning the Hadoop Cluster
  • Cluster Size
  • Hardware and Software considerations
  • Managing and Scheduling Jobs
  • Types of schedulers in Hadoop
  • Configuring the schedulers and running MapReduce jobs
  • Cluster Monitoring
  • Troubleshooting

5. Hadoop Cluster Management

  • Cloudera Hadoop Manager

6. Backup, Recovery and Cluster Maintenance

  • Configuring Rack Awareness
  • Setting up Hadoop backup
  • Data nodes in a cluster
  • Setting up quotas
  • Upgrading a Hadoop cluster
  • Copying data across clusters using distcp
  • Diagnostics and Recovery
  • Cluster Maintenance

7. HDFS Federation and Security

  • HDFS Federation
  • Service Monitoring
  • Service and Log Management
  • Auditing and Alerts
  • Basics of Hadoop Platform Security
  • Securing the Platform
  • Kerberos

8. Oozie, Hive and HBase

  • Oozie and Hive Administration
  • HBase
  • Advanced HBase
  • HBase and Hive Integration

9. Project - Hadoop Administration

  • Understanding the Problem
  • Plan
  • Design
  • Create a Hadoop Cluster
  • Set up and configure commonly used Hadoop ecosystem components such as Pig and Hive
  • Configure Ganglia/Kibana on the Hadoop cluster and troubleshoot common cluster problems

Encarta Labs Advantage

  • One-stop corporate training solution provider for over 3,500 modules on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in a matter of a few days
  • Trained more than 20,000 corporate candidates across India and abroad
  • All our trainings are conducted in workshop mode, with a strong focus on hands-on practice

View our other course offerings by visiting www.encartalabs.com/course-catalogue

Contact us for delivering this course as a public/open-house workshop for a group of 10+ candidates at our venue
