Hadoop Administration – Comprehensive

( Duration: 2 Days )

Apache Hadoop is a framework that allows for the distributed processing of massive data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop has established itself as an industry-leading platform for deploying cloud-based applications and services. The Hadoop ecosystem is large, and it includes such popular products as HDFS, MapReduce, HBase, ZooKeeper, Oozie, Pig, and Hive. However, with such versatility comes complexity and difficulty in deciding on appropriate use cases.

The Hadoop Administration training course presents all the building blocks of the Hadoop administration stack, with thorough coverage of each component. We begin by looking at Hadoop's architecture and its underlying parts, with top-down identification of component interactions within the Hadoop ecosystem. The course then provides in-depth coverage of the Hadoop Distributed File System (HDFS), HBase, MapReduce, Oozie, Pig, and Hive. To reinforce concepts, each section is followed by a set of hands-on exercises.

By attending the Hadoop Administration workshop, participants will learn:

  • Hadoop, HDFS, and the Hadoop ecosystem
  • Data loading techniques using Sqoop and Flume
  • How to plan, implement, manage, monitor, and secure a Hadoop cluster
  • How to configure backup options and diagnose and recover from node failures in a Hadoop cluster
  • The ZooKeeper service
  • How to secure a deployment and perform backup and recovery
  • HBase, Oozie, Hive, and Hue

The Hadoop Administration class assumes:

  • Good knowledge of Linux is required.
  • Fundamental Linux system administration skills are preferable, including shell scripting (Perl/Bash), good troubleshooting skills, and an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.
  • No prior knowledge of Apache Hadoop and Hadoop Clusters is required.

Production support Database Administrators, Development Database Administrators, System Administrators, Software Architects, Data Warehouse Professionals, IT Managers, Software Developers and those interested in learning Hadoop Cluster Administration should attend this course.



Hadoop Cluster and Administration

  • Apache Hadoop
  • HDFS
  • Getting Data into HDFS (see the sketch after this list)
  • MapReduce
  • Hadoop Cluster
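
As a preview of the hands-on exercises for this module, here is a minimal sketch of getting data into HDFS with the standard Hadoop Java FileSystem API. The NameNode address and file paths are placeholders used only for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.net.URI;

    public class HdfsPutExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed NameNode address; in practice this comes from fs.defaultFS in core-site.xml.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

            Path local = new Path("/tmp/sales.csv");        // placeholder local source
            Path remote = new Path("/data/raw/sales.csv");  // placeholder HDFS destination

            // Upload the file; HDFS splits it into blocks and replicates them across DataNodes.
            fs.copyFromLocalFile(local, remote);
            System.out.println("Replication factor: " + fs.getFileStatus(remote).getReplication());

            fs.close();
        }
    }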

Hadoop Architecture and Ecosystem

  • Hadoop server roles and their usage
  • Rack Awareness (illustrated in the sketch after this list)
  • Anatomy of a File Write and Read
  • Replication Pipeline
  • Data Processing
  • Hadoop Installation and Initial Configuration
  • Deploying Hadoop in pseudo-distributed mode
  • Deploying a multi-node Hadoop cluster
  • Installing Hadoop Clients
  • Hive
  • Pig
  • Hue
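
To make rack awareness and the replication pipeline concrete, the following minimal sketch asks the NameNode where the blocks of a file were placed. It assumes core-site.xml (with fs.defaultFS) is on the classpath; the file path is a placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationExample {
        public static void main(String[] args) throws Exception {
            // Assumes fs.defaultFS is configured in core-site.xml on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());

            Path file = new Path("/data/raw/sales.csv");   // placeholder path
            FileStatus status = fs.getFileStatus(file);

            // One BlockLocation per block; the host list shows where the replicas live,
            // which is what rack-aware placement controls.
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }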

Hadoop 2.0 and High Availability

  • Configuring Secondary NameNode
  • Hadoop 2.0
  • YARN framework
  • MRv2 (see the example after this list)
  • Hadoop 2.0 Cluster setup
  • Deploying Hadoop 2.0 in pseudo-distributed mode
  • Deploying a multi-node Hadoop 2.0 cluster
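
As a small taste of MRv2 on a YARN cluster, here is the classic word-count job written against the Hadoop 2.x Job API. The input and output paths are placeholders; the job runs under whatever scheduler and queue the cluster is configured to use.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context ctx)
                    throws java.io.IOException, InterruptedException {
                // Emit (token, 1) for every whitespace-separated token in the line.
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws java.io.IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("/data/raw"));      // placeholder input
            FileOutputFormat.setOutputPath(job, new Path("/data/counts")); // placeholder output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }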

Hadoop Cluster - Planning and Implementation

  • Planning the Hadoop Cluster
  • Cluster Size (see the sizing example after this list)
  • Hardware and Software considerations
  • Managing and Scheduling Jobs
  • Types of schedulers in Hadoop
  • Configuring the schedulers and running MapReduce jobs
  • Cluster Monitoring
  • Troubleshooting
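
Cluster sizing ultimately comes down to simple arithmetic on ingest rate, retention, replication, and per-node capacity. The sketch below walks through that calculation; every input figure is an assumption chosen only to illustrate the method.

    public class ClusterSizing {
        public static void main(String[] args) {
            double dailyIngestTb   = 2.0;   // assumed raw data ingested per day (TB)
            int    retentionDays   = 365;   // assumed retention period
            int    replication     = 3;     // HDFS default replication factor
            double overheadFactor  = 1.25;  // assumed headroom for temp/intermediate data
            double usableTbPerNode = 24.0;  // assumed usable disk per DataNode after OS/reserve

            double logicalTb = dailyIngestTb * retentionDays;
            double rawTb     = logicalTb * replication * overheadFactor;
            int    nodes     = (int) Math.ceil(rawTb / usableTbPerNode);

            System.out.printf("Logical data: %.0f TB, raw HDFS capacity: %.0f TB, DataNodes: %d%n",
                    logicalTb, rawTb, nodes);
        }
    }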

Hadoop Cluster Management

  • Cloudera Manager

Backup, Recovery and Cluster Maintenance

  • Configuring Rack Awareness
  • Setting up Hadoop Backup
  • Managing DataNodes in a cluster
  • Setting up Quotas
  • Upgrading a Hadoop Cluster
  • Copying data across clusters using distcp (see the sketch after this list)
  • Diagnostics and Recovery
  • Cluster Maintenance
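
The following sketch copies a directory between two clusters with the FileSystem API. In production, distcp (covered above) does this at scale as a MapReduce job; this single-process copy only illustrates the idea, and the NameNode addresses and paths are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    import java.net.URI;

    public class CrossClusterCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed NameNode addresses of the source and backup clusters.
            FileSystem srcFs = FileSystem.get(URI.create("hdfs://prod-nn:8020"), conf);
            FileSystem dstFs = FileSystem.get(URI.create("hdfs://backup-nn:8020"), conf);

            // Recursively copy the directory; false means the source is kept.
            boolean ok = FileUtil.copy(srcFs, new Path("/data/raw"),
                                       dstFs, new Path("/backup/raw"),
                                       false, conf);
            System.out.println("Copy succeeded: " + ok);
            srcFs.close();
            dstFs.close();
        }
    }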

HDFS Federation and Security

  • HDFS Federation
  • Service Monitoring
  • Service and Log Management
  • Auditing and Alerts
  • Basics of Hadoop Platform Security
  • Securing the Platform
  • Kerberos (see the sketch after this list)
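
Once a cluster is Kerberized, every client must authenticate before it can touch HDFS. The sketch below shows a non-interactive keytab login from a Java client; the principal and keytab path are placeholders, and the cluster is assumed to have Kerberos authentication enabled in core-site.xml.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class SecureHdfsClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The cluster is assumed to run with hadoop.security.authentication=kerberos.
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Log in with a service keytab instead of an interactive kinit.
            UserGroupInformation.loginUserFromKeytab(
                    "hdfsadmin@EXAMPLE.COM",                    // assumed principal
                    "/etc/security/keytabs/hdfsadmin.keytab");  // assumed keytab path

            FileSystem fs = FileSystem.get(conf);
            System.out.println("Home dir: " + fs.getHomeDirectory());
            fs.close();
        }
    }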

Oozie, Hive and HBase

  • Oozie and Hive Administration
  • HBase (see the client sketch after this list)
  • Advanced HBase
  • HBase and Hive Integration
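
As a preview of the HBase exercises, here is a minimal sketch that writes and reads a single cell with the HBase Java client. The table and column names are placeholders, and hbase-site.xml is assumed to be on the classpath.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseQuickCheck {
        public static void main(String[] args) throws Exception {
            // Connection settings come from hbase-site.xml on the classpath.
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("metrics"))) {   // assumed table

                // Write one cell: row "row1", column family "cf", qualifier "cpu".
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("cpu"), Bytes.toBytes("0.75"));
                table.put(put);

                // Read it back.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("cpu"))));
            }
        }
    }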

Project - Hadoop Administration

  • Understanding the Problem
  • Plan
  • Design
  • Create a Hadoop Cluster
  • Set up and configure commonly used Hadoop ecosystem components such as Pig and Hive
  • Configure Ganglia/Kibana on the Hadoop cluster and troubleshoot common cluster problems (see the monitoring sketch after this list)
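
For the monitoring and troubleshooting part of the project, a lightweight starting point is the NameNode's built-in /jmx endpoint, which exposes FSNamesystem metrics as JSON. The sketch below simply polls it over HTTP as a basic health check; the host name and port (50070 on Hadoop 2.x web UIs, 9870 on 3.x) are assumptions for illustration.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class NameNodeJmxPoll {
        public static void main(String[] args) throws Exception {
            // Assumed NameNode web UI address; adjust the host and port for your cluster.
            URL url = new URL("http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);

            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Raw JSON; a real monitor would parse it and alert on metrics
                    // such as MissingBlocks or CapacityRemaining.
                    System.out.println(line);
                }
            } finally {
                conn.disconnect();
            }
        }
    }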

Encarta Labs Advantage

  • One Stop Corporate Training Solution Providers for over 3,500 Modules on a variety of subjects
  • All courses are delivered by Industry Veterans
  • Get jumpstarted from newbie to production-ready in a matter of a few days
  • Trained more than 20,000 corporate candidates across India and abroad
  • All our trainings are conducted in workshop mode with a strong focus on hands-on practice

View our other course offerings by visiting www.encartalabs.com/course-catalogue

Contact us to have this course delivered as a public/open-house workshop for a group of 10+ candidates at our venue