Apache Hadoop is a framework that allows for the distributed processing of massive data sets across clusters of computers using a simple programming model. It is designed to scale from single servers up to thousands of machines, each offering local computation and storage. Hadoop has established itself as an industry-leading platform for deploying cloud-based applications and services. The Hadoop ecosystem is large, and it includes such popular products as HDFS, MapReduce, HBase, ZooKeeper, Oozie, Pig, and Hive. With such versatility, however, comes complexity and difficulty in deciding on appropriate use cases.
The Hadoop Administration - Essentials training course provides the fundamentals required to successfully implement and maintain Hadoop clusters. It combines interactive lectures with hands-on lab exercises.
The Hadoop Administration - Comprehensive training course presents the building blocks of the Hadoop administration stack, with thorough coverage of each component. We begin with Hadoop's architecture and its underlying parts, identifying top-down how the components of the Hadoop ecosystem interact. The course then provides in-depth coverage of the Hadoop Distributed File System (HDFS), HBase, MapReduce, Oozie, Pig, and Hive.
- Utilize best practices for deploying Hadoop clusters
- Determine hardware needs
- Monitor Hadoop clusters
- Recover from NameNode failure
- Handle DataNode failures
- Manage hardware upgrade processes including node removal, configuration changes, node installation and rebalancing clusters
- Manage log files
- Install, configure, deploy, verify, and maintain Hadoop clusters including:
- MapReduce
- HDFS
- Pig
- Hive (and MySQL)
- HBase (and ZooKeeper)
- HCatalog
- Oozie
- Mahout
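The node-removal and rebalancing tasks covered above are typically driven through an HDFS exclude file. A minimal sketch of the relevant `hdfs-site.xml` fragment follows; the exclude-file path is illustrative and would be replaced by your cluster's configuration directory:

```xml
<!-- hdfs-site.xml: point the NameNode at an exclude file used for
     decommissioning DataNodes. The path below is illustrative. -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

After adding a DataNode's hostname to the exclude file, `hdfs dfsadmin -refreshNodes` starts decommissioning that node, and `hdfs balancer` redistributes blocks across the remaining nodes.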
- Understand Hadoop, HDFS, and the Hadoop ecosystem.
- Understand data loading techniques using Sqoop and Flume.
- Plan, implement, manage, monitor, and secure a Hadoop cluster.
- Configure backup options, and diagnose and recover from node failures in a Hadoop cluster.
- Understand the ZooKeeper service.
- Secure a deployment and understand backup and recovery.
- Work with HBase, Oozie, Hive, and Hue.
- Basic level of Linux system administration experience
- Prior knowledge of Apache Hadoop is not required
- Good knowledge of Linux is required.
- Fundamental Linux system administration skills are preferable, including Linux scripting (Perl/Bash), good troubleshooting skills, and an understanding of system capacity and bottlenecks and the basics of memory, CPU, OS, storage, and networking.
- No prior knowledge of Apache Hadoop or Hadoop clusters is required.
Administrators who are interested in learning how to deploy and manage a Hadoop cluster.
Production-support and development database administrators, system administrators, software architects, data warehouse professionals, IT managers, software developers, and anyone else interested in learning Hadoop cluster administration should attend this course.