EncartaLabs

Hadoop Development

The Hadoop Development - Essentials training course introduces the Hadoop framework, the de facto platform for Big Data computation. Apache Hadoop is an open-source software framework that supports data-intensive distributed applications and is licensed under the Apache v2 license. It supports running applications on large clusters of commodity hardware, and transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster.
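
To make the paradigm described above concrete, here is a minimal sketch of a word count in plain Python. It is a single-process illustration only and does not use Hadoop; the function names (`map_fragment`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical and are not part of any Hadoop API.

```python
from collections import defaultdict


def map_fragment(fragment):
    """Map step: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group values by key, which the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values for one key into a single total."""
    return key, sum(values)


def word_count(fragments):
    # Each fragment is mapped independently of the others, so on a real
    # cluster any node could run (or re-run) any one of these calls.
    mapped = []
    for fragment in fragments:
        mapped.extend(map_fragment(fragment))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    fragments = ["big data big clusters", "data on commodity hardware"]
    print(word_count(fragments))
```

Because `map_fragment` depends only on its own input, re-running it after a failure gives the same pairs. That property is what lets a framework re-execute a fragment on another node.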

The Hadoop Development - Comprehensive training course covers the concepts addressed in the Cloudera Certified Developer for Apache Hadoop (CCDH) exam. Participants learn to build powerful data processing applications. They learn about MapReduce and the Hadoop Distributed File System (HDFS), how to write MapReduce code, and best practices for Hadoop development, debugging, and implementation of workflows.

Throughout this course, participants write Hadoop code and complete other hands-on exercises to solidify their understanding of the concepts.

In the Hadoop Development - Essentials workshop, delegates will learn to:
  • Use the Hadoop and HDFS platform
  • Load data into HDFS
  • Get started with MapReduce
  • Write and debug MapReduce jobs
  • Implement common algorithms on Hadoop
  • Use Mahout for advanced data mining
  • Benchmark and optimize performance

In the Hadoop Development - Comprehensive workshop, delegates will learn:
  • MapReduce and HDFS
  • How to write MapReduce code in Java or other programming languages
  • Issues to consider when developing MapReduce jobs
  • How to implement common algorithms in Hadoop
  • Best practices for Hadoop development and debugging
  • How to use other projects such as Apache Hive, Apache Pig, Sqoop, and Oozie
  • Advanced Hadoop API topics required for real-world data analysis

Prerequisites

  • Some programming experience (preferably Java)
  • Knowledge of Hadoop is not required

Who Should Attend

  • Project / Program / Technical managers
  • Technical / Team leads
  • Software analysts / engineers
  • Pre-sales consultants
  • Business development managers

COURSE AGENDA

Hadoop Development - Essentials
(Duration: 2 Days)

1. Hadoop and MapReduce: An Overview

  • Big Data and the questions
  • Hadoop and the answers
  • Hadoop Cluster Configuration

2. Hadoop Internals and MapReduce Design Patterns

  • Hadoop framework Internals
  • MapReduce Internals
  • MapReduce Design Patterns and Use-Cases

3. Hadoop sub-projects

  • Hive
  • Pig
  • HBase
  • Impala

4. Hadoop in Production

  • Best practices for Hadoop clusters
  • Best practices for MapReduce
  • Hadoop in the cloud
  • Big Data and Social Media

Hadoop Development - Comprehensive
(Duration: 5 Days)

1. Hadoop Introduction

  • What is Big Data?
  • Sources of Data
  • Characteristics of Big Data
  • Benefits of Big Data analysis
  • Challenges of Big Data processing
  • Why Hadoop for Big Data?
  • An introduction to Hadoop
  • What is Hadoop not good for?
  • Hadoop Ecosystem

2. Hadoop Installation

  • Prerequisites
  • Hadoop Installation
  • Checking Installation

3. MapReduce Framework

  • What is MapReduce?
  • How does MapReduce work?
  • MapReduce Program
  • MapReduce program execution
  • MapReduce program Unit Testing
  • Deploying MapReduce on a cluster
  • Hadoop streaming
  • Combiner
  • Partitioner
  • Counters

4. HDFS - Hadoop Distributed File System

  • What is HDFS?
  • HDFS Architecture
  • Data Flow – anatomy of File Read and File Write
  • What is an HDFS Block?
  • Types of Nodes in HDFS
  • What is HDFS Federation?
  • HDFS High Availability
  • HDFS Commands – also Parallel Copy
  • Hadoop Archives

5. Hive

  • What is Hive?
  • Hive Architecture
  • Hive Language
  • What is Hive Metastore?
  • HiveQL
  • Hive Tables
  • How to Query Hive Tables?
  • User-Defined Functions

6. Pig

  • What is Pig?
  • Pig Architecture
  • Execution Types
  • Pig Latin
  • User-Defined Functions
  • Data Processing Operators

7. HBase

  • HBase Introduction
  • HBase Architecture
  • HBase Data Model
  • HBase Schema Design
  • HBase and MapReduce
  • HBase Configuration
  • HBase Performance
  • HBase Troubleshooting & Debugging

8. Introduction to Sqoop

Encarta Labs Advantage

  • One-stop corporate training solution provider for over 3,500 modules on a variety of subjects
  • All courses are delivered by industry veterans
  • Get jumpstarted from newbie to production-ready in a matter of a few days
  • Trained more than 20,000 corporate candidates across India and abroad
  • All our training is conducted in workshop mode, with a greater focus on hands-on work

View our other course offerings by visiting www.encartalabs.com/course-catalogue

Contact us to have this course delivered as a public/open-house workshop for a group of 10+ candidates at our venue.
