EncartaLabs

IBM Open Platform with Apache Hadoop

(Duration: 2 Days)

The IBM Open Platform with Apache Hadoop training course provides an in-depth introduction to the main components of the ODP core, namely Apache Hadoop (including HDFS, YARN, and MapReduce) and Apache Ambari. It also covers the main open-source components that are generally made available with the ODP core in a production Hadoop cluster.

By attending the IBM Open Platform with Apache Hadoop workshop, attendees will learn to:

    • List and describe the major components of the open-source Apache Hadoop stack and the approach taken by the Open Data Foundation.
    • Manage and monitor Hadoop clusters with Apache Ambari and related components.
    • Explore the Hadoop Distributed File System (HDFS) by running Hadoop commands.
    • Understand the differences between Hadoop 1 (with MapReduce 1) and Hadoop 2 (with YARN and MapReduce 2).
    • Create and run basic MapReduce jobs from the command line.
    • Explain how Spark integrates into the Hadoop ecosystem.
    • Execute iterative algorithms using Spark's RDDs.
    • Explain the role of coordination, management, and governance in the Hadoop ecosystem using Apache ZooKeeper, Apache Slider, and Apache Knox.
    • Explore common methods for performing data movement:
      • Configure Flume for data loading of log files.
      • Move data into HDFS from relational databases using Sqoop.
    • Understand when to use various data storage formats (flat files, CSV/delimited, Avro/Sequence files, Parquet, etc.).
    • Review the differences between the available open-source programming languages typically used with Hadoop (Pig, Hive) and for data science (Python, R).
    • Query data from Hive.
    • Perform random access on data stored in HBase.
    • Explore advanced concepts, including Oozie and Solr.

Prerequisites: Knowledge of Linux would be beneficial.

The IBM Open Platform with Apache Hadoop workshop is for those who want a foundation in IBM BigInsights. This includes big data engineers, data scientists, developers or programmers, and administrators who are interested in learning about IBM's Open Platform with Apache Hadoop.

COURSE AGENDA

1. IBM Open Platform with Apache Hadoop

2. Apache Ambari

3. Hadoop Distributed File System

4. MapReduce and YARN

  • Introduction to MapReduce based on MR1
  • Limitations of MR1
  • YARN and MR2

5. Apache Spark

6. Coordination, management, and governance

7. Data Movement

8. Storing and Accessing Data

  • Representing Data: CSV, XML, JSON, and YAML
  • Open Source Programming Languages: Pig, Hive, and others (R, Python, etc.)
  • NoSQL Concepts
  • Accessing Hadoop data using Hive
  • Querying Hadoop data using Hive

9. Advanced Topics

  • Controlling job workflows with Oozie
  • Search using Apache Solr

Encarta Labs Advantage

  • One-stop corporate training solution provider with over 4,000 modules on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in a matter of a few days
  • More than 50,000 corporate executives trained across the globe
  • All our training is conducted in workshop mode, with a strong focus on hands-on sessions

View our other course offerings by visiting http://encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public/open-house workshop or as online training for a group of 10+ candidates.
