EncartaLabs

Apache Beam

(Duration: 4 Days)

Apache Beam is an open-source, unified programming model for defining and executing parallel data processing pipelines. Its power lies in its ability to run both batch and streaming pipelines, with execution carried out by one of Beam's supported distributed processing back-ends: Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is useful for ETL (Extract, Transform, and Load) tasks such as moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.

In this Apache Beam training course, delegates will learn to use the Apache Beam SDKs in a Java or Python application that defines a data processing pipeline, decomposing a big data set into smaller chunks for independent, parallel processing.

By attending this Apache Beam workshop, delegates will learn to:

  • Install and configure Apache Beam.
  • Use a single programming model to carry out both batch and stream processing from within their Java or Python application.
  • Execute pipelines across multiple environments.

Prerequisites for the Apache Beam class:

  • Experience with Python programming.
  • Experience with the Linux command line.

The Apache Beam class is ideal for:

  • Developers

COURSE AGENDA

1. Introduction

  • Apache Beam vs MapReduce, Spark Streaming, Kafka Streams, Storm, and Flink

2. Installing and Configuring Apache Beam

3. Overview of Apache Beam Features and Architecture

  • Beam Model, SDKs, Beam Pipeline Runners
  • Distributed processing back-ends

4. Understanding the Apache Beam Programming Model

  • How a pipeline is executed

5. Running a Sample Pipeline

  • Preparing a WordCount pipeline
  • Executing the pipeline locally

6. Designing a Pipeline

  • Planning the structure, choosing the transforms, and determining the input and output methods

7. Creating the Pipeline

  • Writing the driver program and defining the pipeline
  • Using Apache Beam classes
  • Data sets, transforms, I/O, data encoding, etc.

8. Executing the Pipeline

  • Executing the pipeline locally, on remote machines, and on a public cloud
  • Choosing a runner
  • Runner-specific configurations

9. Testing and Debugging Apache Beam

  • Using type hints to emulate static typing
  • Managing Python pipeline dependencies

10. Processing Bounded and Unbounded Datasets

  • Windowing and Triggers

11. Making Your Pipelines Reusable and Maintainable

12. Creating New Data Sources and Sinks

  • Apache Beam Source and Sink API

13. Integrating Apache Beam with Other Big Data Systems

  • Apache Hadoop, Apache Spark, Apache Kafka

Encarta Labs Advantage

  • One-stop corporate training solution provider, with over 4,000 modules on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in a matter of a few days
  • More than 50,000 corporate executives trained across the globe
  • All our training is conducted in workshop mode, with a strong focus on hands-on sessions

View our other course offerings by visiting http://encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public or open-house workshop, or as online training, for a group of 10+ candidates.
