EncartaLabs

Apache Spark with Scala for Big Data Solutions

(Duration: 4 days)

In the Apache Spark with Scala for Big Data Solutions training course, you will learn to apply Spark best practices, develop solutions that run on the Apache Spark platform, and take advantage of Spark's efficient use of memory and its powerful programming model. Learn to supercharge your data with Apache Spark, a big data platform well suited to the iterative algorithms required by graph analytics and machine learning.

By attending the Apache Spark with Scala for Big Data Solutions workshop, attendees will learn to:

  • Develop applications with Spark
  • Work with the libraries for SQL, Streaming, and Machine Learning
  • Map real-world problems to parallel algorithms
  • Build business applications that integrate with Spark

PREREQUISITES

  • A minimum of 6 months of professional programming experience in Java or C#

COURSE AGENDA

1. Introduction to Spark

  • Defining Big Data and Big Computation
  • What is Spark?
  • What are the benefits of Spark?

2. Scaling out applications

  • Identifying the performance limitations of a modern CPU
  • Scaling traditional parallel processing models

3. Designing parallel algorithms

  • Fostering parallelism through functional programming
  • Mapping real-world problems to effective parallel algorithms

4. Parallelizing data structures

  • Partitioning data across the cluster using Resilient Distributed Datasets (RDDs) and DataFrames
  • Apportioning task execution across multiple nodes
  • Running applications with the Spark execution model

5. The anatomy of a Spark cluster

  • Creating resilient and fault-tolerant clusters
  • Achieving scalable distributed storage

6. Managing the cluster

  • Monitoring and administering Spark applications
  • Visualizing execution plans and results

7. Selecting the development environment

  • Performing exploratory programming via the Spark shell
  • Building stand-alone Spark applications

8. Working with the Spark APIs

  • Programming with Scala and other supported languages
  • Building applications with the core APIs
  • Enriching applications with the bundled libraries

9. Querying structured data

  • Processing queries with DataFrames and embedded SQL
  • Extending SQL with User-Defined Functions (UDFs)
  • Exploiting Parquet- and JSON-formatted data sets

10. Integrating with external systems

  • Connecting to databases with JDBC
  • Executing Hive queries in external applications

11. What is streaming?

  • Implementing sliding window operations
  • Determining state from continuous data
  • Processing simultaneous streams
  • Improving performance and reliability

12. Streaming data sources

  • Streaming from built-in sources (e.g., log files, Twitter, sockets, Kinesis, Kafka)
  • Developing custom receivers
  • Processing with the streaming API and Spark SQL

13. Classifying observations

  • Predicting outcomes with supervised learning
  • Building a decision tree classifier

14. Identifying patterns

  • Grouping data using unsupervised learning
  • Clustering with the k-means method

Encarta Labs Advantage

  • A one-stop corporate training solution provider offering over 4,000 modules on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in just a few days
  • Trained more than 50,000 corporate executives across the globe
  • All our training is conducted in workshop mode, with an emphasis on hands-on sessions

View our other course offerings by visiting http://encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public/open-house workshop or as online training for a group of 10 or more candidates.
