Apache Spark is an open-source engine for processing data in Hadoop clusters, optimized for speed, ease of use, and sophisticated analytics. The Spark framework supports streaming data processing and complex iterative algorithms, enabling some in-memory workloads to run up to 100x faster than traditional Hadoop MapReduce programs. With Spark, you can write applications that support faster decisions and real-time actions across a wide variety of use cases, architectures, and industries.
This Apache Spark for Data Scientists training course explores using Spark for common data-related activities from a data science perspective. You will learn to build unified big data applications that combine batch, streaming, and interactive analytics on your data.
By attending the Apache Spark for Data Scientists workshop, delegates will learn:
- The essentials of Spark architecture and applications
- How to execute Spark programs
- How to create and manipulate both RDDs (Resilient Distributed Datasets) and DataFrames
- How to integrate machine learning into Spark applications
- How to use Spark Streaming
Delegates should have:
- Knowledge of Java programming
- Knowledge of SQL (familiarity with SQL basics)
- Basic knowledge of statistics and probability
- A data science background