EncartaLabs

Data Engineering

(Duration: 5 Days)

For over 10 years, companies have focused intensely on extracting business value from their data. Out of this activity, the role of data scientist emerged. However, it quickly became obvious that most of a data scientist’s time was spent on data preparation or on moving analytical models into production environments. Thus, the data engineer has emerged as a highly desirable and indispensable member of an analytics project team.

This Data Engineering training course covers the content and hands-on lab exercises provided in the following courses:

  • Data Warehousing with SQL and NoSQL
  • ETL Offload with Hadoop and Spark
  • Data Governance, Security and Privacy for Big Data
  • Processing Streaming and IoT Data
  • Building Data Pipelines with Python

Prerequisites:

  • Experience with a programming language such as Java, R, or Python
  • Familiarity with the non-statistical aspects of the Data Science and Big Data Analytics v2 content
  • Understanding of the data engineer role provided in the Introduction to Data Engineering course

The Data Engineering workshop is ideal for:

  • Data engineers, data scientists, data architects, data analysts or anyone else who wants to learn and apply data engineering principles and tools.

COURSE AGENDA

1. Data Warehousing with SQL and NoSQL

  • Data warehouses
  • Relational databases
    • SQL operations
    • Transactional vs. analytical
    • Design and performance considerations
  • NoSQL
    • SQL vs. NoSQL
    • NoSQL database types and examples
  • Redis
  • Apache Cassandra
  • Apache CouchDB
  • Data Lakes

2. ETL Offload with Hadoop and Spark

  • Hadoop ecosystem
  • Hadoop Distributed File System (HDFS)
  • Data ingestion tools
    • Apache Flume
    • Apache Sqoop
  • Apache Spark
  • ETL schedulers
    • Apache Oozie
    • Apache Airflow
  • ETL offload implementation considerations

3. Data Governance, Security and Privacy for Big Data

  • Data Governance Overview
  • Data Governance Roles
  • Data Governance Models
  • Metadata
  • Master Data Management
  • Security Controls in Hadoop Ecosystem
  • Apache Atlas
  • Apache Ranger
  • Apache Knox
  • Security Considerations in the Cloud
  • General Data Protection Regulation (GDPR)
  • Data Ethics: Avoiding Hidden Biases

4. Processing Streaming and IoT Data

  • Processing Streaming and IoT Data Overview
  • Streaming and IoT Data Processing Tools Framework
  • Apache Storm
  • Apache Kafka
  • Apache Spark Streaming
  • Apache Flink
  • Pravega
  • Project Nautilus
  • EdgeX Foundry

5. Building Data Pipelines with Python

  • Introduction to data pipelines
  • Introduction to Python
    • Features of Python
    • Basic Syntax of Python
    • Data Types, Operators, and Conditional Statements
    • User-defined Functions and Classes
  • Python libraries
  • Data structures in Python
  • Data pipeline best practices

Encarta Labs Advantage

  • One-stop corporate training solution provider with over 4,000 modules on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in a matter of a few days
  • More than 50,000 corporate executives trained across the globe
  • All our training is conducted in workshop mode, with a focus on hands-on sessions

View our other course offerings by visiting http://encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public or open-house workshop, or as online training, for a group of 10 or more candidates.
