Call : (+91) 968636 4243
Mail : info@EncartaLabs.com
EncartaLabs

Data Engineering with Python

( Duration: 5 Days )

Data Engineering is a software engineering practice with a focus on the design, development, and productionizing of data processing systems. It includes all the practical aspects of data acquisition, transfer, transformation, and storage, on-premises or in the cloud.

This Data Engineering with Python training course provides the skills to apply Python to the practical aspects of data engineering and introduces the popular Python libraries used in the field, including NumPy, pandas, Matplotlib, scikit-learn, and Apache Spark.

By attending the Data Engineering with Python workshop, delegates will learn:

  • Data engineering practice
  • High-octane introduction to Python
  • Technical reviews of NumPy, pandas, and other Python libraries and data processing systems
  • Data visualization and exploratory data analysis
  • Data repairing and normalization
  • Understanding the data needs and requirements of Machine Learning and Data Science projects
  • Python in the Cloud
  • Python on Hadoop (PySpark)

Prerequisites

  • Practical experience coding in one or more modern programming languages. Knowledge of Python is desirable but not necessary.

Who Should Attend

  • Developers, Software Engineers, Data Scientists, and IT Architects

COURSE AGENDA

1

Data Engineering Defined

  • Data is King
  • Translating Data into Business Insights
  • What is Data Engineering
  • The Data-Related Roles
  • The Data Science Skill Sets
  • The Data Engineer Role
  • An Example of a Data Product
  • Data Schema for Data Exchange Interoperability
  • The Data Exchange Interoperability Options
  • Big Data and NoSQL
  • Data Physics
  • The Traditional Client-Server Processing Pattern
  • Data Locality (Distributed Computing Economics)
  • The CAP Theorem
  • Mechanisms to Guarantee a Single CAP Property
  • The CAP Triangle
  • Eventual Consistency
2

Data Processing Phases

  • Typical Data Processing Pipeline
  • Data Discovery Phase
  • Data Harvesting Phase
  • Data Priming Phase
  • Data Logistics and Data Governance
  • Exploratory Data Analysis
  • Model Planning Phase
  • Model Building Phase
  • Communicating the Results
  • Production Roll-out
3

Introduction to Python Programming

  • Imperative and Functional programming
  • Python core functionality
  • Integrated development environments
  • Jupyter notebooks
4

SciPy

  • SciPy ecosystem overview
  • Data engineering use cases
5

NumPy

  • Introduction to NumPy
  • NumPy's value proposition
  • N-dimensional arrays
  • Broadcasting
  • Linear algebra capabilities
  • Data indexing, slicing, and iterating
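
To give a flavor of the broadcasting and slicing topics listed above, here is a minimal sketch. It is an illustration only, not actual course material, and the `readings` array is hypothetical sample data.

```python
import numpy as np

# Hypothetical sample data: 3 observations (rows) x 2 features (columns).
readings = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])

# Column means have shape (2,); broadcasting stretches them across all 3 rows,
# so each column is centered without an explicit loop.
col_means = readings.mean(axis=0)
centered = readings - col_means

# Slicing: the first two rows of the second column.
first_two = readings[:2, 1]
```

After centering, every column of `centered` has a mean of zero, which is a common preparation step before further analysis.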
6

Pandas

  • Introduction to pandas
  • Pandas' data structures
  • Wrangling tabular data with pandas and NumPy
  • Merging, joining, and aggregating data
  • Dealing with categorical data
  • Time series
  • Visualization capabilities
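
The merging and aggregating topics above can be pictured with a short sketch like the following. It is an illustration only, not actual course material; the `orders` and `customers` tables and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample tables, invented for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount": [25.0, 15.0, 40.0, 5.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "region": ["north", "south", "north"],
})

# Join each order to its customer's region (left join keeps every order).
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregate: total order amount per region.
revenue_by_region = merged.groupby("region")["amount"].sum()
```

A left join is used here so that no order is dropped even if its customer were missing from the lookup table.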
7

Matplotlib

  • Exploratory data analysis
  • Data visualization with matplotlib
8

Core Data Engineering Tasks

  • Data acquisition in Python
  • Database and Web interfaces
  • Ensuring data quality
  • Repairing and normalizing data
  • Computing descriptive statistics in Python
  • Processing data at scale
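
As one possible picture of the "repairing and normalizing data" topic above, consider the sketch below. It is an illustration only, not actual course material. The `raw` table is hypothetical, and the choices of median imputation and min-max scaling are assumptions; other strategies may suit other data sets better.

```python
import pandas as pd

# Hypothetical raw data with inconsistent text and missing values.
raw = pd.DataFrame({
    "city": [" Pune", "pune ", "Delhi", None],
    "temp_c": [30.0, None, 20.0, 40.0],
})

clean = raw.copy()

# Repair text: trim whitespace, unify casing, and label missing cities.
clean["city"] = clean["city"].str.strip().str.title().fillna("Unknown")

# Repair numbers: fill the missing reading with the column median.
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].median())

# Normalize: min-max scale the readings into the range [0, 1].
lo, hi = clean["temp_c"].min(), clean["temp_c"].max()
clean["temp_scaled"] = (clean["temp_c"] - lo) / (hi - lo)
```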
9

Python in the Cloud

  • AWS Lambda
  • AWS Glue
  • AWS EMR
10

PySpark

  • Scalable Computing Needs
  • Introduction to Apache Spark
  • Running PySpark on Hadoop
  • Spark SQL
  • The DataFrame Structure

Encarta Labs Advantage

  • One-stop corporate training solution provider offering over 6,000 courses on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in a matter of a few days
  • More than 50,000 corporate executives trained across the globe
  • All our training is conducted in workshop mode, with a strong focus on hands-on sessions

View our other course offerings by visiting https://www.encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public or open-house workshop, or as online training, for a group of 10+ candidates.
