Data Engineering with Python Training Course and Workshop in Bangalore, Mysore, Chennai, Hyderabad, Pune, Mumbai, Delhi, Noida, Gurgaon, Kolkata

Data engineering is a software engineering practice focused on the design, development, and productionizing of data processing systems. It covers the practical aspects of data acquisition, transfer, transformation, and storage, whether on-premises or in the cloud.

This Data Engineering with Python training course teaches delegates to apply Python to the practical aspects of data engineering and introduces the popular Python libraries used in the field, including NumPy, pandas, Matplotlib, scikit-learn, and PySpark (the Python API for Apache Spark).

By attending the Data Engineering with Python workshop, delegates will learn:

Data engineering practice
High-octane introduction to Python
Technical reviews of NumPy, pandas, and other Python libraries and data processing systems
Data visualization and exploratory data analysis
Data repairing and normalization
Understanding the data needs and requirements of Machine Learning and Data Science projects
Python in the Cloud
Python on Hadoop (PySpark)

Prerequisites

Practical experience coding in one or more modern programming languages. Knowledge of Python is desirable but not necessary.

Audience

Developers, Software Engineers, Data Scientists, and IT Architects

Data Engineering Defined

Data is King
Translating Data into Business Insights
What is Data Engineering
The Data-Related Roles
The Data Science Skill Sets
The Data Engineer Role
An Example of a Data Product
Data Schema for Data Exchange Interoperability
The Data Exchange Interoperability Options
Big Data and NoSQL
Data Physics
The Traditional Client-Server Processing Pattern
Data Locality (Distributed Computing Economics)
The CAP Theorem
Mechanisms to Guarantee a Single CAP Property
The CAP Triangle
Eventual Consistency

Data Processing Phases

Typical Data Processing Pipeline
Data Discovery Phase
Data Harvesting Phase
Data Priming Phase
Data Logistics and Data Governance
Exploratory Data Analysis
Model Planning Phase
Model Building Phase
Communicating the Results
Production Roll-out

Introduction to Python Programming

Imperative and Functional programming
Python core functionality
Integrated development environments
Jupyter notebooks
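
As a small illustration of the first topic in this module, the sketch below computes the same total in an imperative style and in a functional style. The sample readings and function names are made up for illustration and are not taken from the course material.

```python
from functools import reduce

# Hypothetical sample data: readings in Celsius, with None marking missing values.
readings = [21.5, None, 19.0, 23.5, None]


def total_fahrenheit_imperative(values):
    """Imperative style: an explicit loop and a mutable accumulator."""
    total = 0.0
    for v in values:
        if v is None:
            continue  # skip missing readings
        total += v * 9 / 5 + 32
    return total


def total_fahrenheit_functional(values):
    """Functional style: filter, map, and reduce, with no mutation."""
    present = filter(lambda v: v is not None, values)
    converted = map(lambda c: c * 9 / 5 + 32, present)
    return reduce(lambda acc, f: acc + f, converted, 0.0)
```

Both functions return the same value for the same input; the difference lies only in how the computation is expressed.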

SciPy

SciPy ecosystem overview
Data engineering use cases

NumPy

Introduction to NumPy
NumPy's value proposition
N-dimensional arrays
Broadcasting
Linear algebra capabilities
Data indexing, slicing, and iterating
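
To give a flavor of the broadcasting and slicing topics listed above, here is a minimal sketch that centers each column of a small array. The array contents are hypothetical sample data, not part of the course material.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) by 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Column means have shape (2,); broadcasting stretches them across all 3 rows.
col_means = data.mean(axis=0)
centered = data - col_means

# Basic slicing returns a view on the same memory rather than a copy.
first_feature = centered[:, 0]
```

Broadcasting avoids writing an explicit loop over rows, which is one of the reasons NumPy code tends to be both shorter and faster than the equivalent pure-Python code.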

Pandas

Introduction to pandas
Pandas' data structures
Wrangling tabular data with pandas and NumPy
Merging, joining, and aggregating data
Dealing with categorical data
Time series
Visualization capabilities
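
The merging, joining, and aggregating topic above can be sketched in a few lines. The table names, columns, and values below are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical tables: individual orders and a customer lookup.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [100.0, 250.0, 50.0, 75.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["South", "North", "South"],
})

# A left join keeps every order, even if a customer were missing from the lookup.
joined = orders.merge(customers, on="customer_id", how="left")

# Aggregate order amounts per region.
totals = joined.groupby("region")["amount"].sum()
```

Here `totals` is a Series indexed by region, which can be passed directly to the plotting helpers covered under visualization.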

Matplotlib

Exploratory data analysis
Data visualization with matplotlib

Core Data Engineering Tasks

Data acquisition in Python
Database and Web interfaces
Ensuring data quality
Repairing and normalizing data
Computing descriptive statistics in Python
Processing data at scale
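
The repair, normalization, and descriptive statistics tasks listed above might look like the following sketch, which uses only the Python standard library. The records, the `clean` helper, and the choice of median imputation are illustrative assumptions, not the course's prescribed method.

```python
import statistics

# Hypothetical raw records with inconsistent formatting and a missing value.
raw = [
    {"city": " bangalore ", "temp_c": "24.5"},
    {"city": "CHENNAI", "temp_c": ""},
    {"city": "Pune", "temp_c": "27.0"},
]


def clean(record):
    """Normalize the city name and parse the temperature (None when missing)."""
    temp = record["temp_c"].strip()
    return {
        "city": record["city"].strip().title(),
        "temp_c": float(temp) if temp else None,
    }


cleaned = [clean(r) for r in raw]

# Repair: fill missing temperatures with the median of the observed values.
observed = [r["temp_c"] for r in cleaned if r["temp_c"] is not None]
median_temp = statistics.median(observed)
repaired = [
    {**r, "temp_c": median_temp if r["temp_c"] is None else r["temp_c"]}
    for r in cleaned
]

# Descriptive statistics over the repaired column.
temps = [r["temp_c"] for r in repaired]
summary = {"mean": statistics.mean(temps), "min": min(temps), "max": max(temps)}
```

Median imputation is only one possible repair strategy; dropping incomplete records or flagging them for review may be more appropriate depending on the data.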

Python in the Cloud

AWS Lambda
AWS Glue
AWS EMR

PySpark

Scalable Computing Needs
Introduction to Apache Spark
Running PySpark on Hadoop
Spark SQL
The DataFrame Structure

Encarta Labs Advantage

One-stop corporate training provider offering more than 6,000 courses across a wide range of subjects
All courses are delivered by industry veterans
Go from newbie to production-ready in a matter of a few days

Trained more than 50,000 corporate executives across the globe
All our training is conducted in workshop mode, with an emphasis on hands-on sessions
