EncartaLabs

Data Science Essentials

(Duration: 5 Days)

Data science is the applied study of data for statistical analysis and problem solving. This Data Science Essentials training course covers the data science pipeline that working data scientists use every day: data wrangling, analysis, machine learning, and communication and visualization.

By attending the Data Science Essentials workshop, attendees will learn about:

  • Data Science Overview
  • Data Gathering
  • Data Filtering
  • Data Transformation
  • Data Exploration
  • Data Integration
  • Data Analysis Concepts
  • Data Classification and Machine Learning
  • Data Communication and Visualization

This course is intended for:

  • Individuals with some programming and math experience who are working toward implementing data science in their everyday work.

COURSE AGENDA

Data Science Overview

Defining Data Science

  • What is Data Science?
  • What is Data Wrangling?
  • What is Big Data?
  • What is Machine Learning?

Implementing Data Science

  • Data Science Terminology
  • Data Communication
  • Data Science Pipeline
  • Data Science Tools

Data Gathering

Data Extraction

  • Basic Data Gathering
  • Gathering Web Data
  • Extracting Spreadsheet Data with in2csv
  • Extracting Spreadsheet Data with Agate
  • Extracting Legacy Data from dBASE Tables
  • Extracting HTML Data

Metadata

  • Gathering Metadata
  • Working with HTTP Headers
  • Working with Linux Log Files
  • Working with Email Headers

Remote Data

  • Connecting to Remote Data
  • Copying Remote Data
  • Synchronizing Remote Data

Data Filtering

Introduction to Data Filtering

  • Data Filtering Techniques and Tools
  • Processing Date Formats
  • Filtering HTTP Headers
  • Filtering CSV Data
  • Replacing Values with sed
  • Dropping Duplicate Data
  • Working with JPEG Headers
  • Filtering PDF Files
  • Filtering for Invalid Data
  • Parsing robots.txt

Data Transformation

File Format Conversions

  • Converting CSV to JSON
  • Converting XML to JSON
  • Converting CSV to SQL
  • Converting SQL to CSV
  • Changing CSV Delimiters

Data Conversions

  • Converting Dates
  • Converting Numbers
  • Rounding Numbers

Optical Character Recognition

  • OCR JPEG Images
  • Extracting Text from PDF Files

Data Exploration

Introduction to Data Exploration

  • Exploring CSV Data
  • Exploring CSV Statistics
  • Querying CSV Data
  • Plotting from the Command Line
  • Counting Words
  • Exploring Directory Trees
  • Determining Word Frequencies
  • Taking Random Samples
  • Finding the Top Rows
  • Finding Repeated Records
  • Identifying Outliers in Data

Data Integration

Introduction to Data Integration

  • Joining CSV Data
  • Concatenating Log Files
  • Sorting Text Files
  • Merging XML Data
  • Aggregating Data
  • Normalizing Data
  • Denormalizing Data
  • Pivoting Data Tables
  • Homogenizing Rows

Data Analysis Concepts

Data Science Math

  • Basic Data Science Math
  • Linear Algebra Vector Math
  • Linear Algebra Matrix Math
  • Linear Algebra Matrix Decomposition

Data Analysis Concepts

  • Data Formation
  • Introduction to Probability
  • Working with Events
  • Working with Probability
  • Continuous Probability Distributions
  • Discrete Probability Distributions
  • Introduction to Bayes' Theorem

Estimates and Measures

  • Sampling Data
  • Statistical Measures
  • Estimators
  • Sampling Distributions
  • Confidence Intervals
  • Hypothesis Tests
  • Chi-Square

Data Classification and Machine Learning

Machine Learning Introduction

  • Introduction to Supervised Learning
  • Introduction to Unsupervised Learning
  • Understanding Linear Regression
  • Working with Predictors

Regression and Classification

  • Understanding Logistic Regression
  • Understanding Dummy Variables
  • Using Naïve Bayes Classification
  • Working with Decision Trees

Clustering

  • K-means Clustering
  • Using Cluster Validation
  • Using Principal Component Analysis

Errors and Validation

  • Introduction to Errors
  • Defining Underfitting
  • Defining Overfitting
  • Using K-fold Cross-Validation
  • Using Neural Networks
  • Support Vector Machines (SVM)

Data Communication and Visualization

Introduction to Data Communication

  • Effective Communication and Visualization
  • Correlation Versus Causation
  • Simpson’s Paradox
  • Presenting Data
  • Documenting Data Science
  • Visual Data Exploration

Plotting

  • Creating Scatter Plots
  • Plotting Line Graphs
  • Creating Bar Charts
  • Creating Histograms
  • Creating Box Plots
  • Creating Network Visualizations
  • Creating a Bubble Plot
  • Creating Interactive Plots

Encarta Labs Advantage

  • One-stop corporate training solution provider offering over 4,000 modules on a variety of subjects
  • All courses are delivered by Industry Veterans
  • Go from newbie to production-ready in a matter of a few days
  • Trained more than 50,000 corporate executives across the globe
  • All our training is conducted in workshop mode, with a strong focus on hands-on sessions

View our other course offerings by visiting http://encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public or open-house workshop, or as online training, for a group of 10+ candidates.
