Call : (+91) 968636 4243
Mail : info@EncartaLabs.com
EncartaLabs

Feature Engineering and Data Preparation for Analytics

(Duration: 3 Days)

This Feature Engineering and Data Preparation for Analytics training course introduces programming techniques for crafting and engineering meaningful inputs that improve predictive modeling performance. It also provides strategies for spotting and avoiding, ahead of time, the common pitfalls that compromise the integrity of the data used to build a predictive model. The course relies heavily on SAS programming techniques to accomplish these objectives.

By attending the Feature Engineering and Data Preparation for Analytics workshop, delegates will learn to:

  • Extract data from a relational data table structure.
  • Define population qualifications and create a target sample.
  • Use feature engineering techniques to transform transactional data into meaningful inputs for a predictive model.
  • Transform low-, mid-, and high-cardinality categorical input variables into meaningful predictive modeling inputs.
  • Use ZIP codes and latitude/longitude points to calculate great-circle distance, driving distance, and estimated driving time.
  • Use Bayes' theorem to estimate meaningful predictive modeling inputs, impute missing observations, and partition the target sample into training and validation data sets for honest assessment of the predictive model.

Prerequisites:

  • Exposure to DATA step programming equivalent to the SAS Programming - Essentials course
  • Exposure to programming in SQL or the SQL procedure
  • Exposure to querying data in PROC SQL and building and deploying a predictive model
  • Familiarity with the analytical process of building predictive models and scoring new data

The Feature Engineering and Data Preparation for Analytics class is ideal for:

  • Analysts, data scientists, and IT professionals looking to craft better inputs to improve predictive modeling performance

COURSE AGENDA

1. Extracting Relevant Data

  • Data difficulties
  • Assessing available data
  • Accessing available data
  • Drawing a representative target sample
  • Drawing an uncontaminated input sample

2. Transforming Transaction and Event Data

  • Advantages and disadvantages of transaction data
  • Common transaction structures
  • Defining the time horizon
  • Fixed and variable time horizon methods
  • Implementing common transaction transformations

3. Using Nonnumeric Data

  • Definitions and difficulties of nonnumeric data
  • Miscoding and multicoding detection
  • Controlling degrees of freedom
  • Geocoding

4. Managing Data Pathologies

  • Exploring input variable distributions
  • Detecting data anomalies
  • Creating custom exploratory tools for candidate input variables
  • Missing value imputation
  • Data partitioning

Encarta Labs Advantage

  • One-stop corporate training solution provider offering over 6,000 courses on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in a matter of a few days
  • More than 50,000 corporate executives trained across the globe
  • All our training is conducted in workshop mode, with an emphasis on hands-on sessions

View our other course offerings by visiting https://www.encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public/open-house workshop or as online training for a group of 10 or more candidates.
