EncartaLabs

Text Analytics and Natural Language Processing (NLP) with R

(Duration: 4 Days)

The Text Analytics and Natural Language Processing (NLP) with R training course covers methods for ingesting text data from a variety of sources, such as plain text files, PDFs, and the web, and then processing that data using the latest natural language processing and deep learning techniques.

By attending the Text Analytics and Natural Language Processing (NLP) with R workshop, delegates will learn to:

  • Import text data from a variety of source formats
  • Tokenize text data into meaningful units
  • Wrangle text data using specific textual functions
  • Compute aggregating measures on tokenized data
  • Translate between text data formats
  • Complete a sentiment analysis
  • Perform document classification
  • Perform topic modeling
  • Build a simple neural network appropriate for NLP modeling

Prerequisites: Working knowledge of the R language, RStudio, and the dplyr/tidyverse packages.

COURSE AGENDA

1. Working with unstructured text data

  • string methods
  • regex
  • reading in text files
  • review of base R

2. Importing

  • parsing data from a text file
  • importing it into a tidy structure
  • parsing data from a PDF
    • From a "pile of PDFs"
  • scraping data from the web
  • Discussion of other methods
    • OCR
    • Handwriting recognition

3. Managing Text Data 1

  • a tidy text format
  • Overview of text data formats
    • tidy text
    • token list
    • Bag of words
    • document-term matrix or document-feature matrix (dtm/dfm)
    • corpus
    • docvars
  • associated formats
    • stop words
    • Sentiment lexica
    • word vectors / models

4. Managing Text Data 2

  • tokenizing text
  • units of tokenization
    • tokens
    • lemmas
    • stems
    • n-grams
    • sentences
    • Tweets
  • Tf-idf
  • Log-odds (tidylo)

5. Sentiment Analysis

  • Sentiment lexica
  • Sentiment analysis with inner_join
  • Analyzing by other units
  • Valence shifting
  • VADER

6. Document Classification

  • Text similarity - stringdist
    • Cosine
    • Edit distance
  • Machine Learning for document classification
    • Naive Bayes model

7. Topic Modeling / Document Clustering

  • LDA
  • stm

8. Text and Deep Learning

  • Deep learning introduction
  • Architecture of neural networks
  • TensorFlow + Keras
  • Word vectors
    • word2vec
    • text2vec
    • GloVe
    • spaCy
  • Combining Deep Learning and NLP
    • CNN
    • RNN
    • LSTM
  • Named Entity Recognition (NER)
  • Part of Speech tagging (POS)
  • Dependency Parsing

Encarta Labs Advantage

  • One Stop Corporate Training Solution Providers for over 4,000 Modules on a variety of subjects
  • All courses are delivered by Industry Veterans
  • Get jumpstarted from newbie to production ready in a matter of a few days
  • Trained more than 50,000 Corporate executives across the Globe
  • All our trainings are conducted in workshop mode with more focus on hands-on sessions

View our other course offerings by visiting http://encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public/open-house workshop or online training for a group of 10+ candidates.
