EncartaLabs

Cloudera Search

(Duration: 5 Days)

This Cloudera Search training course provides the skills to index data in Hadoop for more powerful real-time queries. You will learn to get more value from your data by integrating Cloudera Search with external applications.

By attending the Cloudera Search workshop, delegates will learn how to:

  • Perform batch indexing of data stored in HDFS and HBase
  • Index streaming data in near-real-time with Flume
  • Index content in multiple languages and file formats
  • Process and transform incoming data with Morphlines
  • Create a user interface for an index using Hue
  • Integrate Cloudera Search with external applications
  • Improve the search experience using faceting, highlighting, and spelling correction

Prerequisites:

  • Basic familiarity with Hadoop and experience programming in a general-purpose language such as Java, C, C++, Perl, or Python. You should be comfortable with the Linux command line and able to perform basic tasks such as creating and removing directories, viewing and changing file permissions, executing scripts, and examining file output. No prior experience with Apache Solr or Cloudera Search is required, nor is any experience with HBase or SQL.

The Cloudera Search class is ideal for:

  • Developers and data engineers

COURSE AGENDA

1. Performing Basic Queries

  • Executing a Query in the Admin UI
  • Basic Syntax
  • Techniques for Approximate Matching
  • Controlling Output

2. Writing More Powerful Queries

  • Relevancy and Filters
  • Query Parsers
  • Functions
  • Geospatial Search
  • Faceting

3. Preparing to Index Documents

  • Overview of the Indexing Process
  • Understanding Morphlines
  • Generating Configuration Files
  • Schema Design
  • Collection Management

4. Batch Indexing HDFS Data with MapReduce

  • Overview of the HDFS Batch Indexing Process
  • Using the MapReduce Indexing Tool
  • Testing and Troubleshooting

5. Near-Real-Time Indexing with Flume

  • Overview of the Near-Real-Time Indexing Process
  • Introduction to Apache Flume
  • How to Perform Near-Real-Time Indexing with Flume
  • Testing and Troubleshooting

6. Indexing HBase Data with Lily

  • What Is Apache HBase?
  • Batch Indexing for HBase
  • Indexing HBase Tables in Near-Real-Time

7. Indexing Data in Other Languages and Formats

  • Field Types and Analyzer Chains
  • Word Stemming, Character Mapping, and Language Support
  • Schema and Analysis Support in the Admin UI
  • Metadata and Content Extraction with Apache Tika
  • Indexing Binary File Types with SolrCell

8. Improving Search Quality and Performance

  • Delivering Relevant Results
  • Helping Users Find Information
  • Query Performance and Troubleshooting

9. Building User Interfaces for Search

  • Search UI Overview
  • Building a User Interface with Hue
  • Integrating Search into Custom Applications

10. Considerations for Deployment

  • Planning for Deployment
  • Determining Hardware Needs
  • Security Overview
  • Collection Aliasing

Encarta Labs Advantage

  • One-stop corporate training solution provider offering over 4,000 modules on a variety of subjects
  • All courses are delivered by industry veterans
  • Go from newbie to production-ready in a matter of a few days
  • More than 50,000 corporate executives trained across the globe
  • All training is conducted in workshop mode, with greater focus on hands-on sessions

View our other course offerings by visiting http://encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public or open-house workshop, or as online training, for a group of 10 or more candidates.
