The Power of Predictive Analytics

Available Upon Request

Course Objectives

This five-day course prepares analysts to apply the knowledge gained to their own data mining problems and to solve them quickly and easily. The lessons learnt are applicable to areas such as customer analytics, targeted marketing, social media analytics, fraud detection, predictive maintenance, and resource management. The course also includes an introduction to knowledge discovery from unstructured data such as text documents, web content, and social media content. It focuses on the necessary preprocessing steps and on the most successful methods for automatic text classification and clustering, including Naïve Bayes and Support Vector Machines (SVM). Upon completion of this course, participants will have a solid understanding of typical text mining workflows and be able to identify techniques for processing unstructured data, apply different statistical text-processing methods, and perform content classification and clustering. This course is recommended for analysts and data scientists.

Description

At the end of this course, you’ll be able to:

  • Perform all common data preparations
  • Build sophisticated predictive models
  • Evaluate model quality with respect to different criteria
  • Deploy analytical predictive models
  • Utilize more complex functionality of RapidMiner Studio
  • Apply more sophisticated analytical approaches
  • Identify techniques for processing unstructured data
  • Transform textual data into a structured format
  • Apply different statistical text-processing methods
  • Perform text classification and text clustering

Training Outline

Foundations

Introduction Lecture

  • A brief overview of the RapidMiner product ecosystem, the analytics hierarchy, and the CRISP-DM process for managing analytics projects.

Data Loading

  • Basic RapidMiner Interface Navigation
  • Create Repositories & Folders
  • Load Data functionality
  • Read Excel operator
  • Store

ETL

  • Understanding the ETL process
  • Filter operator
  • Map operator
  • Replace Missing Values operator
  • Data Type Conversion (Date to Numerical & Numerical to Polynominal)
  • Generate Attributes
  • Set Roles
  • Select Attributes

Introduction to Machine Learning in RapidMiner

  • What is Machine Learning? What is Supervised Learning vs Unsupervised Learning?
  • Bias vs Variance Tradeoff
  • Training vs Testing Error
  • k-NN (nearest neighbor) algorithm & distance measures
  • Splitting data for Training & Testing
  • Building Model on Training Data
  • Applying Model on Test Data
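
In the course these steps are built as a RapidMiner process. Purely as an illustration outside RapidMiner, the same idea can be sketched in plain Python: hold out some rows, "build" a k-NN model on the rest, and apply it to the held-out rows. The function names and the toy data below are made up for this sketch and are not part of the course material.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training rows."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data (invented for illustration): two well-separated groups.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0),
     (5.2, 4.9), (4.8, 5.1), (1.1, 0.9), (5.1, 5.2)]
y = ["low", "low", "low", "high", "high", "high", "low", "high"]

# Split: the first six rows are training data, the last two are test data.
train_X, train_y = X[:6], y[:6]
test_X, test_y = X[6:], y[6:]

# Apply the model to the test rows and measure the share of correct predictions.
predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

k-NN has no real training step: "building" the model only means keeping the training rows, and all the work happens when a prediction is requested.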

Validation and Performance measurement

  • Split validation
  • Cross validation
  • Performance measurements
  • Explaining Confusion Matrix
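
As a rough sketch of what a confusion matrix contains, the Python snippet below counts the four cells for a binary problem and derives accuracy, precision, and recall from them. The labels and the helper name `confusion_counts` are invented for this example. RapidMiner reports the same kind of figures through its performance operators, although its layout and naming may differ.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    pairs = Counter((a == positive, p == positive)
                    for a, p in zip(actual, predicted))
    tp = pairs[(True, True)]    # actually positive, predicted positive
    fn = pairs[(True, False)]   # actually positive, predicted negative
    fp = pairs[(False, True)]   # actually negative, predicted positive
    tn = pairs[(False, False)]  # actually negative, predicted negative
    return tp, fp, fn, tn

# Invented example labels.
actual    = ["yes", "yes", "yes", "no", "no",  "no", "no", "yes"]
predicted = ["yes", "no",  "yes", "no", "yes", "no", "no", "yes"]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="yes")
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
print(tp, fp, fn, tn, accuracy, precision, recall)
```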

Normalizing and Sampling

  • Normalization in Cross-Validation
  • Group Models

Optimizing Parameters

  • Grid Optimization
  • Adjusting k-value automatically
  • Determining accuracy
  • The Log Operator
  • Plotting k-value vs. accuracy
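
The idea behind a grid search over k can be sketched in a few lines of Python: try each candidate value, record the resulting accuracy in a log, and pick the best entry. This is only an illustration under assumptions. The toy data, the helper names, and the use of leave-one-out evaluation are choices made for this sketch, not a description of how RapidMiner's optimization operators work internally.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Hold out each row in turn, predict it from the rest, and return the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)

# Toy data (invented): four "low" rows and four "high" rows.
data = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"),
        ((0.9, 1.1), "low"), ((1.1, 0.9), "low"),
        ((5.0, 5.0), "high"), ((5.2, 4.9), "high"),
        ((4.8, 5.1), "high"), ((5.1, 5.2), "high")]

# The "grid": every candidate k is evaluated and logged as (k, accuracy).
log = [(k, leave_one_out_accuracy(data, k)) for k in (1, 3, 5, 7)]
best_k, best_accuracy = max(log, key=lambda entry: entry[1])
print(log, best_k)
```

The `log` list holds exactly the pairs one would plot as k-value vs. accuracy. On this toy data the accuracy collapses at k = 7, because every vote is then dominated by the other group.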

Linear Regression

  • Introduction to Linear Regression
  • Nominal to Numerical Review
  • Linear Regression operator
  • Performance (Regression) operator options
  • Reviewing a linear regression model

Naïve Bayes

  • Understanding Naïve Bayes
  • Discussion of speed: k-NN vs. Naïve Bayes

The Decision Tree

  • Understand the Decision Tree
  • Understand pruning in the context of overfitting

Workshop – The Random Forest

  • What is a Random Forest?
  • Random Forest methodology explained
  • Pros/Cons of ensemble learning

Advanced

Advanced Data Loading

  • Loading multiple data files with the Loop Files operator
  • Introduction of Regex for selections
  • Introduction of macros
  • Disparate file formats
  • Impact of delimiters on parameters

Advanced ETL Part 1

  • Join data
  • Trim
  • Remove Duplicates
  • Map data
  • Rename attributes

Advanced ETL Part 2

  • Data Aggregation & Pivoting
  • Using SQL/Join operators
  • Replace Missing Values
  • Set Role operation (Ids & Labels)
  • Set Minus & Append operators

Feature Generation

  • Generate attributes
  • Set Role – ids and labels
  • Format Numbers and advanced data type conversions
  • Functions and Mathematical Expressions
  • Advanced Select attributes

Other Data Transformations

  • Generate Aggregation
  • Rename by Replace
  • Loop Attributes operator

Model Deployment & Performance Measurement

  • Model Training – Cross-Validation with Decision Tree
  • Storing Models
  • New Performance Measure – explaining ROC & AUC
  • Model retrieval
  • Scoring & results analysis

Sampling and Weighting

  • Revisit k-Nearest Neighbors
  • Downsampling and Upsampling in k-NN
  • Extract Macro
  • Generate Weights by Stratification

The Neural Net

  • Introduction to Neural Net
  • Nominal to Numerical conversion (again)
  • Neural Net operator parameters

Feature Selection & PCA

  • Introduction to Feature Selection
  • Remove Correlated Attributes
  • Forward Selection
  • Backward Elimination

SVM

  • Introduction to SVM
  • SVM operator parameters
  • Create and Apply Threshold
  • Optimize SVM

Text Mining

Introduction to Text Mining Concepts

  • Defining what text mining is
  • Learning popular use cases for text mining
  • Understanding different data structures
  • Outlining the basic text mining analytical process
  • Installing the Text Mining extension

Document Handling in RapidMiner

  • Understand the document in RapidMiner
  • Creating documents in RapidMiner
  • Reading documents from files
  • Document collections
  • Converting other datasets into documents
  • Converting documents into datasets

Text Data Preprocessing

  • The need for text preprocessing
  • The Process Documents operator
  • Tokenize operator
  • Transform Cases operator
  • Filter Stopwords operator
  • Stemming operator
  • Extraction operators
  • Wordlist to Data operator
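
To give a feel for what this chain of operators does to a piece of text, here is a minimal Python sketch of the same sequence: tokenize, transform case, filter stopwords, stem. It is an illustration only. The stopword list is a tiny invented sample, and `strip_suffix` is a deliberately crude stand-in for a real stemming algorithm, not the method RapidMiner uses.

```python
import re

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it", "about"}

def strip_suffix(token):
    """Crude stemmer stand-in: drop a trailing 'ing', 'ed', or 's' from longer words."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Turn raw text into a list of normalized tokens."""
    tokens = re.findall(r"[A-Za-z]+", text)              # tokenize on non-letters
    tokens = [t.lower() for t in tokens]                 # transform cases
    tokens = [t for t in tokens if t not in STOPWORDS]   # filter stopwords
    return [strip_suffix(t) for t in tokens]             # stem

tokens = preprocess("The customers returned the phones and kept asking about refunds.")
print(tokens)
```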

Text Data Processing

  • Vectorizing Text Data
  • Term Frequency & TF-IDF
  • Other Text Processing Options
  • N-Grams
  • Text Pruning
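
Vectorizing with TF-IDF can be sketched as follows. The snippet uses one common textbook variant, relative term frequency multiplied by ln(N / document frequency); this is an assumption made for illustration, and the exact weighting and normalization that RapidMiner applies may differ. The function name `tf_idf` and the sample documents are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur at least once?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Invented, already preprocessed documents.
docs = [["good", "phone"], ["bad", "phone"], ["good", "good", "screen"]]
vectors = tf_idf(docs)
print(vectors[2])
```

A term that occurs in every document gets weight ln(1) = 0 under this variant, which is why very common words contribute little to distinguishing documents.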

Text Visualization

  • Filtering & Sorting
  • Word Frequency Visualization
  • Finding Discriminating Terms

Predicting Review Ratings with k-NN

  • Challenges of modeling with text data
  • k-NN and suitable distance measures
  • Optimization to adjust k-value automatically
  • The Log Operator
  • Plotting k-value vs. performance

Text Clustering

  • k-Means Clustering
  • Viewing Cluster Outputs in RapidMiner
  • Comparing Clustering to Labeled Data
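
The clustering step can be illustrated with a minimal k-means sketch in Python (Lloyd's algorithm with fixed starting centroids), followed by a cross-tabulation of cluster numbers against known labels. The two-dimensional "document vectors", the labels, and the starting centroids are invented for this example; a real text mining process would cluster much higher-dimensional TF-IDF vectors.

```python
import math
from collections import Counter

def k_means(points, centroids, iterations=10):
    """Alternate assignment and centroid update until nothing changes (iterations >= 1)."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignments, centroids

# Invented document vectors (weights for two terms) and their known labels.
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = ["sports", "sports", "politics", "politics"]

assignments, centroids = k_means(points, centroids=[(1.0, 0.0), (0.0, 1.0)])

# Compare clusters to labels: count how often each (cluster, label) pair occurs.
cross_tab = Counter(zip(assignments, labels))
print(assignments, cross_tab)
```

Cluster numbers carry no meaning of their own, so the comparison looks at which labels end up together rather than expecting cluster 0 to match any particular label.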

Prerequisite

Basic knowledge of computer programs and mathematics.