**The Power of Predictive Analytics**

##### Book Your Seat Today!

Send us your company details and one of our consultants will contact you shortly.

This five-day course prepares analysts to apply what they learn to their own data mining problems and solve them quickly. The lessons apply to areas such as customer analytics, targeted marketing, social media analytics, fraud detection, predictive maintenance, and resource management.

The course also introduces knowledge discovery from unstructured data such as text documents, web pages, and social media content. It focuses on the necessary preprocessing steps and the most successful methods for automatically classifying and grouping text, including Naive Bayes, Support Vector Machines (SVM), and clustering. Upon completion, participants will have a solid understanding of typical text mining workflows and will be able to identify techniques for processing unstructured data, apply different statistical text-processing methods, and perform content classification and clustering.

This course is recommended for analysts and data scientists.

At the end of this course, you’ll be able to:

- Perform all common data preparations
- Build sophisticated predictive models
- Evaluate model quality with respect to different criteria
- Deploy analytical predictive models
- Utilize more complex functionality of RapidMiner Studio
- Apply more sophisticated analytical approaches
- Identify techniques for processing unstructured data
- Transform textual data into a structured format
- Apply different statistical text-processing methods
- Perform text classification and text clustering

Foundations

**Introduction Lecture**

- A brief overview of the RapidMiner product ecosystem, the analytics hierarchy, and the CRISP-DM process for managing analytics projects

**Data Loading**

- Basic RapidMiner Interface Navigation
- Create Repositories & Folders
- Load Data functionality
- Read Excel operator
- Store

**ETL**

- Understanding the ETL process
- Filter operator
- Map operator
- Replace Missing Values operator
- Data Type Conversion (Date to Numerical & Numerical to Polynominal)
- Generate Attributes
- Set Roles
- Select Attributes

**Introduction to Machine Learning in RapidMiner**

- What is Machine Learning? What is Supervised Learning vs Unsupervised Learning?
- Bias vs Variance Tradeoff
- Training vs Testing Error
- k-NN (nearest neighbor) algorithm & distance measures
- Splitting data for Training & Testing
- Building Model on Training Data
- Applying Model on Test Data
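
For readers who want a feel for these topics before the session, here is a minimal sketch of k-NN with a train/test split. The course itself works with RapidMiner operators, not Python; the function names and the toy data below are made up for illustration, and Euclidean distance is only one of several possible distance measures.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training rows."""
    nearest = sorted(zip(train_X, train_y), key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated groups.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = ["low", "low", "low", "high", "high", "high"]

# Hold out one example per class for testing; build the model on the rest.
test_idx = {2, 5}
train_X = [x for i, x in enumerate(X) if i not in test_idx]
train_y = [label for i, label in enumerate(y) if i not in test_idx]
test_X = [X[i] for i in sorted(test_idx)]
test_y = [y[i] for i in sorted(test_idx)]

# Apply the model to the held-out rows and measure the testing accuracy.
predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
```

The key point is that the held-out rows play no part in training, so the accuracy measured on them is an estimate of testing error rather than training error.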

**Validation and Performance measurement**

- Split validation
- Cross validation
- Performance measurements
- Explaining Confusion Matrix
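
As a rough illustration of the confusion matrix topic, the sketch below counts the four cells for a binary problem and derives accuracy, precision, and recall from them. The labels and predictions are invented for the example, and `confusion_matrix` is a hypothetical helper, not a RapidMiner function.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive):
    """Count TP, FP, FN, TN for a binary problem with the given positive label."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical true labels and model predictions.
actual    = ["yes", "yes", "yes", "no", "no",  "no", "no", "yes"]
predicted = ["yes", "no",  "yes", "no", "yes", "no", "no", "yes"]

cm = confusion_matrix(actual, predicted, positive="yes")
accuracy = (cm["TP"] + cm["TN"]) / len(actual)      # share of all rows predicted correctly
precision = cm["TP"] / (cm["TP"] + cm["FP"])        # share of predicted "yes" that were right
recall = cm["TP"] / (cm["TP"] + cm["FN"])           # share of actual "yes" that were found
```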

**Normalizing and Sampling**

- Normalization in Cross-Validation
- Group Models
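
The idea behind normalizing inside cross-validation can be sketched as follows: the scaling parameters are learned from the training fold only and then reused on the test fold, so no information leaks from test rows into training. This is a plain-Python sketch under that assumption; the helper names are hypothetical, and z-score scaling is just one normalization method.

```python
import statistics

def fit_zscore(values):
    """Learn mean and standard deviation from the training values only."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero spread
    return mean, std

def apply_zscore(values, mean, std):
    """Scale values with previously learned parameters."""
    return [(v - mean) / std for v in values]

def k_folds(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n rows."""
    fold_size = n // k
    for f in range(k):
        start = f * fold_size
        stop = n if f == k - 1 else start + fold_size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test

data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical single attribute
folds = []
for train_idx, test_idx in k_folds(len(data), k=3):
    # Fit on the training fold, then apply the same parameters to both folds.
    mean, std = fit_zscore([data[i] for i in train_idx])
    train_scaled = apply_zscore([data[i] for i in train_idx], mean, std)
    test_scaled = apply_zscore([data[i] for i in test_idx], mean, std)
    folds.append((mean, test_scaled))
```

Keeping the learned scaling parameters together with the predictive model, so that both are applied to new data in one step, is the role a grouped model plays.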

**Optimizing Parameters**

- Grid Optimization
- Adjusting K-value automatically
- Determining accuracy
- The Log Operator
- Plotting K-value vs. accuracy

**Linear Regression**

- Introduction to Linear Regression
- Nominal to Numerical Review
- Linear Regression operator
- Performance (Regression) operator options
- Reviewing a linear regression model

**Naïve Bayes**

- Understanding Naïve Bayes
- Speed comparison of k-NN and Naïve Bayes

**The Decision Tree**

- Understand the Decision Tree
- Understand pruning in the context of overfitting

**Workshop – The Random Forest**

- What is a Random Forest
- Random Forest methodology explained
- Pros/Cons of ensemble learning

Advanced

**Advanced Data Loading**

- Loading multiple data files with Loop File operator
- Introduction of Regex for selections
- Introduction of macros
- Disparate file formats
- Impact of delimiters on parameters

**Advanced ETL Part 1**

- Join data
- Trim
- Remove Duplicates
- Map data
- Rename attributes

**Advanced ETL Part 2**

- Data Aggregation & Pivoting
- Using SQL/Join operators
- Replace Missing Values
- Set Role operation (Ids & Labels)
- Set Minus & Append operators

**Feature Generation**

- Generate attributes
- Set Role – ids and labels
- Format Numbers and advanced data type conversions
- Functions and Mathematical Expressions
- Advanced Select attributes

**Other Data Transformations**

- Generate Aggregation
- Rename by Replace
- Loop Attributes operator

**Model Deployment & Performance Measurement**

- Model Training – Cross-Validation with Decision Tree
- Storing Models
- New Performance Measure – explaining ROC & AUC
- Model retrieval
- Scoring & results analysis
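
To give a sense of what AUC measures, the sketch below computes it directly from its ranking interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and confidence scores are invented, and `roc_auc` is a hypothetical helper written for this illustration.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical true labels (1 = positive) and model confidence scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 1.0 means every positive outranks every negative, while 0.5 corresponds to random ordering.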

**Sampling and Weighting**

- Revisit k-Nearest Neighbors
- Downsampling and Upsampling in k-NN
- Extract Macro
- Generate Weights by Stratification

**The Neural Net**

- Introduction to Neural Net
- Nominal to Numerical conversion (again)
- Neural Net operator parameters

**Feature Selection & PCA**

- Introduction to Feature Selection
- Remove Correlated Attributes
- Forward Selection
- Backward Elimination

**SVM**

- Introduction to SVM
- SVM operator parameters
- Create and Apply Threshold
- Optimize SVM

Text Mining

**Introduction to Text Mining Concepts**

- Defining what text mining is
- Learning popular use cases for text mining
- Understanding different data structures
- Outlining the basic text mining analytical process
- Installing the Text Mining extension

**Document Handling in RapidMiner**

- Understand the document in RapidMiner
- Creating documents in RapidMiner
- Reading documents from files
- Document collections
- Converting other datasets into documents
- Converting documents into datasets

**Text Data Preprocessing**

- The need for text preprocessing
- The Process Document operator
- Tokenize operator
- Transform Cases operator
- Filter Stopwords operator
- Stemming operator
- Extraction operators
- Wordlist to Data operator

**Text Data Processing**

- Vectorizing Text Data
- Term Frequency & TF-IDF
- Other Text Processing Options
- N-Grams
- Text Pruning
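
The preprocessing and vectorization steps listed above can be sketched in a few lines: tokenize, lowercase, drop stopwords, then weight each term by TF-IDF. This is an illustrative sketch only. The stopword list and documents are made up, and the formula used here (relative term frequency times the natural log of inverse document frequency) is one common variant; the exact weighting and normalization in RapidMiner may differ.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "and", "of", "was"}  # tiny illustrative list

def preprocess(text):
    """Lowercase, tokenize on letters, and remove stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical review snippets.
docs = [
    "The battery life is great",
    "The screen is great",
    "Battery died and the screen cracked",
]
vectors = tf_idf(docs)
```

Terms that occur in only one document, such as "life" in the first snippet, receive a higher weight than terms shared across documents, which is what makes TF-IDF useful for finding discriminating terms.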

**Text Visualization**

- Filtering & Sorting
- Word Frequency Visualization
- Finding Discriminating Terms

**Predicting Review Ratings with k-NN**

- Challenges of modeling with text data
- k-NN and suitable distance measures
- Optimization to adjust k-value automatically
- The Log Operator
- Plotting K-value vs. performance

**Text Clustering**

- k-Means Clustering
- Viewing Cluster Outputs in RapidMiner
- Comparing Clustering to Labeled Data
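
As a small illustration of the clustering topics, here is a sketch of k-Means on 2-D points, followed by a cross-tabulation of cluster assignments against known labels. The points, labels, and starting centroids are invented, and the starting centroids are supplied by hand so the run is repeatable; real tools typically choose them randomly.

```python
from collections import Counter

def nearest_index(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2 + (point[1] - centroids[i][1]) ** 2,
    )

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update until the centroids stop moving."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, [nearest_index(p, centroids) for p in points]

# Hypothetical data with known labels for comparison.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]

centroids, assignments = kmeans(points, centroids=[(1, 1), (8, 8)])

# Cross-tabulate cluster ids against the known labels.
crosstab = Counter(zip(assignments, labels))
```

When each cluster id pairs with a single label in the cross-tabulation, the clusters line up with the labeled classes.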

Prerequisites: basic knowledge of computer programs and mathematics.