Data Science - Foundations & Advanced: Data Mining and Predictive Analytics

26 - 29 Mar 2019 | Kuala Lumpur
25 - 28 June 2019 | Kuala Lumpur
Book Your Seat Today!

Please send us your company details and our consultant will contact you as soon as possible.

Course Objectives

This four-day course prepares analysts to apply the knowledge gained to their own data mining problems and to solve them quickly and easily. The lessons learnt apply to areas such as customer analytics, targeted marketing, social media analytics, fraud detection, predictive maintenance, and resource management.

What Will You Learn?

  • Perform all common data preparations
  • Build sophisticated predictive models
  • Evaluate model quality with respect to different criteria
  • Deploy analytical predictive models
  • Utilize more complex functionality of RapidMiner Studio
  • Apply more sophisticated analytical approaches

Target Audience

This course is recommended for analysts and data scientists.

Training Methodology

Hands-on exercises, lectures, group discussions, and case studies.

Training Outline


Foundations
  • Business scenario
  • Analytics Taxonomy & Hierarchy
  • CRISP-DM & Data mining in the enterprise
Basic Usage
  • User interface
  • Creating and Managing RapidMiner repositories
  • Operators and processes
  • Storing data, processes, and result sets
EDA: Exploratory Data Analysis
  • Loading Data
  • Quick Summary Statistics
  • Visualizing Data & Basic Charting
Data Preparation
  • Basic Data ETL (Extract, Transform, and Load)
  • Data Types & Transformations of value types
  • Handling missing values
  • Handling attribute roles
  • Normalization and standardization
  • Filtering examples and attributes
Building Better Processes
  • Organizing
  • Renaming
  • Relative Path
  • Sub-processes
  • Building Blocks
  • Breakpoints
Predictive Model Algorithms
  • K-Nearest Neighbour
  • Correlations
  • Naive Bayes
  • Linear Regression
  • Rules
  • Decision Trees
Model Construction and Evaluation
  • Machine Learning Theory: Bias, Variance, Overfitting & Underfitting
  • Split and Cross Validation
  • Applying models
  • Optimization and Parameter Tuning
  • Splitting data
  • Evaluation methods & Performance criteria
Additional Workshops
  • Outlier Detection
  • Random Forests
  • Ensemble Modeling


Advanced
  • Business case
  • Intro course review
  • Loading new data
EDA: Exploratory Data Analysis
  • Multiple sources
  • Joins & Set Theory
  • Understanding new attributes
Data Preparation
  • Advanced Data ETL (Extract, Transform, and Load)
  • Aggregation & Multi-level aggregation
  • Pivot & De-Pivot
  • Calculated values
  • Regular Expressions
  • Changing value types
  • Feature Generation and Feature Engineering
  • Loops
  • Macros
Predictive Model Algorithms
  • Support Vector Machines
  • K-Means Clustering
  • Neural Networks
  • Logistic Regression
Model Construction and Evaluation
  • Advanced performance criteria
  • ROC plots
  • Comparison between models
  • Sampling
  • Weighting
  • Feature Selection: Forward Selection
  • Feature Selection: Backward Elimination
  • Validation of preprocessing and preprocessing models
  • Optimization & Logging results
Additional Workshops
  • Principal Components Analysis
  • Logistic Regression
  • Performance (Cost) Model Optimization


Prerequisites

Basic knowledge of computer programs and mathematics.