Book Your Seat Today!

Please share your company details and one of our consultants will contact you shortly.

Course Objectives

  • Identify techniques for processing unstructured data
  • Transform textual data into a structured format
  • Apply different statistical text-processing methods
  • Perform text classification and text clustering
  • Work on popular tasks like sentiment analysis or opinion mining


This two-day course is an introduction to knowledge discovery from unstructured data such as text documents, web content, and social media content. It focuses on the necessary preprocessing steps and on the most successful methods for automatically classifying and grouping text, including Naïve Bayes, Support Vector Machines (SVM), and clustering. Upon completion of this course, participants will have a solid understanding of typical text mining workflows and will be able to identify techniques for processing unstructured data, apply different statistical text-processing methods, and perform content classification and clustering.

Target Audience

Advanced Analysts, Developers, Data Scientists, and Administrators

Training Outline

Loading of Texts
  • Loading from Flat Files
  • Loading from Data Sets
  • Loading from Web Sources (e.g. URL crawling, Twitter)
Text Processing
  • Documents
  • Tokens
  • Visualizing Documents and Tokens
  • Multi-Dimensional Visualizations
Handling Unstructured Data
  • Preprocessing of Textual Data
  • Tokenizing
  • Stemming
  • Filtering of Tokens
  • Term Frequencies
  • Document Frequencies
  • TF-IDF
Advanced Modeling
  • Support Vector Machines
  • Naïve Bayes
  • K-NN
  • Text Clustering
Web Mining
  • Crawling the Web
  • Extracting Information from Web Sites
  • Transforming Web Sites to Documents
  • Retrieving Structured Web Data
  • Data ETL and Pre-processing for Web Sourced Data
  • Enriching Data via Web Services
  • Using Third Party Web Mining Extensions


Participants should have basic knowledge of computer programs and mathematics, as the exercises will be carried out using RapidMiner Studio.