Talend Big Data Basics

17 September 2019 | Kuala Lumpur
Book Your Seat Today!

Kindly send us your company details and our consultant will contact you as soon as possible!

Course Objectives

After completing this course, you will be able to:
• Create cluster metadata manually, from configuration files, or automatically
• Create HDFS and Hive metadata
• Connect to your cluster to use HDFS, HBase, Hive, Pig, Sqoop, and MapReduce
• Read data from and write it to HDFS (HDFS, HBase)
• Read tables from and write them to HDFS (Hive, Sqoop)
• Process tables stored in HDFS with Hive
• Process data stored in HDFS with Pig
• Process data stored in HDFS with Big Data batch Jobs

Duration

2 Days

Target Audience

Anyone who wants to use Talend Studio to interact with Big Data systems

Training Outline

Basic Concepts
  • Opening a project
  • Monitoring the Hadoop cluster
  • Creating cluster metadata manually
  • Creating cluster metadata from Hadoop configuration files (see the sketch below)
  • Creating cluster metadata using a wizard
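
The "configuration files" referred to above are the cluster's own Hadoop files such as core-site.xml and hdfs-site.xml. As a rough illustration of what that metadata contains, here is a minimal Java sketch that loads those files with the Hadoop Configuration API; the file paths and the properties printed are just examples, not part of the course material:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ClusterMetadata {
        public static void main(String[] args) {
            // Load connection details from configuration files exported from the cluster
            // (the conf/ paths below are placeholders).
            Configuration conf = new Configuration();
            conf.addResource(new Path("conf/core-site.xml"));
            conf.addResource(new Path("conf/hdfs-site.xml"));

            // fs.defaultFS is the NameNode URI that cluster metadata points at.
            System.out.println("NameNode: " + conf.get("fs.defaultFS"));
            System.out.println("Replication: " + conf.get("dfs.replication"));
        }
    }
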
Reading and Writing Data in HDFS
  • Storing a file in HDFS (illustrated after this list)
  • Storing multiple files in HDFS
  • Reading data from HDFS
  • Storing sparse datasets with HBase
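
Talend Big Data Jobs are compiled to Java, and the HDFS components rely on the standard Hadoop client libraries. As a rough equivalent of the storing and reading steps above, here is a minimal Java sketch using the Hadoop FileSystem API; the NameNode URI, user name, and file paths are placeholders for your own cluster:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // NameNode URI and user are placeholders for your own cluster settings.
            FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020"), conf, "student");

            // Store a local file in HDFS.
            fs.copyFromLocalFile(new Path("data/customers.csv"),
                                 new Path("/user/student/customers.csv"));

            // Read it back line by line.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(new Path("/user/student/customers.csv"))))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }
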
Working with Tables
  • Importing tables with Sqoop
  • Creating tables with Hive
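
For orientation, the Hive side of this section boils down to declaring a table over data that already sits in HDFS, for example files imported earlier with Sqoop. Below is a minimal Java sketch using the HiveServer2 JDBC driver; the host, port, credentials, and table layout are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateHiveTable {
        public static void main(String[] args) throws Exception {
            // Register the Hive JDBC driver (endpoint and credentials are placeholders).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "student", "");
                 Statement stmt = conn.createStatement()) {

                // Declare an external table over files already stored in HDFS.
                stmt.execute(
                    "CREATE EXTERNAL TABLE IF NOT EXISTS customers (" +
                    "  id INT, name STRING, country STRING) " +
                    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' " +
                    "LOCATION '/user/student/customers'");
            }
        }
    }
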
Processing Data and Tables in HDFS
  • Processing Hive tables with Jobs
  • Profiling Hive tables (optional)
  • Processing data with Pig (see the example below)
  • Processing data with a Big Data batch Job
  • Migrating a standard Job to a batch Job
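
The Pig steps are built graphically in the Studio, but the underlying logic is ordinary Pig Latin. Here is a minimal sketch using Pig's embedded PigServer API, with placeholder paths and a made-up two-column log layout:

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigProcessing {
        public static void main(String[] args) throws Exception {
            // Run Pig Latin against the cluster; paths and field names are placeholders.
            PigServer pig = new PigServer(ExecType.MAPREDUCE);

            // Load comma-separated web logs from HDFS.
            pig.registerQuery(
                "logs = LOAD '/user/student/weblogs' USING PigStorage(',') " +
                "AS (ip:chararray, url:chararray);");
            // Count hits per URL.
            pig.registerQuery("by_url = GROUP logs BY url;");
            pig.registerQuery(
                "hits = FOREACH by_url GENERATE group AS url, COUNT(logs) AS hits;");

            // Write the result back to HDFS.
            pig.store("hits", "/user/student/url_hits");
            pig.shutdown();
        }
    }
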
Clickstream Use Case
  • Clickstream use case: resource management with YARN
  • Setting up a development environment
  • Loading data files onto HDFS
  • Enriching logs
  • Computing statistics
  • Understanding MapReduce Jobs (sketched below)
  • Using Talend Studio to configure a resource request to YARN
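
The MapReduce part of the use case can be pictured as a classic map-and-reduce pair: the map phase emits one (URL, 1) pair per log line, the reduce phase sums them, and submitting the Job is what issues the resource request to YARN. A minimal Java sketch, assuming a comma-separated clickstream file with the visited URL in the second field (the paths and log layout are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ClickstreamHits {

        // Map: one log line in, (url, 1) out. The log layout here is an assumption:
        // comma-separated with the visited URL in the second field.
        public static class HitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text url = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 1) {
                    url.set(fields[1]);
                    context.write(url, ONE);
                }
            }
        }

        // Reduce: sum the counts for each URL.
        public static class HitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            // Submitting the Job is what triggers the resource request to YARN.
            Job job = Job.getInstance(new Configuration(), "clickstream hits");
            job.setJarByClass(ClickstreamHits.class);
            job.setMapperClass(HitMapper.class);
            job.setCombinerClass(HitReducer.class);
            job.setReducerClass(HitReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("/user/student/clickstream"));
            FileOutputFormat.setOutputPath(job, new Path("/user/student/clickstream_hits"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
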

Prerequisite

Completion of Introduction to Talend Studio, Talend Data Integration Basics, or Talend Data Integration Advanced