Developer for Spark and Hadoop

Available Upon Request

Course Objectives

This course is designed for developers and engineers who have programming experience. Prior knowledge of Hadoop or Spark is not required.

  • Apache Spark examples and hands-on exercises are presented in Scala and Python. The ability to program in one of those languages is required.
  • Basic familiarity with the Linux command line is assumed.
  • Basic knowledge of SQL is helpful.


Hands-on exercises take place on a live cluster running in the cloud. A private cluster is built for each student to use during the class. Through instructor-led discussion and interactive, hands-on exercises, participants navigate the Hadoop ecosystem and learn how to:

  • Distribute, store, and process data in a Hadoop cluster
  • Write, configure, and deploy Spark applications on a cluster
  • Use the Spark shell for interactive data analysis
  • Process and query structured data using Spark SQL
  • Use Spark Streaming to process a live data stream

Training Outline

Day 1
  • Introduction

  • Introduction to Apache Hadoop and the Hadoop Ecosystem

  • Apache Hadoop File Storage

  • Distributed Processing on an Apache Hadoop Cluster

  • Apache Spark Basics

  • Working with DataFrames and Schemas

Day 2
  • Analyzing Data with DataFrame Queries

  • RDD Overview

  • Transforming Data with RDDs

  • Aggregating Data with Pair RDDs

  • Querying Tables and Views with Apache Spark SQL

Day 3
  • Working with Datasets in Scala

  • Writing, Configuring and Running Apache Spark Applications

  • Distributed Processing

  • Distributed Data Persistence

  • Common Patterns in Apache Spark Data Processing

Day 4
  • Apache Spark Streaming: Introduction to DStreams

  • Apache Spark Streaming: Processing Multiple Batches

  • Apache Spark Streaming: Data Sources

  • Conclusion

    • Message Processing with Apache Kafka

    • Capturing Data with Apache Flume

    • Integrating Apache Flume and Apache Kafka

    • Importing Relational Data with Apache Sqoop

    • Final Questions and Post-Course Survey