Hadoop Developer Foundations

* Looking for a flexible schedule (after hours or weekends)? Please call 858-208-4141 or email us: sales@ccslearningacademy.com.

Student financing options are available.

Transitioning military and Veterans, please contact us to sign up for a free consultation on training and hiring options.

Looking for group training? Contact Us

gowranga

Last Update May 10, 2024

About This Course

Course Description

New – Learn about the Hadoop ecosystem and how to process large data streams.

Apache Hadoop is a framework for processing Big Data, and Apache Spark is a newer in-memory processing engine. This course introduces you to the Hadoop ecosystem and Spark.

This course explores processing large data streams in the Hadoop ecosystem. Working in a hands-on learning environment, you’ll learn techniques and tools for ingesting, transforming, and exporting data to and from the Hadoop ecosystem. You’ll also process data using MapReduce and other critical tools, including Hive and Pig. Towards the end of the course, we’ll review other useful tools such as Oozie and discuss security in the ecosystem.
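
To give a feel for the MapReduce model mentioned above, here is a minimal single-process sketch in plain Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are illustrative assumptions, and a real job would run these phases in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data locality"]  # hypothetical input
    print(word_count(sample))
```

The split into three small functions mirrors the phases a MapReduce job goes through; only the map and reduce logic would normally be written by the developer.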

Learning Objectives

Introduction to Hadoop

HDFS

YARN

Data Ingestion

HBase

Oozie

Working with Hive

Hive advanced

Hive in Cloudera/Hortonworks Distribution (or tools of choice)

Working with Spark

Spark Basics

Spark Shell

RDDs

Spark Dataframes and Datasets

Spark SQL

Spark API programming

Spark and Hadoop

Machine Learning (ML/MLlib)

GraphX

Spark Streaming

Inclusions

Instructor-led training
Training Seminar Student Handbook
Collaboration with classmates (not currently available for self-paced course)
Real-world learning activities and scenarios
Exam scheduling support*
Enjoy job placement assistance for the first 12 months after course completion.
This course is eligible for CCS Learning Academy’s Learn and Earn Program: get a tuition fee refund of up to 50% if you are placed in a job through CCS Global Tech’s Placement Division*
Government and Private pricing available.*

Pre-requisites

Familiarity with a programming language
Comfort in a Linux environment (able to navigate the Linux command line and edit files using vi or nano)

Target Audience

Experienced Developers and Architects seeking to be proficient in Hadoop, Hive, and Spark within an enterprise data environment.

Curriculum

103 Lessons, 32h

1. Introduction to Hadoop

2. HDFS

3. YARN

4. HBase

5. Oozie

6. Working with Hive

7. Hive advanced

8. Hive in Cloudera or Hortonworks Distribution (or tools of choice)

9. Spark Basics

10. Spark Shell

11. RDDs

12. Spark SQL

13. Spark API programming (Scala and Python)

14. Spark and Hadoop

15. Machine Learning (ML/MLlib)

16. GraphX

17. Spark Streaming

Your Instructors

gowranga

Price: $2,395.00

Level: Intermediate

Duration: 32 hours

Lectures: 103 lectures

Subject: Live Courses by Practice Area Analytics & BI

Curriculum

1. Introduction to Hadoop

Hadoop history, concepts

Ecosystem

Distributions

High-level architecture

Hadoop myths

Hadoop challenges

Hardware and software

2. HDFS

Design and architecture

Concepts (horizontal scaling, replication, data locality, and rack awareness)

Daemons: NameNode, Secondary NameNode, and DataNode

Communications and heartbeats

Data integrity

Read and write path

NameNode High Availability (HA), Federation
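
As a rough illustration of the block and replication concepts listed for this module, the sketch below estimates how many blocks a file occupies and how much raw storage it consumes. It assumes the common defaults of a 128 MB block size and a replication factor of 3, both of which are configurable per cluster; the function name `hdfs_footprint` is hypothetical and the code does not talk to a real cluster.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (blocks, block replicas, raw bytes stored) for one file.

    Assumed defaults: 128 MB blocks and replication factor 3.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    replicas = blocks * replication
    # The final block only uses the space it needs, so raw usage is size x replication.
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes


if __name__ == "__main__":
    blocks, replicas, raw = hdfs_footprint(300 * MB)
    print(f"{blocks} blocks, {replicas} replicas, {raw // MB} MB raw")
```

The arithmetic shows why many small files are costly for the NameNode (each file needs at least one block entry) while the on-disk overhead is driven mainly by the replication factor.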

3. YARN

YARN Concepts and architecture

Evolution from MapReduce to YARN

Data Ingestion

Flume for logs and other data ingestion into HDFS

Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL

Copying data between clusters (distcp)

Using S3 as complementary to HDFS

Data ingestion best practices and architectures

Oozie for scheduling events on Hadoop

4. HBase

Concepts and architecture

HBase vs RDBMS vs Cassandra

HBase Java API

Time series data on HBase

Schema design
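
One row-key pattern often discussed for time series data on HBase is to append a reversed timestamp to a metric identifier, so that the newest reading for a metric sorts first. The sketch below shows only the byte layout in plain Python; it does not use the HBase API, and the names `row_key` and `decode_timestamp`, the zero-byte separator, and the metric name are assumptions made for illustration.

```python
import struct

JAVA_LONG_MAX = 2**63 - 1  # largest value of a signed 64-bit long


def row_key(metric, timestamp_ms):
    """Build a key of the form <metric> 0x00 <reversed timestamp, 8 bytes big-endian>."""
    if not 0 <= timestamp_ms <= JAVA_LONG_MAX:
        raise ValueError("timestamp out of range")
    reversed_ts = JAVA_LONG_MAX - timestamp_ms
    return metric.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)


def decode_timestamp(key):
    """Recover the original timestamp from the last 8 bytes of a key."""
    (reversed_ts,) = struct.unpack(">q", key[-8:])
    return JAVA_LONG_MAX - reversed_ts


if __name__ == "__main__":
    older = row_key("cpu.load", 1_000)  # hypothetical metric name
    newer = row_key("cpu.load", 2_000)
    # Keys are compared byte by byte, so the newer reading sorts first.
    print(sorted([older, newer])[0] == newer)
```

Because rows are stored in byte order of their keys, a scan that starts at the metric prefix would reach recent readings before older ones under this layout.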

5. Oozie

Introduction to Oozie

Features of Oozie

Oozie Workflow

Creating a MapReduce Workflow

Start, End, and Error Nodes

Parallel Fork and Join Nodes

Workflow Jobs Lifecycle

Workflow Notifications

Workflow Manager

Creating and Running a Workflow

Oozie Coordinator Sub-groups

Oozie Coordinator Components, Variables, and Parameters

6. Working with Hive

Architecture and design

Data types

SQL support in Hive

Creating Hive tables and querying

Partitions

Joins

Text processing

Labs: various labs on processing data with Hive

7. Hive advanced

Transformation and Aggregation

Working with Dates, Timestamps, and Arrays

Converting Strings to Date, Time, and Numbers

Create new Attributes, Mathematical Calculations, and Windowing Functions

Use Character and String Functions

Binning and Smoothing

Processing JSON Data

Execution Engines (Tez, MR, Spark)
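
The binning and smoothing topics in this module come down to simple arithmetic, which Hive expresses through SQL functions. The plain-Python sketch below illustrates that arithmetic only; it is not Hive syntax, and the function names `equal_width_bin` and `moving_average` are hypothetical.

```python
def equal_width_bin(value, low, high, bins):
    """Return the 0-based index of the equal-width bin that contains value."""
    if bins < 1 or high <= low:
        raise ValueError("need bins >= 1 and high > low")
    if not low <= value <= high:
        raise ValueError("value outside [low, high]")
    width = (high - low) / bins
    # The upper bound belongs to the last bin rather than a bin of its own.
    return min(int((value - low) / width), bins - 1)


def moving_average(values, window):
    """Trailing moving average; early outputs average the points seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    print(equal_width_bin(25, 0, 100, 4))
    print(moving_average([1, 2, 3, 4], 2))
```

In a query engine the same trailing average would typically be written as a windowed aggregate over an ordered set of rows, which is why smoothing and windowing functions are taught together.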

8. Hive in Cloudera or Hortonworks Distribution (or tools of choice)

Impala architecture

Impala joins and other SQL specifics

9. Spark Basics

Big Data, Hadoop, and Spark

What’s new in Spark v2

Spark concepts and architecture

Spark ecosystem (Core, Spark SQL, MLlib, and Streaming)

10. Spark Shell