Big Data and Hadoop


Big Data and Hadoop Training

Get industry-leading instruction in a professional online experience from a live senior instructor

Validate your skills as a Hadoop expert

myTectra prepares you to become an expert with highly valued skills and a Hadoop certification. Enrol today!

Hadoop is open source, but it is too complex and too critical for you to wing it or depend on endless Googling for education. That’s why our Hadoop curriculum helps professionals develop a solid foundation in MapReduce and HDFS, while also providing opportunities to dive deeper into brand-level ecosystems.


Training Features

Instructor-led Sessions

36 hours of live, online, instructor-led classes. Weekend class: 12 sessions of 3 hours each. Weekday class: 18 sessions of 2 hours each.

Lifetime Access

You get lifetime access to the Learning Management System (LMS), where presentations, quizzes, installation guides, and class recordings are available.

Real-life Case Studies

Live project based on any of the selected use cases, involving implementation of the various Hadoop concepts.

24x7 Expert Support

Our 24x7 online support team resolves all your technical queries through a ticket-based tracking system, for life.


Each class will be followed by practical assignments which can be completed before the next class.


Towards the end of the course, you will be given access to an online test. Iteanz certifies you as a Hadoop Expert if you score 60% or above.

Course Outline

Chapter 1: Introduction

  • Big Data
  • Limitations and Solutions of existing Data Analytics Architecture
  • Hadoop
  • Hadoop Features
  • Hadoop Ecosystem
  • Hadoop 2.x core components
  • Hadoop Storage: HDFS
  • Hadoop Processing: MapReduce Framework
  • Different Hadoop Distributions

Chapter 2: Hadoop Architecture and HDFS

  • Hadoop 2.x Cluster Architecture - Federation and High Availability
  • A Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single-Node and Multi-Node Cluster Setup
  • Hadoop Administration

Chapter 3: Hadoop MapReduce Framework

  • MapReduce Use Cases
  • Traditional Way vs MapReduce Way
  • Why MapReduce
  • Hadoop 2.x MapReduce Architecture
  • Hadoop 2.x MapReduce Components
  • YARN MR Application Execution Flow
  • YARN Workflow
  • Anatomy of MapReduce Program
  • Demo on MapReduce
  • Input Splits
  • Relation between Input Splits and HDFS Blocks
  • MapReduce: Combiner & Partitioner
  • Demo on de-identifying Health Care Data set
  • Demo on Weather Data set
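
To give a flavour of how the pieces in this chapter fit together, here is a minimal sketch of the map, combine, partition, and reduce steps as a word count. It is plain Python running in memory, not Hadoop code: the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `run_job`) and the sample splits are assumptions made up for illustration, and a real job would use the Hadoop MapReduce API covered in class.

```python
from collections import defaultdict
import zlib


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one split's map output before the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: pick a reducer for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum every partial count that arrived for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Run map -> combine -> partition/shuffle -> reduce over in-memory splits."""
    # One bucket per (hypothetical) reducer, mapping key -> list of values.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in map_phase(line)]
        for key, value in combine(mapped):
            buckets[partition(key, num_reducers)][key].append(value)

    result = {}
    for bucket in buckets:
        for key in sorted(bucket):  # reducers see their keys in sorted order
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result


# Two illustrative input splits, each a list of lines.
splits = [["big data big cluster"], ["data node", "big data"]]
counts = run_job(splits)
print(counts)
```

Each split is mapped and combined independently, so only one partial count per word per split crosses the shuffle. That reduction in shuffled data is the reason a combiner is used at all.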

Chapter 4: Advanced MapReduce

  • Counters
  • Distributed Cache
  • MRUnit
  • Reduce Join
  • Custom Input Format
  • Sequence Input Format
  • XML File Parsing Using MapReduce

Chapter 5: Pig

  • About Pig
  • MapReduce Vs Pig
  • Pig Use Cases
  • Programming Structure in Pig
  • Pig Running Modes
  • Pig Components
  • Pig Execution
  • Pig Latin Program
  • Data Models in Pig
  • Pig Data Types
  • Shell and Utility Commands
  • Pig Latin: Relational Operators
  • File Loaders, Group Operator
  • COGROUP Operator
  • Joins and COGROUP
  • Union
  • Diagnostic Operators
  • Specialized joins in Pig
  • Built-in Functions (Eval, Load and Store, Math, String, and Date Functions)
  • Pig UDF
  • Piggybank
  • Parameter Substitution (Pig Macros and Pig Parameter Substitution)
  • Pig Streaming
  • Testing Pig Scripts with PigUnit
  • Aviation Use Case in Pig
  • Pig Demo on Healthcare Data set

Chapter 6: Hive

  • Hive Background
  • Hive Use Case
  • About Hive
  • Hive Vs Pig
  • Hive Architecture and Components
  • Metastore in Hive
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Partitions and Buckets
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data
  • Managing Output
  • Hive Script
  • Hive UDF
  • Retail Use Case in Hive
  • Hive Demo on Healthcare Data set

Chapter 7: Advanced Hive and HBase

  • HiveQL: Joining Tables
  • Dynamic Partitioning
  • Custom Map/Reduce Scripts
  • Hive Indexes and Views
  • Hive Query Optimizers
  • Hive: Thrift Server, User-Defined Functions
  • HBase: Introduction to NoSQL Databases and HBase
  • HBase vs RDBMS
  • HBase Components
  • HBase Architecture
  • Run Modes & Configuration
  • HBase Cluster Deployment

Chapter 8: Advanced HBase

  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • Data Loading Techniques
  • ZooKeeper Data Model
  • ZooKeeper Service
  • ZooKeeper
  • Demos on Bulk Loading
  • Getting and Inserting Data
  • Filters in HBase

Chapter 9: Processing Distributed Data with Apache Spark

  • What is Apache Spark
  • Spark Ecosystem
  • Spark Components
  • History of Spark and Spark Versions/Releases
  • Spark: A Polyglot
  • What is Scala?
  • Why Scala?
  • SparkContext
  • RDD

Chapter 10: Oozie and Hadoop Project

  • Flume and Sqoop Demo
  • Oozie
  • Oozie Components
  • Oozie Workflow
  • Scheduling with Oozie
  • Demo on Oozie Workflow
  • Oozie Coordinator
  • Oozie Commands
  • Oozie Web Console
  • Oozie for MapReduce, Pig, Hive, and Sqoop
  • Combined Flow of MR, Pig, and Hive in Oozie
  • Hadoop Project Demo
  • Hadoop Integration with Talend

