Big Data & Hadoop

• Instructor-led training program with practical sessions, designed to build your skills in Big Data analytics and Hadoop development.

• A systematic course designed for professionals aiming to build a career in Big Data analytics and development using the Hadoop ecosystem.

• Along with practical assignments at the end of each module, you will work on a project towards the end of the course that involves setting up a Hadoop development environment and gaining hands-on experience with the Hadoop ecosystem components covered during the course.

• This will help validate your understanding of all the Hadoop components and give you the confidence to work with Hadoop on your own.

• There is no strict prerequisite to start learning Hadoop.

• However, knowledge of Core Java will help you grasp some Hadoop concepts, such as MapReduce programming.

• Also, basic knowledge of Linux shell commands will make it easier to learn and execute Hadoop commands.

• Although Hadoop can now run on Windows, it was originally built to run on Linux; hence, Linux is preferred for Hadoop installation and management.

• You can acquire the basic Java and Linux knowledge in parallel with this 'Big Data and Hadoop' training by spending a few extra hours. Our trainers will be happy to help you with that.

Module 1 – Introduction to Big Data and Hadoop Architecture

• What is Big Data and the challenges associated with it?

• Limitations of traditional systems

• Core components of Hadoop ecosystem

• Hadoop Architecture

• Understanding the Hadoop Distributed File System (HDFS)

• NameNode, DataNode, Secondary NameNode

• JobTracker, TaskTracker

• Analyzing the HDFS Read and Write
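To make the HDFS read/write discussion concrete: HDFS splits every file into large fixed-size blocks and replicates each block across DataNodes (a 128 MB block size and a replication factor of 3 are common Hadoop 2.x defaults). The toy Python sketch below illustrates only the idea; the round-robin placement is a simplification of, not a substitute for, HDFS's actual rack-aware placement policy, and all names are made up for illustration.

```python
# Conceptual sketch of HDFS block splitting and replica placement.
# This is NOT the real HDFS implementation; it only models the idea.

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, a common Hadoop 2.x default
REPLICATION = 3                  # default dfs.replication

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (block_index, length) pairs for a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((len(blocks), length))
        offset += length
    return blocks

def place_replicas(blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` DataNodes (round-robin sketch)."""
    placement = {}
    for i, (block_id, _) in enumerate(blocks):
        placement[block_id] = [datanodes[(i + r) % len(datanodes)]
                               for r in range(replication)]
    return placement

# A 300 MB file splits into 3 blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))  # 3
```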

Module 2 – Hadoop Cluster Configuration – Installation and setup

• Hadoop Cluster Deployment Modes - Standalone, Pseudo-distributed (single node), Fully distributed (multi-node)

• Configuration files associated with Hadoop Cluster

• Practicing Hadoop Shell commands and Linux commands

• Introduction to MapReduce job processing and execution

• Demo VM setup and installation

• Practical Hands On: HDFS data loading

Module 3 - Hadoop MapReduce framework

• Overview of the MapReduce Framework

• MapReduce Architecture

• Data Types in Hadoop

• Mapper and Reducer

• Paradigms and components of MapReduce programming - Mapper, Reducer & Driver

• Combiners and Partitioners

• Sample MapReduce use cases

• Practical Hands On: Word Count Program in MapReduce.
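The word-count exercise is typically written in Java against the Hadoop API; the same Mapper/Reducer logic can be sketched in plain Python, with the shuffle step simulated locally. This is a conceptual sketch under that assumption, not Hadoop code:

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return (word, sum(counts))

def run_wordcount(lines):
    """Simulate the shuffle: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(w, c) for w, c in grouped.items())

print(run_wordcount(["big data", "big hadoop"]))
# {'big': 2, 'data': 1, 'hadoop': 1}
```

The split into `mapper`, `reducer`, and a driver that wires them together mirrors the Mapper, Reducer, and Driver components named in this module.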

Module 4 – Advanced MapReduce

• MapReduce InputFormat and OutputFormat

• Custom InputFormat

• Counters

• MapReduce testing with JUnit and MRUnit Testing Frameworks

• Advanced MapReduce programming with error handling

• Practical Hands On: Writing a MapReduce code with combiner and partitioner and testing it with MRUnit.
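The combiner and partitioner from the hands-on above can also be sketched conceptually: a combiner performs a local reduce on each mapper's output before the shuffle, and Hadoop's default partitioner routes each key with the equivalent of `hash(key) % numReducers`. A hypothetical Python sketch of both ideas (not Hadoop's actual classes):

```python
def partition(key, num_reducers):
    """Hash-partitioner sketch: the same key always lands on the same reducer."""
    return hash(key) % num_reducers

def combine(mapper_output):
    """Combiner sketch: locally sum (word, 1) pairs before they cross the network."""
    local = {}
    for word, count in mapper_output:
        local[word] = local.get(word, 0) + count
    return list(local.items())

pairs = [("big", 1), ("data", 1), ("big", 1)]
print(combine(pairs))  # [('big', 2), ('data', 1)]
```

The benefit of the combiner is visible even here: three pairs shrink to two before the (simulated) shuffle, which is exactly why combiners reduce network traffic in a real job.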

Module 5 - Apache™ Pig and Pig Latin

• Pig Components

• Pig Data Types

• Pig Architecture

• Pig Latin: Load & Store

• Pig Latin: Join, Group and Union

• Writing and testing Pig Latin Scripts

• Pig Latin: UDF

• Practical Hands On: Developing and executing Pig scripts.
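Pig Latin statements such as `grouped = GROUP logs BY user;` followed by `FOREACH grouped GENERATE group, COUNT(logs);` describe a grouped aggregation. Their effect can be mimicked in plain Python to see what such a script computes; the relation name and sample rows below are made up for illustration:

```python
from collections import Counter

# Rows as (user, url) tuples, standing in for a relation loaded with LOAD.
logs = [("alice", "/a"), ("bob", "/b"), ("alice", "/c")]

# Equivalent of: GROUP logs BY user, then GENERATE group, COUNT(logs)
counts = Counter(user for user, _ in logs)
print(dict(counts))  # {'alice': 2, 'bob': 1}
```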

Module 6 - Apache Hive™ and Hive Query Language (HiveQL)

• Hive Architecture, components and Installation

• Hive vs. Traditional RDBMS

• Hive DDL – Create, Show and Drop Hive Databases and tables

• Hive DML - Inserting data and load files into Hive Tables, Alter Tables

• HiveQL - Select, Join, Group By, Order By, Filter etc.

• Practical Hands On: Hive Queries and Scripts.
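HiveQL's SELECT, JOIN, GROUP BY, and ORDER BY clauses closely mirror standard SQL, so the shape of a Hive query can be prototyped against any SQL engine before running it on a cluster. A sketch using Python's built-in sqlite3 (the table name and data are invented for illustration; Hive's execution model and full dialect differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("west", 50), ("east", 25)])

# Same shape as a HiveQL aggregate query:
#   SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY 2 DESC;
rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('east', 125), ('west', 50)]
```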

Module 7 – Advanced Hive

• Hive SerDe

• Hive UDF

• Hive UDAF

• MapReduce Scripts

• Joins & Subqueries

• Query optimization – Map side joins and Reduce side joins

• Hive Parameters

• Practical Hands On: Hive UDF, UDAF coding and execution.

Module 8 – NoSQL Databases and Apache HBase™

• Introduction to NoSQL

• Difference between RDBMS and NoSQL

• HBase Overview

• HBase Architecture & Features

• HBase Column Families

• HBase Master

• HBase Schema Design

• HBase API

• Practical Hands On: HBase Queries.
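HBase's schema, touched on above, is often described as a sparse, versioned map: row key → column family → column qualifier → (timestamp, value). The toy Python class below models that layout; the class and method names are hypothetical and this is not the real HBase client API:

```python
import time

class MiniHBaseTable:
    """Toy model of HBase's map-of-maps layout; not the real client API."""

    def __init__(self, column_families):
        # Column families are fixed at table-creation time, as in HBase.
        self.families = set(column_families)
        self.rows = {}  # row_key -> family -> qualifier -> [(ts, value), ...]

    def put(self, row_key, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = (self.rows.setdefault(row_key, {})
                         .setdefault(family, {})
                         .setdefault(qualifier, []))
        cell.append((ts if ts is not None else time.time(), value))

    def get(self, row_key, family, qualifier):
        """Return the latest version of a cell (highest timestamp wins)."""
        versions = self.rows[row_key][family][qualifier]
        return max(versions)[1]

t = MiniHBaseTable(["info"])
t.put("user1", "info", "name", "Alice", ts=1)
t.put("user1", "info", "name", "Alicia", ts=2)
print(t.get("user1", "info", "name"))  # Alicia
```

Keeping multiple timestamped versions per cell and resolving reads to the newest one reflects HBase's versioning behavior, which is why row-key and column-family choices dominate HBase schema design.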

Module 9 - Apache Sqoop, Flume, Oozie, Zookeeper

• What is Sqoop?

• Import/Export Data using Sqoop

• Sqoop Architecture

• What is Flume and how does it work?

• Flume Flow

• Oozie

• Oozie Work Flow

• Components of Oozie

• Oozie Scheduler

• What is ZooKeeper and what is it used for?

• ZooKeeper API and Data Model

• Security

• Use cases
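Of the tools in this module, ZooKeeper's data model is the most unusual: a hierarchy of "znodes", each addressed by a slash-separated path like a filesystem and each carrying a small payload. A toy Python sketch of that tree (hypothetical names; not the real ZooKeeper client API):

```python
class ZNodeTree:
    """Toy model of ZooKeeper's path -> data hierarchy; not the real client."""

    def __init__(self):
        self.nodes = {"/": b""}  # the root znode always exists

    def create(self, path, data=b""):
        # As in ZooKeeper, a znode can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent does not exist: {parent}")
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def children(self, path):
        """List the direct children of a znode, by name."""
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p != path and p.startswith(prefix)
                      and "/" not in p[len(prefix):])

zk = ZNodeTree()
zk.create("/config", b"")
zk.create("/config/db", b"host=10.0.0.1")
print(zk.children("/config"))  # ['db']
```

Small named payloads in a watched tree like this are what make ZooKeeper useful for the coordination and configuration use cases listed above.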

Module 10 - Hadoop 2.0, MRv2 and YARN

• Limitations of Hadoop 1.0

• Hadoop 2.0 Features

• HDFS 2 - Architecture

• NameNode High availability

• YARN Framework

• YARN Capacity Scheduler

• MRv1 and MRv2

• Practical Hands On: Programming in YARN.

Module 11 – Project Work

• Analyzing Twitter Data: Download live Twitter data and load it into HDFS using Flume. Use Hive and MapReduce to derive insights from the downloaded data. Use Oozie to manage the workflow and schedule tasks.

• Data analysis using the NYSE Data Set: Load NYSE data into HDFS using Sqoop and use Pig and Hive to perform aggregations and analysis.

• Guidance on working with a sample data set of your choice and building data analysis and insights on it.

With our extensive training methodologies and proper guidance from experienced trainers, we provide hands-on practical and theoretical knowledge, along with project work, to make you industry-ready and help you stand confidently in the Big Data and Hadoop market.

After the course, we also provide placement assistance to our students. Our staff and trainers assist you with the following:
• Editing your resume to showcase your Hadoop skills
• Interview questions
• Guidance and mock interviews with working industry professionals
• Guidance on vacancies and recruiting companies

Hadoop is currently one of the top job trends.

Knowledge of Hadoop opens new doors for professionals to work in a wide range of roles and positions, such as:
• Big Data Project Manager
• Product Manager
• Hadoop Developer
• Hadoop Tester
• Hadoop Engineer
• Trainer
• Data Analyst
• Data Tester
• Consultant, and more

Top companies like Google, Facebook, Twitter, IBM, Dell, Intel, Oracle, Amazon, Yahoo, eBay, and Microsoft have adopted Hadoop and are looking for Hadoop professionals.

IT professionals should take up Hadoop now for a brighter career.

60 Hrs (Lectures + Practical Sessions)

20 Hrs (Project Work)

