Mastering Big Data Analytics

Companies today use Big Data Analytics to identify trends, make faster decisions, and ultimately gain an edge that sets them apart. This course will teach you the tools needed to put Big Data to work in analytics.

About the course

Today, we’re surrounded by data. People upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth. Machines, too, are generating and keeping more and more data. To process such large datasets, there is a need for specialized tools.

This course covers two important frameworks, Hadoop and Spark, which provide some of the most important tools for carrying out large big data tasks. The course starts with an introduction to big data and then moves on to big data ecosystem tools and technologies such as HDFS, YARN, MapReduce, and Hive.

Next, the course introduces Spark and then dives into Scala and Spark concepts such as RDDs, transformations, actions, persistence, and deploying Spark applications. The course also covers Spark Streaming and Kafka, as well as data formats such as JSON, XML, Avro, Parquet, and Protocol Buffers.

Skills you will gain

  • MapReduce
  • HDFS
  • YARN
  • Hive
  • Apache Hadoop
  • Spark and advanced Spark
  • PySpark
  • Kafka
  • Kafka with Twitter analysis
  • Spark Streaming
  • Spark SQL
  • Spark MLlib

Course Syllabus

Module 1

Hadoop : Master your Big data

6:30 hr

2 MCQs
  • Big data touch
  • Getting started: Hadoop
  • Hadoop framework : Stepping into Hadoop
  • HDFS: What and Why?
  • Working on HDFS
  • Hadoop 2.x - YARN
  • MapReduce: A programming paradigm
  • A closer look at MapReduce
  • A practical approach to MapReduce
  • Hadoop 1.x vs Hadoop 2.x
  • Hadoop 3.x
  • Hadoop 3.0 Installation Part 1
  • Hadoop 3.0 Installation Part 2
  • Hadoop 3.0 and Apache Spark Installation
  • What is new in Hadoop 3.0 YARN Part 1
  • What is new in Hadoop 3.0 YARN Part 2
  • What is new in Hadoop 3.0 HDFS

Module 2

Hive: Big data SQL

3:00 hr

2 MCQs
  • Apache Hive : Teasing the Honey bee
  • Hive illustration : Basics
  • Hive illustration : External table in Hive
  • Hive illustration : Loading different file formats
  • Hive illustration : Loading data into Hive tables
  • Hive illustration : Simple Operations on Hive table
  • Hive illustration : Query Operations on Hive table
  • Hive illustration : Querying complex structures
  • Hive illustration : Views

Module 3

Spark : Stream and analyze the big data

6:30 hr

2 MCQs
  • Getting started - Spark Basics
  • Spark and Hadoop - Face to face
  • Spark - Architecture
  • RDDs - Building blocks of Spark
  • RDDs continued
  • Spark Terminologies
  • PySpark - Getting hands dirty
  • Spark - MLlib
  • PySpark - Clustering
  • Music data - Study the case - 01
  • Music data - Study the case - 02
  • Music data - Study the case - 03
  • Spark Streaming and real-time data analytics
  • Spark Streaming architecture
  • RTA - Get it with Twitter demo
  • Case study - Ad tech - 01
  • Case study - Ad tech - 02

Module 4

Apache Kafka - A distributed streaming platform

1:00 hr

2 MCQs
  • Kafka - What and Where?
  • Kafka - Key components: Broker, Producer
  • Kafka - Key components: Topics, Partitions
  • Kafka - Key components: Consumer, Replicas
  • Kafka - APIs and Clusters
  • More fun with Kafka
  • ZooKeeper - Basic principles
  • Live Kafka demo with Twitter

Module 5

Advanced Spark

1:30 hr

2 MCQs
  • Configuring Spark
  • Spark Properties
  • Performance Tuning
  • Data serialization
  • Memory tuning
  • Garbage collection
  • Memory usage and levels of parallelism
  • Data locality and broadcasting
  • Job scheduling
  • Modes in cluster management
  • Dynamic resource allocation
  • Decommissioning executors
  • Application scheduling

Projects

Yellow Taxi trip analysis using Hive

Sentiment Analysis on Twitter in Real Time

GL Gurus

Vinod Raju

Data Scientist, Great Learning

Sajan Kedia

Data Scientist at Myntra

Course certificate

Get the Machine Learning in Big Data course certificate from Great Learning, which you can share in the Certifications section of your LinkedIn profile, on printed resumes, CVs, or other documents.

Special ePortfolio Mention

The projects and skills you acquire as part of this course will have a special place in your ePortfolio.

FAQs

How soon after signing up would I get access to the Learning Content?

The course content access will be provided to you 6 months after you enroll in the course.

For what duration will I have access to the course?

The course will be available to you for a period of 3 years. You can revisit the course content anytime you want.

Whom can I contact if I have queries regarding the course?

Once you enroll, you will have access to a course discussion forum where you can post all your queries and they will be answered by GL Gurus.

What will be the mode of training for this course?

This course will be completely online and will provide you with access to high-quality video content, quizzes, case studies, and real-life projects. There will be a discussion forum where our GL Gurus will be available to answer your queries.

Will the training and course material help me prepare for the Big Data Hadoop certification exam?

Yes, Great Learning’s training and course materials will help you prepare for the Big Data Hadoop certification exam.