Semester 6 · Year 3 · Even · Core Subject · ★★★ Moderate
CS 604

Big Data Analytics

BTech IT Semester 6 · Amity University Visakhapatnam, Visakhapatnam

Study of Hadoop, MapReduce, Spark, NoSQL databases, and big data processing frameworks.

This Big Data Analytics syllabus is mapped to the BTech Information Technology (BTech IT) curriculum followed at Amity University Visakhapatnam (AUV), a private institution in Visakhapatnam accredited by NAAC (A+), NBA and AICTE. Students at AUV can use the unit-wise topics, previous year questions (PYQs) and exam tips below to prepare for their Semester 6 CS 604 examination.

4 Units · 28 Topics · 4 Credits · 60 Lecture hours · 100 Max marks
Overview
🎯
Why it matters
Facebook processes petabytes of data, and Netflix's recommendation system analyzes billions of records. Companies pay a premium for engineers who can handle datasets at this scale.
💼
Placement relevance
Relevant to Data Engineer roles at FAANG companies and to analytics positions, where Hadoop and Spark skills are valued. A growing field, with ₹20-45 LPA quoted for big data specialists. Cloud companies also need big data expertise.
🔗
Prerequisites for
Data Engineering · Data Science · Cloud Data Platforms · Stream Processing · Data Warehousing
📚
Recommended books
Hadoop: The Definitive Guide by Tom White · Learning Spark by Holden Karau · Big Data: Principles and Best Practices by Nathan Marz · MongoDB: The Definitive Guide by Shannon Bradshaw
Curriculum — 4 Units
Unit 1 · 7 Topics
Big Data Basics
Key Formulae
MapReduce: Map(key, value) → Shuffle/Sort → Reduce(key, list<values>)
HDFS: NameNode (metadata) + DataNodes (blocks, default replication factor 3)
3Vs (Volume, Velocity, Variety)
Big Data Characteristics (Veracity, Value)
Distributed Systems Concepts
Hadoop Ecosystem Overview
HDFS Architecture
MapReduce Programming Model
YARN (Resource Management)
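The HDFS formula in this unit (blocks on DataNodes, replication factor 3) lends itself to a quick storage calculation. The sketch below is only illustrative: it assumes a 128 MB block size (the default in Hadoop 2.x and later, and configurable per cluster), and the function name `hdfs_footprint` and the 1000 MB file are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128      # assumption: default HDFS block size in Hadoop 2.x and later
REPLICATION_FACTOR = 3   # replication factor quoted in the Unit 1 formula


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (logical blocks, block replicas stored, raw MB consumed)."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is copied to `replication` different DataNodes.
    replicas = blocks * replication
    # A partial last block only uses the space it needs, so raw usage
    # is simply the file size multiplied by the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


# Hypothetical 1000 MB file
blocks, replicas, raw_mb = hdfs_footprint(1000)
print(f"{blocks} blocks, {replicas} replicas, {raw_mb} MB of raw storage")
```

The NameNode would track metadata for the 8 logical blocks, while the DataNodes between them hold the 24 block replicas.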
Unit 2 · 7 Topics
NoSQL Databases
Key Formulae
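CAP: Consistency, Availability, Partition Tolerance (at most 2 of 3 can be guaranteed at once; during a network partition, the choice is between consistency and availability)
BASE: Basically Available, Soft state, Eventual consistency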
CAP Theorem
BASE vs ACID
MongoDB (Document Store)
Cassandra (Wide-Column Store)
HBase (Column-Oriented)
Redis (Key-Value Store)
Graph Databases (Neo4j)
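The "soft state, eventual consistency" part of BASE can be made concrete with a toy model. This is a minimal sketch, not how any of the databases listed above is implemented: the `Replica` class, the `sync` function, the last-write-wins rule and the `cart:42` key are all assumptions made for illustration.

```python
class Replica:
    """One node holding key -> (timestamp, value)."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        # Last-write-wins: keep the entry with the newest timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Background reconciliation: exchange entries both ways, newest wins."""
    for src, dst in ((a, b), (b, a)):
        for key, (ts, value) in list(src.store.items()):
            dst.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("cart:42", ["book"], ts=1)
sync(r1, r2)                                  # both replicas agree

r1.write("cart:42", ["book", "pen"], ts=2)    # accepted by r1 only
stale = r2.read("cart:42")                    # r2 still serves the old value

sync(r1, r2)                                  # replicas converge
fresh = r2.read("cart:42")
print(stale, fresh)
```

Both replicas stay available for reads and writes throughout, at the cost of `r2` briefly returning stale data. An ACID system would instead block or reject the read until the write was visible everywhere.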
Unit 3 · 7 Topics
Apache Spark
Key Formulae
RDD Operations: Lazy transformations + eager actions
DAG: Directed Acyclic Graph for execution optimization
RDDs (Resilient Distributed Datasets)
Transformations (map, filter, flatMap)
Actions (collect, count, reduce)
Spark SQL & DataFrames
Spark Streaming
MLlib (Machine Learning Library)
Spark vs MapReduce
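The "lazy transformations + eager actions" rule can be sketched in plain Python without a Spark installation. The `LazyDataset` class below is a hypothetical stand-in and not the Spark API: transformations only record a step, and nothing runs until an action is called.

```python
from functools import reduce as fold
from itertools import chain


class LazyDataset:
    """Toy stand-in for an RDD: records steps, runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)      # recorded recipe of transformations

    # Transformations: return a new dataset, execute nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def flat_map(self, fn):
        return LazyDataset(self._source, self._steps + (("flat_map", fn),))

    def _run(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                items = map(fn, items)
            elif kind == "filter":
                items = filter(fn, items)
            else:
                items = chain.from_iterable(map(fn, items))
        return items

    # Actions: replay the recorded steps and return a concrete result.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def reduce(self, fn):
        return fold(fn, self._run())


calls = []                      # records when tokenize actually runs


def tokenize(line):
    calls.append(line)
    return line.split()


lines = LazyDataset(["hello world", "hello spark"])
words = lines.flat_map(tokenize).filter(lambda w: w != "world")
calls_before_action = len(calls)    # nothing has executed yet

result = words.collect()            # first action: steps run now
n = words.count()                   # second action: steps run again
print(calls_before_action, result, n, len(calls))
```

The second action recomputes everything from the source because this toy has no caching. The recorded list of steps plays a role loosely similar to RDD lineage: it is the recipe from which a result can be rebuilt.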
Unit 4 · 7 Topics
Data Analytics & Tools
Key Formulae
ETL: Extract → Transform → Load (data pipeline)
Lambda Architecture: Batch Layer + Speed Layer + Serving Layer
Hive (SQL on Hadoop)
Pig (Data Flow Language)
Apache Kafka (Streaming)
Data Warehousing
ETL Processes
Data Visualization
Real-Time Analytics
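The Extract → Transform → Load pipeline from this unit can be sketched with the Python standard library alone. Everything specific here is an assumption made for illustration: the inline CSV sample, the `orders` table, and the cleaning rules (drop rows with a missing amount, upper-case the country code). An in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with one incomplete row and inconsistent casing.
RAW_CSV = """order_id,amount,country
1,19.99,in
2,,us
3,5.00,IN
"""


def extract(text):
    """Extract: parse the raw CSV into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, fix types, normalise country codes."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue                      # skip rows with a missing amount
        clean.append((int(row["order_id"]),
                      float(row["amount"]),
                      row["country"].upper()))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Query the loaded data the way an analyst would query the warehouse.
totals = conn.execute(
    "SELECT country, COUNT(*) FROM orders GROUP BY country").fetchall()
print(totals)
```

Keeping the three stages as separate functions mirrors how pipeline tools schedule and retry each stage independently.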
Previous Year Questions
Unit 1 · 2023 · End Semester · 10 marks
Write MapReduce pseudocode for Word Count problem. Given input: 'hello world hello'. Show Map output, Shuffle phase, and Reduce output step-by-step.
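One way to check an answer to this question is to simulate the three phases on a single machine. The sketch below is not Hadoop code; the function names `map_phase`, `shuffle_phase` and `reduce_phase` are chosen here for illustration, and a real job would run many mappers and reducers in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit (word, 1) for every word in the input line."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/Sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the list of values for each key."""
    return {key: sum(values) for key, values in groups.items()}


mapped = map_phase("hello world hello")
shuffled = shuffle_phase(mapped)
reduced = reduce_phase(shuffled)

print("Map output:    ", mapped)    # [('hello', 1), ('world', 1), ('hello', 1)]
print("After shuffle: ", shuffled)  # {'hello': [1, 1], 'world': [1]}
print("Reduce output: ", reduced)   # {'hello': 2, 'world': 1}
```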
Unit 2 · 2023 · End Semester · 8 marks
Explain CAP theorem with examples. For an e-commerce site, would you prioritize CA, CP, or AP? Justify. Compare MongoDB and Cassandra.
Unit 3 · 2022 · End Semester · 6 marks
What are RDDs in Spark? Explain transformations vs actions with examples. Why is Spark faster than MapReduce?
Exam Strategy
🗺️
MapReduce examples
Word count, average calculation, max value — practice 5 problems. Show Map output (key-value pairs), Shuffle phase, Reduce output. Tabular format helps.
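For the average and max-value variants, only the reduce step changes. The following is a practice sketch under stated assumptions: the `(city, temperature)` readings are made-up sample data, and the phases are simulated locally rather than run on Hadoop.

```python
from collections import defaultdict

# Hypothetical input records: (city, temperature)
readings = [("delhi", 30), ("pune", 24), ("delhi", 34), ("pune", 26)]


def map_phase(records):
    """Map: emit (city, temperature) as the key-value pair."""
    return [(city, temp) for city, temp in records]


def shuffle_phase(pairs):
    """Shuffle/Sort: collect all temperatures for each city."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_average(groups):
    """Reduce: average = sum of values / number of values, per key."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def reduce_max(groups):
    """Reduce: keep the largest value per key."""
    return {key: max(values) for key, values in groups.items()}


shuffled = shuffle_phase(map_phase(readings))
averages = reduce_average(shuffled)
maxima = reduce_max(shuffled)

print("After shuffle:", shuffled)   # {'delhi': [30, 34], 'pune': [24, 26]}
print("Average:      ", averages)   # {'delhi': 32.0, 'pune': 25.0}
print("Max:          ", maxima)     # {'delhi': 34, 'pune': 26}
```

A point worth noting in an answer: max can be applied to partial results in any order, but an average of averages is generally wrong, so a pre-aggregation step for averages would need to carry (sum, count) pairs instead.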
🎯
CAP theorem is gold
CAP theorem + ACID vs BASE comparison appears in EVERY exam. Make a comparison table. Give examples: MongoDB (usually classified as CP), Cassandra (usually classified as AP).
Spark vs Hadoop
Why Spark is faster: it keeps intermediate data in memory, whereas MapReduce writes it to disk between stages. RDD lineage for fault tolerance. Lazy evaluation concept. Always asked in exams.