How to Use This Glossary
Navigation
This glossary is organized into 7 categories:
- Business Metrics — KPIs, revenue metrics, growth metrics
- Analytics Techniques — Methods like cohort analysis, funnel analysis, A/B testing
- Statistical Terms — Mean, median, p-value, correlation, etc.
- Data Engineering — ETL, data warehouse, data lake, pipelines
- Tools & Technologies — SQL, Python, Tableau, BigQuery
- Data Quality — Data cleaning, validation, governance
- Career & Roles — Job titles, responsibilities, skills
Tip: Use Ctrl+F (Cmd+F on Mac) to search for specific terms.
Format
Each entry includes:
- Term: Industry-standard name
- Definition: Plain-English explanation
- Example: Real-world use case
- Learn more: Link to related DataPath tutorial (if available)
Business Metrics (KPIs & Performance Indicators)
Revenue & Growth Metrics
| Term | Definition | Example |
|------|------------|---------|
| GMV (Gross Merchandise Value) | Total value of goods sold through a platform (before deducting costs, discounts, and returns). | Flipkart Big Billion Days GMV = ₹19,000 crore (total customer spending, not profit). |
| ARR (Annual Recurring Revenue) | Yearly revenue from subscriptions (SaaS metric). | Freshworks ARR = $500M (predictable revenue from customers paying annually). |
| MRR (Monthly Recurring Revenue) | Monthly revenue from subscriptions (ARR ÷ 12). | Zoho MRR = $5M → grows to $5.5M next month (+10% growth). |
| Revenue | Total income before expenses (top line). | Swiggy revenue = ₹8,000 crore/year (from delivery fees + restaurant commissions). |
| Profit | Revenue minus all costs (bottom line). | Swiggy revenue ₹8,000 Cr - costs ₹8,500 Cr = -₹500 Cr (loss). |
| Burn Rate | Net cash a company spends per month (cash going out minus cash coming in). | Startup has ₹10 crore cash, burns ₹2 crore/month → 5 months of runway. |
| YoY Growth (Year-over-Year) | % change compared with the same period last year. | Revenue Q1 2026: ₹100 Cr vs Q1 2025: ₹80 Cr → 25% YoY growth. |
| MoM Growth (Month-over-Month) | % change compared with the previous month. | Jan revenue: ₹10 Cr, Feb revenue: ₹12 Cr → 20% MoM growth. |
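
The growth, recurring-revenue and runway arithmetic in this table fits in a few lines of Python. This is a minimal sketch. The helper names (`pct_change`, `mrr_from_arr`, `runway_months`) are illustrative and do not come from any library. Runway assumes the burn rate stays constant.

```python
def pct_change(current: float, previous: float) -> float:
    """Percent change from `previous` to `current` (works for YoY, MoM, any period)."""
    if previous == 0:
        raise ValueError("previous-period value must be non-zero")
    return (current - previous) * 100 / previous


def mrr_from_arr(arr: float) -> float:
    """Monthly recurring revenue implied by an annual recurring figure."""
    return arr / 12


def runway_months(cash: float, monthly_burn: float) -> float:
    """Months until cash runs out, assuming a constant monthly burn (simplification)."""
    if monthly_burn <= 0:
        raise ValueError("monthly_burn must be positive")
    return cash / monthly_burn


# Figures from the table above (₹ crore unless noted)
yoy = pct_change(100, 80)              # Q1 2026 vs Q1 2025
mom = pct_change(12, 10)               # Feb vs Jan
mrr_growth = pct_change(5.5, 5)        # Zoho MRR, $M
freshworks_mrr = mrr_from_arr(500)     # $500M ARR expressed per month, $M
profit = 8_000 - 8_500                 # negative value = loss
runway = runway_months(10, 2)          # ₹10 Cr cash, ₹2 Cr/month burn

print(f"YoY {yoy:.0f}%, MoM {mom:.0f}%, MRR growth {mrr_growth:.0f}%")
print(f"Profit {profit} Cr, runway {runway:.0f} months")
```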
Customer Acquisition Metrics
| Term | Definition | Example |
|------|------------|---------|
| CAC (Customer Acquisition Cost) | Cost to acquire one customer (marketing spend ÷ new customers). | Spent ₹10L on ads, acquired 1,000 customers → CAC = ₹1,000. Learn more |
| LTV (Lifetime Value) | Total revenue expected from a customer over their lifetime. | Customer spends ₹500/month for 24 months → LTV = ₹12,000. Learn more |
| LTV:CAC Ratio | Lifetime value divided by acquisition cost (profitability indicator). | LTV ₹12,000 ÷ CAC ₹1,000 = 12× (healthy: >3×). |
| Conversion Rate | % of visitors who complete a desired action (signup, purchase). | 10,000 visitors, 200 purchases → 2% conversion rate. |
| CPA (Cost Per Acquisition) | Marketing cost per conversion. Often used interchangeably with CAC, but the "acquisition" can be any conversion (signup, install), not only a paying customer. | Google Ads spent ₹50K, got 100 signups → CPA = ₹500. |
| CPC (Cost Per Click) | Cost paid for each ad click. | Facebook ad: ₹10,000 budget, 5,000 clicks → CPC = ₹2. |
| CPM (Cost Per Mille) | Cost per 1,000 ad impressions ("mille" = thousand). | ₹500 for 100,000 impressions → CPM = ₹5. |
| CTR (Click-Through Rate) | % of people who clicked an ad after seeing it (clicks ÷ impressions). | 100,000 impressions, 2,000 clicks → CTR = 2%. |
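
Every acquisition metric above is either "spend per unit" or "share of a total". The sketch below shows both patterns using the table's figures. The helper names are hypothetical. LTV here is the simple revenue-based version the table uses, and margin-adjusted variants multiply it by gross margin.

```python
def cac(marketing_spend: float, new_customers: int) -> float:
    """Customer acquisition cost: spend divided by customers acquired."""
    return marketing_spend / new_customers


def ltv(monthly_revenue: float, lifetime_months: float) -> float:
    """Simple revenue-based lifetime value (no margin or discounting applied)."""
    return monthly_revenue * lifetime_months


def per_unit_cost(spend: float, units: float, per: float = 1) -> float:
    """Cost per unit: CPA and CPC use per=1, CPM uses per=1000 impressions."""
    return spend * per / units


def rate_pct(numerator: float, denominator: float) -> float:
    """Ratio as a percentage (conversion rate, CTR)."""
    return numerator * 100 / denominator


acq_cost = cac(1_000_000, 1_000)              # ₹10L spend, 1,000 customers
lifetime_value = ltv(500, 24)                 # ₹500/month for 24 months
ratio = lifetime_value / acq_cost             # LTV:CAC (rule of thumb: > 3)
cpa = per_unit_cost(50_000, 100)              # ₹50K for 100 signups
cpc = per_unit_cost(10_000, 5_000)            # ₹10,000 for 5,000 clicks
cpm = per_unit_cost(500, 100_000, per=1000)   # ₹500 for 100,000 impressions
conversion = rate_pct(200, 10_000)            # purchases / visitors
ctr = rate_pct(2_000, 100_000)                # clicks / impressions

print(f"CAC ₹{acq_cost:,.0f}, LTV ₹{lifetime_value:,.0f}, LTV:CAC {ratio:.0f}x")
print(f"CPA ₹{cpa:.0f}, CPC ₹{cpc:.0f}, CPM ₹{cpm:.0f}, CVR {conversion}%, CTR {ctr}%")
```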
Engagement & Retention Metrics
| Term | Definition | Example |
|------|------------|---------|
| DAU (Daily Active Users) | Number of unique users active in a day. | Instagram India: 120M DAU (users who open the app daily). |
| MAU (Monthly Active Users) | Number of unique users active in a month. | Instagram India: 350M MAU (includes daily + weekly users). |
| DAU/MAU Ratio | Stickiness (how often monthly users return on a given day). | DAU 120M ÷ MAU 350M ≈ 34% (users open the app about 10 days/month on average). |
| Churn Rate | % of customers who stopped using the product in a period. | 1,000 customers, 100 cancelled → 10% monthly churn. Learn more |
| Retention Rate | % of customers who continue using the product (complement of churn: 100% - churn rate). | 1,000 customers, 900 stayed → 90% retention (10% churn). |
| Session Duration | Average time a user spends in the app per session (visit). | User opens app 3 times/day for 5 minutes each → session duration = 5 min (15 min/day total time in app). |
| Bounce Rate | % of visitors who leave a website after viewing only one page. | 10,000 visitors, 6,000 left after homepage → 60% bounce rate. |
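
Engagement metrics come down to counting *unique* users and taking ratios. The sketch below uses a hypothetical in-memory event log. In practice this would be a query over event tables. `churn_pct` is an illustrative helper name, and the days-per-month figure assumes a 30-day month.

```python
from collections import defaultdict

# Hypothetical event log: one (user_id, ISO date) tuple per app open
events = [
    ("u1", "2026-01-01"), ("u1", "2026-01-01"), ("u2", "2026-01-01"),
    ("u1", "2026-01-02"), ("u3", "2026-01-02"),
    ("u4", "2026-01-20"),
]

# DAU/MAU count unique users: u1 opening the app twice on Jan 1 counts once
users_by_day = defaultdict(set)
users_by_month = defaultdict(set)
for user, day in events:
    users_by_day[day].add(user)
    users_by_month[day[:7]].add(user)  # month key 'YYYY-MM'

dau = {day: len(users) for day, users in users_by_day.items()}
mau = {month: len(users) for month, users in users_by_month.items()}


def churn_pct(customers_at_start: int, cancelled: int) -> float:
    """Share of the starting customers lost during the period."""
    return cancelled * 100 / customers_at_start


churn = churn_pct(1_000, 100)        # 10% monthly churn
retention = 100 - churn              # 90%, on the same starting customers and period
stickiness = 120 / 350               # DAU ÷ MAU from the table (about 34%)
days_per_month = stickiness * 30    # about 10 active days per user per month
avg_session_min = (3 * 5) / 3        # 15 min/day over 3 sessions → 5 min per session
bounce = 6_000 * 100 / 10_000        # share of visitors who viewed only one page

print(dau, mau)
print(f"churn {churn}%, retention {retention}%, stickiness {stickiness:.0%}")
```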
E-commerce & Sales Metrics
| Term | Definition | Example |
|------|------------|---------|
| AOV (Average Order Value) | Average amount spent per order. | ₹50L revenue from 5,000 orders → AOV = ₹1,000. |
| Basket Size | Number of items per transaction. | 5,000 orders, 15,000 items → Avg basket size = 3 items/order. |
| Cart Abandonment Rate | % of users who added items to a cart but didn't complete the purchase. | 1,000 added to cart, 600 purchased → 40% abandonment. |
| SKU (Stock Keeping Unit) | Unique product identifier. | iPhone 15 Pro 256GB Black = 1 SKU, iPhone 15 Pro 256GB White = different SKU. |
| Inventory Turnover | How many times inventory is sold and replaced in a period (sales, or more strictly cost of goods sold, ÷ average inventory). | ₹10L inventory, ₹60L annual sales → Turnover = 6× (inventory replaced every 2 months). |
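
These e-commerce ratios are simple divisions. Writing them out shows which quantity sits in the numerator and which in the denominator. This is a sketch with hypothetical helper names. Turnover follows the table's sales-based version, while accounting definitions often use cost of goods sold.

```python
def aov(revenue: float, orders: int) -> float:
    """Average order value."""
    return revenue / orders


def avg_basket_size(items: int, orders: int) -> float:
    """Average number of items per order."""
    return items / orders


def cart_abandonment_pct(carts_created: int, purchases: int) -> float:
    """Share of carts that never turned into a purchase."""
    return (carts_created - purchases) * 100 / carts_created


def inventory_turnover(annual_sales: float, avg_inventory: float) -> float:
    """Sales-based turnover, as in the table (COGS-based is also common)."""
    return annual_sales / avg_inventory


order_value = aov(5_000_000, 5_000)             # ₹50L over 5,000 orders
basket = avg_basket_size(15_000, 5_000)         # items per order
abandonment = cart_abandonment_pct(1_000, 600)  # % of carts abandoned
turnover = inventory_turnover(60, 10)           # ₹60L sales / ₹10L inventory
months_per_cycle = 12 / turnover                # how often stock is replaced

print(f"AOV ₹{order_value:,.0f}, basket {basket:.0f}, abandonment {abandonment:.0f}%")
print(f"turnover {turnover:.0f}x, replaced every {months_per_cycle:.0f} months")
```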
Analytics Techniques & Methods
| Term | Definition | Example | Learn More |
|------|------------|---------|------------|
| Cohort Analysis | Grouping users by a shared characteristic (e.g., signup date) to track behavior over time. | Jan 2026 cohort: 1,000 users signed up → 250 still active in Jun (25% 6-month retention). | Tutorial |
| Funnel Analysis | Tracking the user journey through sequential steps to identify drop-off points. | Homepage → Product → Cart → Checkout → Purchase: 100 → 40 → 15 → 10 → 8 users (8% final conversion). | Tutorial |
| RFM Analysis | Segmenting customers by Recency, Frequency, and Monetary value. | High RFM score = Champions (bought recently, often, high spend). Low RFM = Lost customers. | Tutorial |
| A/B Testing | Comparing two versions (A vs B) to see which performs better. | Test: Red button (A) vs Green button (B) → Green has 15% higher conversion (statistically significant) → Deploy green. | Tutorial |
| Multivariate Testing | Testing multiple variables simultaneously (headline + button color + image). | 3 headlines × 2 button colors × 2 images = 12 combinations tested at once. | — |
| Segmentation | Dividing customers into groups based on characteristics (age, location, behavior). | Segment users: Budget buyers (<₹1,000 AOV) vs Premium buyers (>₹5,000 AOV). | — |
| Clustering | Using algorithms to automatically group similar data points (unsupervised learning). | K-means clustering: Identify 4 customer types (bargain hunters, brand loyalists, impulse buyers, researchers). | — |
| Regression Analysis | Predicting a numeric outcome from input variables. | Predict house price based on size, location, age (linear regression). | — |
| Time Series Analysis | Analyzing data points over time to identify trends and seasonality. | Monthly sales show a 20% spike in November (Diwali seasonality). | — |
| Root Cause Analysis | Identifying the underlying reason for an observed problem (e.g., 5 Whys method). | Problem: Cart abandonment high. Why? Shipping cost. Why? Free shipping threshold too high (₹999 vs competitors ₹499). | — |
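
Funnel analysis reports two numbers per step. Step conversion is measured from the previous step. Cumulative conversion is measured from the top of the funnel. The sketch uses the funnel counts from the table. `funnel_report` is a hypothetical helper, and the largest drop-off is taken to be the transition with the lowest step conversion.

```python
# Step counts from the funnel example in the table above
funnel = [
    ("Homepage", 100),
    ("Product", 40),
    ("Cart", 15),
    ("Checkout", 10),
    ("Purchase", 8),
]


def funnel_report(steps):
    """Return (step, users, step conversion, cumulative conversion) for each step."""
    top = steps[0][1]
    report = []
    previous = top
    for name, users in steps:
        report.append((name, users, users / previous, users / top))
        previous = users
    return report


report = funnel_report(funnel)
for name, users, step_conv, overall in report:
    print(f"{name:<9}{users:>4}  step {step_conv:6.1%}  overall {overall:6.1%}")

# Largest drop-off = transition with the lowest step-to-step conversion
worst_index = min(range(1, len(report)), key=lambda i: report[i][2])
print(f"Biggest drop-off: {report[worst_index - 1][0]} → {report[worst_index][0]}")
```

Here the final purchase rate is 8 of 100 visitors, and the weakest transition is Product → Cart (15 of 40, 37.5%).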
Statistical Terms
| Term | Definition | Example |
|------|------------|---------|
| Mean (Average) | Sum of values divided by count. | Sales: ₹100, ₹200, ₹300 → Mean = ₹200. |
| Median | Middle value when data is sorted (50th percentile). | Sales: ₹100, ₹200, ₹10,000 → Median = ₹200 (mean = ₹3,433, skewed by the outlier). |
| Mode | Most frequent value. | Shoe sizes: 7, 8, 8, 8, 9, 10 → Mode = 8. |
| Standard Deviation | Measure of spread (how far values typically deviate from the mean). | Low SD (values clustered), high SD (values spread out). |
| Variance | Average squared deviation from the mean; the square of the standard deviation (SD²). | SD = 10 → Variance = 100. |
| Percentile | Value below which X% of data falls. | 90th percentile salary = ₹25 LPA (90% earn less, 10% earn more). |
| Correlation | Measure of the linear relationship between two variables (-1 to +1). | Correlation = 0.8 (strong positive: as X increases, Y tends to increase). |
| Causation | One variable directly causes a change in another (correlation ≠ causation). | Ice cream sales correlate with drownings (both driven by summer heat, not by each other). |
| p-value | Probability of seeing a result at least as extreme as the observed one if there were truly no effect (the null hypothesis); lower = stronger evidence against the null. | p-value = 0.02 (if there were no real difference, a result this extreme would occur only 2% of the time → statistically significant at the usual 0.05 threshold). Learn more |
| Confidence Interval | Range of plausible values for the true value, from a method that captures it in a stated share of repeated samples (e.g., 95%). | Average order value: ₹1,000 ± ₹100 (95% CI: ₹900-₹1,100). Learn more |
| Statistical Significance | Result is unlikely to be explained by chance alone (usually p < 0.05). | A/B test: Treatment 5% better with p = 0.03 → significant (unlikely to be random noise). Learn more |
| Sample Size | Number of observations in a dataset (larger = more precise estimates). | Survey 100 people (small sample, about ±10% margin of error) vs 10,000 people (large sample, about ±1%). |
| Outlier | Data point far from the others (extreme value). | Salaries: ₹10L, ₹12L, ₹15L, ₹1 Cr → ₹1 Cr is an outlier. |
| Normal Distribution | Bell curve (most values near the mean, few at the extremes). | Heights: Most people 5'5"-5'10", few very short (<5') or very tall (>6'5"). Learn more |
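
Most of these terms map onto Python's standard `statistics` and `math` modules. In the sketch below, `pearson`, `margin_of_error` and `two_proportion_p_value` are illustrative helpers written here, not library functions. The margin of error assumes 95% confidence (z = 1.96) and the worst-case proportion p = 0.5. The A/B counts are hypothetical. The two-proportion z-test is one common way to compare conversion rates, and it relies on a normal approximation that needs reasonably large samples.

```python
import math
import statistics

# Mean vs median: one outlier drags the mean but not the median
sales = [100, 200, 10_000]
mean_sales = statistics.mean(sales)
median_sales = statistics.median(sales)
mode_size = statistics.mode([7, 8, 8, 8, 9, 10])

# Spread: population variance = mean squared deviation; SD = its square root
values = [2, 4, 4, 4, 5, 5, 7, 9]
variance = statistics.pvariance(values)
sd = statistics.pstdev(values)


def pearson(x, y):
    """Pearson correlation coefficient, always between -1 and +1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


r = pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a survey proportion (95% by default)."""
    return z * math.sqrt(p * (1 - p) / n)


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a two-proportion z-test (normal approximation)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


print(f"mean {mean_sales:.2f} vs median {median_sales}; r = {r:.2f}")
print(f"n=100: ±{margin_of_error(100):.1%}, n=10,000: ±{margin_of_error(10_000):.1%}")

# Hypothetical A/B test: 200/10,000 (A) vs 250/10,000 (B) conversions
p = two_proportion_p_value(200, 10_000, 250, 10_000)
print(f"p = {p:.3f}:", "significant" if p < 0.05 else "not significant", "at 0.05")
```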
Data Engineering & Infrastructure
| Term | Definition | Example |
|------|------------|---------|
| ETL (Extract, Transform, Load) | Process of moving data from source to warehouse: Extract (pull from the source database), Transform (clean, aggregate), Load (insert into the warehouse). | Daily job: Extract sales from MySQL → Transform (aggregate by product) → Load into BigQuery. Learn more |
| ELT (Extract, Load, Transform) | Modern approach: Extract → Load (into a cloud warehouse) → Transform (using SQL inside the warehouse). | Extract from API → Load into Snowflake → Transform with dbt models. |
| Data Warehouse | Centralized repository for structured data (optimized for analytics queries). | Snowflake, BigQuery, Redshift (store cleaned, aggregated data). Learn more |
| Data Lake | Storage for raw data in its native format, including semi-structured and unstructured data (logs, images, JSON). | AWS S3, Google Cloud Storage (store everything as-is, transform later). Learn more |
| Data Pipeline | Automated workflow to move and transform data. | Airflow job runs daily: API → S3 → Redshift → Update dashboard. |
| Schema | Structure defining tables, columns, and data types. | Orders table: order_id (INT), user_id (INT), amount (DECIMAL), date (DATE). |
| Normalization | Organizing data to reduce redundancy (multiple tables linked by foreign keys). | Separate Users table and Orders table (linked by user_id) instead of duplicating user data in every order row. |
| Denormalization | Combining tables for faster queries (trade-off: data redundancy). | Single Orders_with_Users table (includes user name, email in every order row) for fast lookups. |
| Primary Key | Unique identifier for each row in a table. | Orders table: order_id is the primary key (no two orders have the same ID). |
| Foreign Key | Column referencing the primary key of another table (creates a relationship). | Orders table: user_id is a foreign key (links to the Users table primary key). |
| Index | Database structure for fast lookups (like a book's index). | Create an index on the email column → lookups by email no longer scan the whole table (much faster on large tables). |
| OLTP (Online Transaction Processing) | Database optimized for many small, concurrent transactions (inserts, updates, single-row reads); typically the app's database. | MySQL for an e-commerce website (handles orders and user signups in real time). |
| OLAP (Online Analytical Processing) | Database optimized for read-heavy analytics (data warehouse). | BigQuery for running complex aggregations across millions of rows. |
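
The sketch below uses Python's built-in `sqlite3` module to make several of these terms concrete: schema, primary and foreign keys, an index, and an ETL step. Two in-memory SQLite databases stand in for the OLTP source and the warehouse. That is an assumption for illustration only, since a real pipeline would read from a system such as MySQL and load into one such as BigQuery through their own clients. Table names and rows are hypothetical. The aggregation happens in Python before loading, which makes this ETL. In ELT, the raw rows would be loaded first and the aggregation would run as SQL inside the warehouse.

```python
import sqlite3
from collections import defaultdict

# In-memory databases standing in for the app database and the warehouse
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
source.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Schema with primary keys, a foreign key, and an index on email
source.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL
);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    product    TEXT NOT NULL,
    amount     INTEGER NOT NULL,  -- rupees
    order_date TEXT NOT NULL      -- ISO 'YYYY-MM-DD'
);
CREATE INDEX idx_users_email ON users(email);
""")
source.executemany("INSERT INTO users VALUES (?, ?)",
                   [(1, "asha@example.com"), (2, "ravi@example.com")])
source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (101, 1, "Headphones", 1500, "2026-01-05"),
    (102, 2, "Headphones", 1500, "2026-01-05"),
    (103, 1, "Charger", 499, "2026-01-05"),
])
source.commit()

# The foreign key rejects an order that points at a non-existent user
try:
    source.execute("INSERT INTO orders VALUES (104, 999, 'Cable', 199, '2026-01-05')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Extract: pull raw rows from the source
rows = source.execute("SELECT product, amount FROM orders").fetchall()

# Transform: aggregate revenue and order count per product
totals = defaultdict(lambda: [0, 0])
for product, amount in rows:
    totals[product][0] += amount
    totals[product][1] += 1

# Load: write the aggregated table into the warehouse
warehouse.execute("CREATE TABLE sales_by_product "
                  "(product TEXT PRIMARY KEY, revenue INTEGER, order_count INTEGER)")
warehouse.executemany("INSERT INTO sales_by_product VALUES (?, ?, ?)",
                      [(p, rev, cnt) for p, (rev, cnt) in totals.items()])
warehouse.commit()

result = warehouse.execute(
    "SELECT product, revenue, order_count FROM sales_by_product ORDER BY product"
).fetchall()
print(result)
```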
Tools & Technologies
| Term | Definition | Example |
|------|------------|---------|
| SQL (Structured Query Language) | Language for querying relational databases. | SELECT * FROM orders WHERE date >= '2026-01-01' (fetches orders placed on or after 1 Jan 2026). Learn SQL |
| Python | General-purpose programming language (popular for data analysis). | Pandas, NumPy, Matplotlib for data manipulation and visualization. Learn Python |
| R | Programming language for statistics and data analysis. | Used in academia and pharma for statistical modeling. |
| Tableau | Data visualization tool (drag-and-drop dashboards). | Create interactive sales dashboard with filters (no coding needed). Learn Tableau |
| Power BI | Microsoft's business intelligence and visualization tool. | Integrates with Excel, Azure SQL (popular in enterprises using Microsoft stack). Learn Power BI |
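| Looker / Looker Studio | Google's BI tools: Looker is an enterprise platform with a modeling layer (LookML); Looker Studio (formerly Data Studio) is a free dashboard and report builder. | Connect to BigQuery, create reports with drag-and-drop. Learn Looker |
| Excel | Spreadsheet tool with pivot tables, VLOOKUP, charting. | Analyze small datasets (practical up to roughly 100K rows; hard limit 1,048,576 rows per sheet), create pivot table summaries. |
| BigQuery | Google Cloud's serverless data warehouse. | Run SQL queries on petabyte-scale data (on-demand pricing charges by the amount of data each query scans). Learn BigQuery |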
| Snowflake | Cloud data warehouse (AWS, Azure, GCP). | Separate compute and storage (scale independently). |
| Redshift | AWS data warehouse (columnar storage, optimized for analytics). | Amazon's e-commerce analytics runs on Redshift. |
| Spark | Distributed computing framework for big data processing. | Process 1TB dataset by splitting across 100 machines (parallel processing). Learn Spark |
| Airflow | Workflow orchestration tool (schedule and monitor data pipelines). | Schedule daily ETL job: Pull data at 2 AM → Transform → Load by 6 AM. |
| dbt (Data Build Tool) | Transform data in warehouse using SQL (version control, testing, documentation). | Write SQL transformations as models → dbt runs them in correct order. Learn dbt |
| Jupyter Notebook | Interactive environment for writing Python code + documentation. | Combine code, visualizations, and markdown explanations in one notebook. |
| Git | Version control system (track code changes, collaborate). | Commit SQL queries to Git → Team can review, merge changes. |
| API (Application Programming Interface) | Way for systems to communicate (request/receive data). | Call Google Maps API to get location data for analysis. |
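
Date filters are a common source of off-by-one bugs in SQL. To fetch every order from 2026, the comparison has to be inclusive (`>=`), because `> '2026-01-01'` silently drops orders placed on 1 January. The sketch below runs both comparisons from Python against a small in-memory SQLite table. The table contents and the `order_date` column name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2025-12-31"), (2, "2026-01-01"), (3, "2026-03-10")],
)

# ISO 'YYYY-MM-DD' strings sort chronologically, so text comparison works here
strict = conn.execute(
    "SELECT order_id FROM orders WHERE order_date > '2026-01-01' ORDER BY order_id"
).fetchall()
inclusive = conn.execute(
    "SELECT order_id FROM orders WHERE order_date >= '2026-01-01' ORDER BY order_id"
).fetchall()

print("> :", strict)      # misses the 1 January order
print(">=:", inclusive)   # every order placed in 2026
```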
Data Quality & Governance
| Term | Definition | Example |
|------|------------|---------|
| Data Cleaning | Fixing errors, removing duplicates, handling missing values. | Remove 500 duplicate customer records, fill missing email addresses with 'N/A'. |
| Missing Data | Values absent from a dataset (NULL, blank, NaN). | 10% of rows have missing phone numbers → Decide: drop rows or fill with 'Unknown'; for numeric fields (e.g., age), impute with the mean or median. |
| Duplicate Data | Same record appears multiple times. | Customer 'John Doe' appears 3 times with IDs 101, 102, 103 → Merge into a single record. |
| Data Validation | Checking that data meets rules (email format, date range, positive numbers). | Validate: email contains '@', age between 0 and 120, order amount > 0. |
| Data Lineage | Tracking where data came from and how it was transformed. | Revenue metric sources: API → S3 → Redshift → Aggregation query → Dashboard. |
| Data Governance | Policies for data access, security, and quality standards. | Only the Finance team can access revenue data (role-based access control). |
| Data Dictionary | Document defining what each field means. | 'order_status' field: 1=Pending, 2=Shipped, 3=Delivered, 4=Cancelled. |
| Master Data | Single source of truth for key entities (customers, products). | Product master table (authoritative source for product IDs, names, prices). |
| PII (Personally Identifiable Information) | Data that identifies an individual (name, email, phone, address). | GDPR compliance: pseudonymize PII before analysis (e.g., replace email with a hashed ID). GDPR still treats pseudonymized data as personal data, unlike truly anonymized data. |
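
The cleaning steps in this table can be chained into a small pipeline: deduplicate, validate, handle missing values, then protect PII. This is a plain-Python sketch under stated assumptions. The records, rule names and `PII_KEY` are hypothetical. Deduplication keys on normalized email and keeps the first record rather than truly merging fields. HMAC-SHA256 is one common way to pseudonymize identifiers, not a compliance guarantee on its own.

```python
import hashlib
import hmac

# Hypothetical key for pseudonymization; real keys belong in a secrets manager
PII_KEY = b"example-key-do-not-use"

raw_customers = [
    {"id": 101, "name": "John Doe", "email": "john@example.com", "phone": None, "age": 34, "amount": 1200},
    {"id": 102, "name": "John Doe", "email": " JOHN@example.com", "phone": "9800000000", "age": 34, "amount": 1200},
    {"id": 103, "name": "John Doe", "email": "john@example.com", "phone": None, "age": 34, "amount": 1200},
    {"id": 104, "name": "Priya S", "email": "priya.example.com", "phone": "9811111111", "age": 29, "amount": 650},
    {"id": 105, "name": "Arjun K", "email": "arjun@example.com", "phone": None, "age": 150, "amount": -20},
]


def normalize_email(email: str) -> str:
    return email.strip().lower()


def dedupe(records, key):
    """Keep the first record per key (a real merge might combine fields instead)."""
    seen, unique = set(), []
    for rec in records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            unique.append(rec)
    return unique


# Validation rules: (rule name, check that must return True)
RULES = [
    ("email_format", lambda r: "@" in r["email"]),
    ("age_range", lambda r: r["age"] is not None and 0 <= r["age"] <= 120),
    ("positive_amount", lambda r: r["amount"] > 0),
]


def failed_rules(rec):
    return [name for name, check in RULES if not check(rec)]


def pseudonymize(value: str) -> str:
    """HMAC-SHA256: stable ID per input; without the key, guessed emails can't be matched."""
    return hmac.new(PII_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


unique = dedupe(raw_customers, key=lambda r: normalize_email(r["email"]))
valid, rejected = [], []
for rec in unique:
    errors = failed_rules(rec)
    if errors:
        rejected.append((rec["id"], errors))
    else:
        valid.append(rec)

# Missing data: fill absent phone numbers with a placeholder
filled = [{**rec, "phone": rec["phone"] or "Unknown"} for rec in valid]

# PII: drop direct identifiers (name, phone) and swap email for a pseudonymous key
analysis_ready = [
    {"customer_key": pseudonymize(normalize_email(r["email"])), "age": r["age"], "amount": r["amount"]}
    for r in filled
]
print(rejected)
```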
Career & Roles
| Term | Definition | Typical Skills |
|------|------------|----------------|
| Data Analyst | Analyze data to answer business questions, create dashboards. | SQL, Excel, Tableau/Power BI, basic statistics. |
| Business Analyst | Bridge between business and tech (requirements gathering, process improvement). | SQL, Excel, domain knowledge, communication. |
| Product Analyst | Analyze product usage, user behavior, A/B tests (support product decisions). | SQL, Python, experimentation, product sense. |
| Marketing Analyst | Analyze marketing campaigns, attribution, customer acquisition. | SQL, Google Analytics, Excel, marketing metrics (CAC, LTV, ROAS). |
| Data Scientist | Build predictive models, machine learning (forecasting, classification). | Python/R, ML (regression, classification), statistics, SQL. |
| Data Engineer | Build data pipelines, manage infrastructure, ETL. | SQL, Python, Spark, Airflow, cloud platforms (AWS, GCP, Azure). |
| Analytics Engineer | Hybrid role (transform data in warehouse using SQL/dbt, build data models). | SQL, dbt, data modeling, Python. |
| BI Developer | Build dashboards and reports for business stakeholders. | Tableau, Power BI, SQL, data visualization. |
| Statistician | Design experiments, perform statistical analysis, hypothesis testing. | Statistics (A/B testing, regression), R/Python, mathematics. |