Python Basics for Analysts

Python is the Swiss Army knife of data analysis. Learn the basics and unlock automation, advanced analytics, and ML.

Why Python for Data Analysts?

For most data analysts, Python is the second most important skill after SQL. Here's why:

✅ Automate repetitive tasks (no more manual copy-paste!)
✅ Handle large datasets (millions of rows, no problem)
✅ Advanced analytics (statistics, ML, forecasting)
✅ Web scraping (collect data from websites)
✅ Free & popular (huge community, tons of libraries)

Real fact: 75% of data analyst job postings require Python. It's non-negotiable in 2026.

In this topic, you'll learn:

  • Variables and data types
  • Lists, dictionaries, and tuples
  • Loops and conditionals
  • Functions and modules
  • String manipulation
  • File operations
  • Real-world data analysis examples

Setting Up Python

Option 1: Anaconda (Recommended for Analysts)

Download from anaconda.com

Why Anaconda?

  • Includes Python + data libraries (Pandas, NumPy, Matplotlib)
  • Jupyter Notebook for interactive analysis
  • No separate installations needed

Option 2: Python.org

Download from python.org

Then install libraries:

$ terminalBash
pip install pandas numpy matplotlib jupyter

Verify Installation

$ terminalBash
python --version  # Should show Python 3.10+

Variables & Data Types

Variables

Variables store data. There is no need to declare types: Python infers them from the value you assign.

code.pyPython
# Creating variables
name = "Rahul"
age = 25
salary = 50000.50
is_employed = True

# Print variables
print(name)        # Rahul
print(f"Age: {age}")  # Age: 25

Data Types

| Type  | Example    | Used For      | Check Type    |
|-------|------------|---------------|---------------|
| str   | "Hello"    | Text          | type("Hello") |
| int   | 25         | Whole numbers | type(25)      |
| float | 3.14       | Decimals      | type(3.14)    |
| bool  | True/False | Yes/No flags  | type(True)    |

Type Conversion

code.pyPython
# String to integer
age_str = "25"
age_int = int(age_str)

# Integer to string
count = 100
count_str = str(count)

# String to float
price = float("99.99")

# Check type
print(type(age_int))  # <class 'int'>
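
Exported data often stores numbers as text with thousands separators, which `float()` rejects as-is. The following is a minimal sketch of one way to handle that; the sample value and variable names are hypothetical.

```python
# Hypothetical raw value, as it might appear in an exported spreadsheet
raw_amount = "1,200.50"

# float() rejects the comma, so remove it before converting
amount = float(raw_amount.replace(",", ""))
print(amount)        # 1200.5

# int() drops the fractional part (it truncates, it does not round)
whole_units = int(amount)
print(whole_units)   # 1200

# int("25.0") raises ValueError, so go through float first
count = int(float("25.0"))
print(count)         # 25
```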

String Operations

Strings are everywhere in data analysis: cleaning names, extracting info, formatting output.

Basic Operations

code.pyPython
company = "Skillsetmaster"

# Length
print(len(company))  # 14

# Uppercase / Lowercase
print(company.upper())  # SKILLSETMASTER
print(company.lower())  # skillsetmaster

# Check if contains
print("skill" in company.lower())  # True

String Methods Analysts Use Daily

code.pyPython
# Strip whitespace
name = "  Rahul  "
print(name.strip())  # "Rahul"

# Replace
email = "user@gmail.com"
cleaned = email.replace("gmail", "company")
print(cleaned)  # user@company.com

# Split
csv_row = "Rahul,Mumbai,50000"
fields = csv_row.split(",")
print(fields)  # ['Rahul', 'Mumbai', '50000']

# Join
cities = ["Mumbai", "Delhi", "Bangalore"]
result = ", ".join(cities)
print(result)  # "Mumbai, Delhi, Bangalore"
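
These methods are usually chained when cleaning a raw record. The sketch below assumes a hypothetical comma-separated row with inconsistent spacing and casing; the field order (name, city, salary) is an assumption for illustration.

```python
# Hypothetical messy record, e.g. one line from an exported file
raw_row = "  rahul SHARMA , mumbai ,50000 "

# Split into fields, then clean each one
fields = raw_row.split(",")
name = fields[0].strip().title()   # trim spaces, capitalize each word
city = fields[1].strip().title()
salary = int(fields[2].strip())    # convert the numeric text to int

print(name)    # Rahul Sharma
print(city)    # Mumbai
print(salary)  # 50000
```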

String Formatting (F-Strings)

code.pyPython
name = "Priya"
revenue = 125000

# Old way (avoid)
print("Name: " + name + ", Revenue: " + str(revenue))

# Modern way (use this!)
print(f"Name: {name}, Revenue: ₹{revenue:,}")
# Output: Name: Priya, Revenue: ₹125,000
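
The `:,` part inside the braces is a format specifier. A few other specifiers come up often in reports; this short sketch uses made-up values to show decimals, percentages, and column alignment.

```python
# Illustrative values only
revenue = 1234567.891
growth = 0.1234
region = "North"

print(f"Revenue: {revenue:,.2f}")  # Revenue: 1,234,567.89
print(f"Growth: {growth:.1%}")     # Growth: 12.3%

# Pad to a fixed width of 8 characters: < aligns left, > aligns right
print(f"[{region:<8}]")            # [North   ]
print(f"[{region:>8}]")            # [   North]
```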

Lists: Ordered Collections

Lists store multiple items in order.

code.pyPython
# List of sales
sales = [100, 200, 150, 300]

# Access by index (starts at 0)
print(sales[0])   # 100 (first)
print(sales[-1])  # 300 (last)

# Slicing
print(sales[1:3])  # [200, 150] (index 1 to 2)

# Length
print(len(sales))  # 4

Modifying Lists

code.pyPython
sales = [100, 200, 150]

# Add item
sales.append(250)         # [100, 200, 150, 250]

# Insert at position
sales.insert(1, 175)      # [100, 175, 200, 150, 250]

# Remove item
sales.remove(150)         # [100, 175, 200, 250]

# Remove by index
del sales[0]              # [175, 200, 250]

# Pop (remove and return last)
last = sales.pop()        # last = 250, sales = [175, 200]

List Operations Analysts Use

code.pyPython
sales = [1200, 1500, 980, 1350, 1100]

# Sum
total = sum(sales)        # 6130

# Average
average = sum(sales) / len(sales)  # 1226.0

# Min / Max
print(min(sales))  # 980
print(max(sales))  # 1500

# Sort
sales.sort()       # [980, 1100, 1200, 1350, 1500]
sales.sort(reverse=True)  # [1500, 1350, 1200, 1100, 980]

# Count occurrences
print(sales.count(1200))  # 1

List Comprehensions (Advanced)

Create lists in one line:

code.pyPython
# Traditional way
squared = []
for x in [1, 2, 3, 4, 5]:
    squared.append(x ** 2)

# List comprehension (Pythonic!)
squared = [x ** 2 for x in [1, 2, 3, 4, 5]]
print(squared)  # [1, 4, 9, 16, 25]

# With condition
even_squared = [x ** 2 for x in range(10) if x % 2 == 0]
print(even_squared)  # [0, 4, 16, 36, 64]
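
The same pattern applies to typical analyst data, not just numbers. As a small sketch with made-up values, one comprehension can filter a list and another can clean text.

```python
sales = [1200, 1500, 980, 1350, 1100]

# Filter: keep only sales of at least 1200
high_sales = [s for s in sales if s >= 1200]
print(high_sales)    # [1200, 1500, 1350]

# Transform: clean up inconsistent city names (hypothetical input)
cities = ["  mumbai", "DELHI ", "bangalore"]
clean_cities = [c.strip().title() for c in cities]
print(clean_cities)  # ['Mumbai', 'Delhi', 'Bangalore']
```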

Dictionaries: Key-Value Pairs

Dictionaries store data as key-value pairs (like JSON).

code.pyPython
customer = {
    "name": "Priya",
    "age": 28,
    "city": "Mumbai",
    "premium": True,
    "orders": 15
}

# Access values
print(customer["name"])     # Priya
print(customer.get("age"))  # 28

# Add new key
customer["email"] = "priya@example.com"

# Update value
customer["orders"] = 16

# Check if key exists
if "premium" in customer:
    print("Premium customer!")
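
One practical difference between the two access styles: square brackets raise a `KeyError` for a missing key, while `.get()` returns `None` or a default you supply. The sketch below, using hypothetical data, also shows the common counting pattern built on that default.

```python
customer = {"name": "Priya", "city": "Mumbai"}

# customer["phone"] would raise KeyError; .get() falls back to a default
phone = customer.get("phone", "not provided")
print(phone)  # not provided

# Counting occurrences: start each new key at 0, then add 1
order_cities = ["Mumbai", "Delhi", "Mumbai", "Pune", "Delhi", "Mumbai"]
orders_by_city = {}
for city in order_cities:
    orders_by_city[city] = orders_by_city.get(city, 0) + 1

print(orders_by_city)  # {'Mumbai': 3, 'Delhi': 2, 'Pune': 1}
```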

Dictionary Methods

code.pyPython
customer = {"name": "Rahul", "age": 25, "city": "Delhi"}

# Get all keys
print(customer.keys())    # dict_keys(['name', 'age', 'city'])

# Get all values
print(customer.values())  # dict_values(['Rahul', 25, 'Delhi'])

# Get key-value pairs
print(customer.items())
# dict_items([('name', 'Rahul'), ('age', 25), ('city', 'Delhi')])

# Loop through dictionary
for key, value in customer.items():
    print(f"{key}: {value}")

Real Example: Sales by Region

code.pyPython
sales_by_region = {
    "North": 500000,
    "South": 350000,
    "East": 280000,
    "West": 420000
}

# Total sales
total = sum(sales_by_region.values())
print(f"Total: ₹{total:,}")  # Total: ₹1,550,000

# Top region
top_region = max(sales_by_region, key=sales_by_region.get)
print(f"Top: {top_region}")  # Top: North
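
A natural next step is ranking every region and showing its share of the total. This is one possible sketch that reuses the same figures and the same `key=sales_by_region.get` idea; the `shares` dictionary is an illustrative addition.

```python
sales_by_region = {
    "North": 500000,
    "South": 350000,
    "East": 280000,
    "West": 420000
}

total = sum(sales_by_region.values())

# Sort region names by their sales value, largest first
ranked_regions = sorted(sales_by_region, key=sales_by_region.get, reverse=True)

shares = {}
for region in ranked_regions:
    amount = sales_by_region[region]
    shares[region] = amount / total
    print(f"{region}: {amount:,} ({shares[region]:.1%})")
# North: 500,000 (32.3%)
# West: 420,000 (27.1%)
# South: 350,000 (22.6%)
# East: 280,000 (18.1%)
```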

Tuples: Immutable Lists

Tuples are like lists but cannot be changed after creation.

code.pyPython
# Create tuple
coordinates = (19.0760, 72.8777)  # Mumbai lat, lon

# Access
print(coordinates[0])  # 19.076

# Can't modify (this will error)
# coordinates[0] = 20  # TypeError!

# Use case: returning multiple values from function
def get_stats(numbers):
    return min(numbers), max(numbers), sum(numbers)

min_val, max_val, total = get_stats([10, 20, 30])
print(min_val, max_val, total)  # 10 30 60
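
Two other places tuples tend to appear are fixed-shape records that get unpacked in a loop, and composite dictionary keys (lists cannot be keys because they are mutable). The sketch below uses approximate coordinates and hypothetical sales figures.

```python
# (city, latitude, longitude) records; coordinates are approximate
locations = [
    ("Mumbai", 19.0760, 72.8777),
    ("Delhi", 28.6139, 77.2090),
]

# Unpack each tuple directly in the loop header
for city, lat, lon in locations:
    print(f"{city}: {lat:.2f}, {lon:.2f}")
# Mumbai: 19.08, 72.88
# Delhi: 28.61, 77.21

# Tuples are hashable, so they can serve as dictionary keys
quarterly_sales = {("North", "Q1"): 120000, ("North", "Q2"): 135000}
print(quarterly_sales[("North", "Q2")])  # 135000
```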

Loops: Automate Repetition

For Loop

code.pyPython
# Loop through list
sales = [100, 200, 150]
for sale in sales:
    print(f"Sale amount: ₹{sale}")

# Loop with index
for i, sale in enumerate(sales):
    print(f"Sale {i+1}: ₹{sale}")
# Output:
# Sale 1: ₹100
# Sale 2: ₹200
# Sale 3: ₹150

# Loop through range
for i in range(5):
    print(i)  # 0, 1, 2, 3, 4

# Loop through dictionary
customer = {"name": "Rahul", "age": 25}
for key, value in customer.items():
    print(f"{key}: {value}")
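
When related values live in separate lists, the built-in `zip()` lets one loop walk through them together. A small sketch, assuming hypothetical targets alongside actual sales:

```python
regions = ["North", "South", "East"]
targets = [400000, 300000, 300000]   # hypothetical targets
actuals = [500000, 350000, 280000]

gaps = []
for region, target, actual in zip(regions, targets, actuals):
    gap = actual - target
    gaps.append(gap)
    # The + in the format spec always shows the sign
    print(f"{region}: {gap:+,}")
# North: +100,000
# South: +50,000
# East: -20,000
```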

While Loop

code.pyPython
count = 0
while count < 5:
    print(count)
    count += 1

# Real example: process until condition met
revenue = 0
day = 1
while revenue < 100000:
    daily_sales = 15000  # Simplified
    revenue += daily_sales
    print(f"Day {day}: Total ₹{revenue:,}")
    day += 1

Loop Control

code.pyPython
# break - exit loop
for i in range(10):
    if i == 5:
        break
    print(i)  # 0, 1, 2, 3, 4

# continue - skip iteration
for i in range(5):
    if i == 2:
        continue
    print(i)  # 0, 1, 3, 4

Conditional Logic

code.pyPython
revenue = 120000

if revenue > 100000:
    print("High performer")
elif revenue > 50000:
    print("Good performer")
else:
    print("Needs improvement")

Comparison Operators

code.pyPython
# == equal
# != not equal
# >  greater than
# <  less than
# >= greater or equal
# <= less or equal

age = 25
if age >= 18:
    print("Adult")

# Multiple conditions
revenue = 80000
orders = 50

if revenue > 50000 and orders > 40:
    print("Top customer")

if revenue < 10000 or orders == 0:
    print("At-risk customer")
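
Two further pieces of syntax are worth knowing here: Python allows chained comparisons for range checks, and `not` inverts a condition. The sketch below reuses the same sample values; `is_premium` is a hypothetical flag added for illustration.

```python
revenue = 80000
orders = 50
is_premium = False

# Chained comparison: equivalent to revenue > 50000 and revenue <= 100000
in_mid_band = 50000 < revenue <= 100000
print(in_mid_band)        # True

# "not" inverts a boolean; parentheses make mixed and/or logic explicit
upgrade_candidate = (not is_premium) and (revenue > 50000 or orders > 40)
print(upgrade_candidate)  # True
```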

Ternary Operator (One-Liner)

code.pyPython
revenue = 120000

# Traditional
if revenue > 100000:
    tier = "Premium"
else:
    tier = "Standard"

# Ternary (concise)
tier = "Premium" if revenue > 100000 else "Standard"

Functions: Reusable Code

Functions let you write a piece of logic once and reuse it instead of repeating code.

code.pyPython
def calculate_gst(amount):
    return amount * 1.18

total = calculate_gst(1000)
print(total)  # 1180.0

Functions with Multiple Parameters

code.pyPython
def calculate_discount(price, discount_pct):
    discount_amount = price * (discount_pct / 100)
    final_price = price - discount_amount
    return final_price

final = calculate_discount(1000, 20)
print(final)  # 800.0

Default Parameters

code.pyPython
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

print(greet("Rahul"))              # Hello, Rahul!
print(greet("Priya", "Namaste"))   # Namaste, Priya!
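
Defaults pair well with keyword arguments, which let a caller override one setting by name and leave the rest alone. A minimal sketch follows; `calculate_total` and its 0.18 default rate are hypothetical, chosen to match the GST example above.

```python
def calculate_total(price, quantity=1, tax_rate=0.18):
    """Return price * quantity with tax applied, rounded to 2 decimals."""
    return round(price * quantity * (1 + tax_rate), 2)

print(calculate_total(1000))                 # 1180.0
print(calculate_total(1000, quantity=3))     # 3540.0
print(calculate_total(1000, tax_rate=0.05))  # 1050.0
```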

Real Example: Sales Analysis Function

code.pyPython
def analyze_sales(sales_list):
    total = sum(sales_list)
    average = total / len(sales_list)
    highest = max(sales_list)
    lowest = min(sales_list)

    return {
        "total": total,
        "average": average,
        "highest": highest,
        "lowest": lowest
    }

sales = [1200, 1500, 980, 1350, 1100]
stats = analyze_sales(sales)

print(f"Total: ₹{stats['total']:,}")
print(f"Average: ₹{stats['average']:.2f}")
print(f"Highest: ₹{stats['highest']}")
print(f"Lowest: ₹{stats['lowest']}")

Working with Files

Reading Files

code.pyPython
# Read entire file
with open('data.txt', 'r') as file:
    content = file.read()
    print(content)

# Read line by line
with open('data.txt', 'r') as file:
    for line in file:
        print(line.strip())

Writing Files

code.pyPython
# Write (overwrites file); utf-8 ensures the ₹ symbol is saved correctly on any OS
with open('output.txt', 'w', encoding='utf-8') as file:
    file.write("Total revenue: ₹500,000\n")
    file.write("Orders: 250\n")

# Append (adds to end)
with open('output.txt', 'a', encoding='utf-8') as file:
    file.write("New data\n")

Real Example: Process CSV

code.pyPython
with open('sales.csv', 'r') as file:
    header = file.readline().strip()  # Skip header

    total_sales = 0
    for line in file:
        fields = line.strip().split(',')
        amount = float(fields[2])  # Assuming amount is 3rd column
        total_sales += amount

    print(f"Total sales: ₹{total_sales:,}")
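
Splitting on commas by hand breaks as soon as a field itself contains a comma, such as a quoted name. The standard library's `csv` module handles quoting and lets you refer to columns by header name. In this sketch, `io.StringIO` stands in for a real file so the example is self-contained, and the column names are assumptions.

```python
import csv
import io

# Hypothetical file contents; io.StringIO stands in for open('sales.csv')
sample_file = io.StringIO(
    "name,city,amount\n"
    "Rahul,Mumbai,50000\n"
    '"Sharma, Priya",Delhi,62000.50\n'
)

total_sales = 0.0
names = []
# DictReader uses the header row as keys for every following row
for row in csv.DictReader(sample_file):
    names.append(row["name"])
    total_sales += float(row["amount"])

print(names)                               # ['Rahul', 'Sharma, Priya']
print(f"Total sales: {total_sales:,.2f}")  # Total sales: 112,000.50
```

With a real file, passing `newline=''` to `open()` is the form the `csv` documentation recommends.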

Importing Modules

Python's power comes from libraries.

code.pyPython
# Import entire module
import math
print(math.sqrt(16))  # 4.0

# Import specific function
from math import sqrt, ceil
print(sqrt(16))  # 4.0
print(ceil(3.2))  # 4

# Import with alias
import pandas as pd
import numpy as np

# Common modules for analysts
import datetime
import random
import csv
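
As a brief illustration of what standard-library modules offer, the sketch below uses `datetime` to measure the gap between two dates and `statistics` for summary values. The dates and sales figures are made up.

```python
from datetime import date, datetime
import statistics

# Parse an ISO-style date string into a date object
order_date = datetime.strptime("2026-03-01", "%Y-%m-%d").date()
delivery_date = date(2026, 3, 5)   # hypothetical delivery date

# Subtracting two dates gives a timedelta; .days is the gap in whole days
days_to_deliver = (delivery_date - order_date).days
print(days_to_deliver)             # 4

# statistics covers common summaries without any third-party install
sales = [1200, 1500, 980, 1350, 1100]
print(statistics.median(sales))    # 1200
print(statistics.mean(sales))
```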

Error Handling

Handle errors gracefully instead of crashing.

code.pyPython
# Without error handling (will crash if error)
# age = int(input("Enter age: "))

# With error handling
try:
    age = int(input("Enter age: "))
    print(f"Age: {age}")
except ValueError:
    print("Please enter a valid number!")

# Real example: Reading file
try:
    with open('data.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print("File not found!")
except Exception as e:
    print(f"Error: {e}")
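
The same pattern is useful for cleaning a column where some entries are not numeric. Below is a minimal sketch with a hypothetical helper, `to_float`, that returns a default instead of crashing on bad input.

```python
def to_float(value, default=None):
    """Convert text to float; return default when the text is not numeric."""
    try:
        return float(value.replace(",", "").strip())
    except ValueError:
        return default

# Hypothetical raw column with missing and malformed entries
raw_values = ["1,200", " 950 ", "N/A", "", "3400.5"]
cleaned = [to_float(v) for v in raw_values]
print(cleaned)     # [1200.0, 950.0, None, None, 3400.5]

# Keep only the values that converted successfully
valid = [v for v in cleaned if v is not None]
print(sum(valid))  # 5550.5
```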

Real-World Example: Sales Report Generator

code.pyPython
def generate_sales_report(sales_data):
    """
    Generate sales report from daily sales data

    Args:
        sales_data: list of dictionaries with 'date' and 'amount'

    Returns:
        Dictionary with summary statistics
    """
    if not sales_data:
        return {"error": "No data provided"}

    # Calculate metrics
    amounts = [sale['amount'] for sale in sales_data]
    total = sum(amounts)
    average = total / len(amounts)
    highest_day = max(sales_data, key=lambda x: x['amount'])
    lowest_day = min(sales_data, key=lambda x: x['amount'])

    # Count days above average
    above_avg = sum(1 for amt in amounts if amt > average)

    report = {
        "total_revenue": total,
        "average_daily": average,
        "highest_day": highest_day,
        "lowest_day": lowest_day,
        "days_above_average": above_avg,
        "total_days": len(sales_data)
    }

    return report

# Sample data
sales_data = [
    {"date": "2026-03-01", "amount": 12000},
    {"date": "2026-03-02", "amount": 15000},
    {"date": "2026-03-03", "amount": 9000},
    {"date": "2026-03-04", "amount": 18000},
    {"date": "2026-03-05", "amount": 11000}
]

# Generate report
report = generate_sales_report(sales_data)

# Print formatted report
print("=" * 40)
print("SALES REPORT")
print("=" * 40)
print(f"Total Revenue: ₹{report['total_revenue']:,}")
print(f"Average Daily: ₹{report['average_daily']:,.2f}")
print(f"Highest Day: {report['highest_day']['date']} (₹{report['highest_day']['amount']:,})")
print(f"Lowest Day: {report['lowest_day']['date']} (₹{report['lowest_day']['amount']:,})")
print(f"Days Above Average: {report['days_above_average']}/{report['total_days']}")
print("=" * 40)

Common Mistakes & Fixes

❌ Mistake 1: Forgetting Indentation

Python uses indentation for code blocks.

code.pyPython
# Wrong
if revenue > 100000:
print("High")  # IndentationError!

# Correct
if revenue > 100000:
    print("High")  # 4 spaces per level (never mix tabs and spaces)

❌ Mistake 2: Using = Instead of ==

code.pyPython
# Wrong (assignment, not comparison)
if revenue = 100000:  # SyntaxError!

# Correct
if revenue == 100000:
    print("Exactly 100k")

❌ Mistake 3: Modifying List While Looping

code.pyPython
# Wrong (can skip items)
sales = [100, 200, 150, 300]
for sale in sales:
    if sale < 200:
        sales.remove(sale)  # Don't do this!

# Correct
sales = [s for s in sales if s >= 200]
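
To see the skipping happen, it helps to put two small values next to each other. In this sketch (values chosen for illustration), removing 100 shifts 150 into the position the loop has already passed, so 150 is never checked. Looping over a copy made with `[:]` is a safe alternative when a comprehension does not fit.

```python
# Buggy: removing while iterating over the same list
buggy_sales = [100, 150, 200, 300]
for sale in buggy_sales:
    if sale < 200:
        buggy_sales.remove(sale)
print(buggy_sales)  # [150, 200, 300]  (150 was skipped)

# Safe: iterate over a copy, modify the original
fixed_sales = [100, 150, 200, 300]
for sale in fixed_sales[:]:
    if sale < 200:
        fixed_sales.remove(sale)
print(fixed_sales)  # [200, 300]
```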

❌ Mistake 4: Not Closing Files

code.pyPython
# Wrong (file stays open)
file = open('data.txt', 'r')
content = file.read()
# ... forgot file.close()

# Correct (auto-closes)
with open('data.txt', 'r') as file:
    content = file.read()
# File automatically closed here

Best Practices for Analysts

  1. Use meaningful variable names

    code.pyPython
    # Bad
    x = 50000
    y = 12
    
    # Good
    monthly_revenue = 50000
    num_orders = 12
  2. Comment your code

    code.pyPython
    # Calculate 20% discount for premium customers
    discount = price * 0.20
  3. Use f-strings for formatting

    code.pyPython
    # Bad
    print("Revenue: " + str(revenue))
    
    # Good
    print(f"Revenue: ₹{revenue:,}")
  4. Keep functions small and focused

    • One function = one task
    • If > 20 lines, consider splitting
  5. Handle errors gracefully

    • Use try/except for file operations
    • Validate user input

Summary

✅ Variables & types: int, float, str, bool
✅ Strings: strip(), split(), replace(), f-strings
✅ Lists: append(), sort(), sum(), list comprehensions
✅ Dictionaries: key-value storage for structured data
✅ Loops: for, while, enumerate(), break, continue
✅ Conditionals: if/elif/else, comparison operators
✅ Functions: reusable code, parameters, return values
✅ Files: read and write using with open()
✅ Modules: import libraries for advanced features
✅ Error handling: try/except for robust code

Next Topic: Pandas for data manipulation! 🐼

You now have Python fundamentals. In the next topic, you'll use Pandas to analyze real datasets with millions of rows!