Why Python for Data Analysts?
Python is the second most important skill for data analysts, after SQL. Here's why:
- Automate repetitive tasks (no more manual copy-paste!)
- Handle large datasets (millions of rows, no problem)
- Advanced analytics (statistics, ML, forecasting)
- Web scraping (collect data from websites)
- Free & popular (huge community, tons of libraries)
Real fact: 75% of data analyst job postings require Python. It's non-negotiable in 2026.
In this topic, you'll learn:
- Variables and data types
- Lists, dictionaries, and tuples
- Loops and conditionals
- Functions and modules
- String manipulation
- File operations
- Real-world data analysis examples
Setting Up Python
Option 1: Anaconda (Recommended for Analysts)
Download from anaconda.com
Why Anaconda?
- Includes Python + data libraries (Pandas, NumPy, Matplotlib)
- Jupyter Notebook for interactive analysis
- No separate installations needed
Option 2: Python.org
Download from python.org
Then install libraries:
pip install pandas numpy matplotlib jupyter
Verify Installation
python --version # Should show Python 3.10+
Variables & Data Types
Variables
Variables store data. No need to declare types: Python figures it out.
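As a minimal sketch of what "figures it out" means: the type belongs to the value, not to the name, so the same name can point to values of different types over time. The name `value` is only illustrative.

```python
# Python attaches the type to the value, not to the variable name,
# so the same name can be rebound to values of different types.
value = 42
first_type = type(value).__name__      # "int"

value = "forty-two"
second_type = type(value).__name__     # "str"

value = 42.0
third_type = type(value).__name__      # "float"

print(first_type, second_type, third_type)  # int str float
```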
# Creating variables
name = "Rahul"
age = 25
salary = 50000.50
is_employed = True
# Print variables
print(name) # Rahul
print(f"Age: {age}") # Age: 25
Data Types
| Type | Example | Used For | Check Type |
|------|---------|----------|------------|
| str | "Hello" | Text | type("Hello") |
| int | 25 | Whole numbers | type(25) |
| float | 3.14 | Decimals | type(3.14) |
| bool | True/False | Yes/No flags | type(True) |
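The table above can be turned into a small type check. This is a sketch only: the helper name `kind_of` and the sample `row` are made up for illustration. One detail worth knowing is that `bool` is a subclass of `int` in Python, so the `bool` test has to come first.

```python
def kind_of(value):
    """Return a short label for the four basic types in the table above."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    return "other"

row = ["Rahul", 25, 50000.50, True]
labels = [kind_of(v) for v in row]
print(labels)  # ['str', 'int', 'float', 'bool']
```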
Type Conversion
# String to integer
age_str = "25"
age_int = int(age_str)
# Integer to string
count = 100
count_str = str(count)
# String to float
price = float("99.99")
# Check type
print(type(age_int)) # <class 'int'>
String Operations
Strings are everywhere in data analysis: cleaning names, extracting info, formatting output.
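A typical cleaning task, sketched under assumptions: the helper `clean_name` and the sample names are hypothetical, and title-casing is only one possible convention for names.

```python
def clean_name(raw):
    """Trim outer whitespace, collapse inner runs of spaces, and title-case."""
    # split() with no arguments splits on any whitespace and drops empty parts.
    parts = raw.split()
    return " ".join(parts).title()

raw_names = ["  rahul SHARMA ", "PRIYA   patel", "amit kumar  "]
cleaned = [clean_name(n) for n in raw_names]
print(cleaned)  # ['Rahul Sharma', 'Priya Patel', 'Amit Kumar']
```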
Basic Operations
company = "Skillsetmaster"
# Length
print(len(company)) # 14
# Uppercase / Lowercase
print(company.upper()) # SKILLSETMASTER
print(company.lower()) # skillsetmaster
# Check if contains
print("skill" in company.lower()) # True
String Methods Analysts Use Daily
# Strip whitespace
name = " Rahul "
print(name.strip()) # "Rahul"
# Replace
email = "user@gmail.com"
cleaned = email.replace("gmail", "company")
print(cleaned) # user@company.com
# Split
csv_row = "Rahul,Mumbai,50000"
fields = csv_row.split(",")
print(fields) # ['Rahul', 'Mumbai', '50000']
# Join
cities = ["Mumbai", "Delhi", "Bangalore"]
result = ", ".join(cities)
print(result) # "Mumbai, Delhi, Bangalore"
String Formatting (F-Strings)
name = "Priya"
revenue = 125000
# Old way (avoid)
print("Name: " + name + ", Revenue: " + str(revenue))
# Modern way (use this!)
print(f"Name: {name}, Revenue: ₹{revenue:,}")
# Output: Name: Priya, Revenue: ₹125,000
Lists: Ordered Collections
Lists store multiple items in order.
# List of sales
sales = [100, 200, 150, 300]
# Access by index (starts at 0)
print(sales[0]) # 100 (first)
print(sales[-1]) # 300 (last)
# Slicing
print(sales[1:3]) # [200, 150] (index 1 to 2)
# Length
print(len(sales)) # 4
Modifying Lists
sales = [100, 200, 150]
# Add item
sales.append(250) # [100, 200, 150, 250]
# Insert at position
sales.insert(1, 175) # [100, 175, 200, 150, 250]
# Remove item
sales.remove(150) # [100, 175, 200, 250]
# Remove by index
del sales[0] # [175, 200, 250]
# Pop (remove and return last)
last = sales.pop() # last = 250, sales = [175, 200]
List Operations Analysts Use
sales = [1200, 1500, 980, 1350, 1100]
# Sum
total = sum(sales) # 6130
# Average
average = sum(sales) / len(sales) # 1226.0
# Min / Max
print(min(sales)) # 980
print(max(sales)) # 1500
# Sort
sales.sort() # [980, 1100, 1200, 1350, 1500]
sales.sort(reverse=True) # [1500, 1350, 1200, 1100, 980]
# Count occurrences
print(sales.count(1200)) # 1
List Comprehensions (Advanced)
Create lists in one line:
# Traditional way
squared = []
for x in [1, 2, 3, 4, 5]:
    squared.append(x ** 2)
# List comprehension (Pythonic!)
squared = [x ** 2 for x in [1, 2, 3, 4, 5]]
print(squared) # [1, 4, 9, 16, 25]
# With condition
even_squared = [x ** 2 for x in range(10) if x % 2 == 0]
print(even_squared) # [0, 4, 16, 36, 64]
Dictionaries: Key-Value Pairs
Dictionaries store data as key-value pairs (like JSON).
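The JSON comparison is quite literal: the standard-library `json` module converts a JSON object into a dictionary and back. A minimal sketch, where the `raw` text is a made-up stand-in for something like an API response:

```python
import json

# Hypothetical JSON text, e.g. the body of an API response.
raw = '{"name": "Priya", "orders": 15, "premium": true}'

record = json.loads(raw)          # JSON object -> Python dict
record["orders"] += 1             # work with it like any dictionary

as_text = json.dumps(record)      # dict -> JSON text again
print(record["premium"])          # True
print(as_text)
```

Note that JSON's `true` becomes Python's `True` on the way in and is converted back on the way out.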
customer = {
    "name": "Priya",
    "age": 28,
    "city": "Mumbai",
    "premium": True,
    "orders": 15
}
# Access values
print(customer["name"]) # Priya
print(customer.get("age")) # 28
# Add new key
customer["email"] = "priya@example.com"
# Update value
customer["orders"] = 16
# Check if key exists
if "premium" in customer:
    print("Premium customer!")
Dictionary Methods
customer = {"name": "Rahul", "age": 25, "city": "Delhi"}
# Get all keys
print(customer.keys()) # dict_keys(['name', 'age', 'city'])
# Get all values
print(customer.values()) # dict_values(['Rahul', 25, 'Delhi'])
# Get key-value pairs
print(customer.items())
# dict_items([('name', 'Rahul'), ('age', 25), ('city', 'Delhi')])
# Loop through dictionary
for key, value in customer.items():
    print(f"{key}: {value}")
Real Example: Sales by Region
sales_by_region = {
    "North": 500000,
    "South": 350000,
    "East": 280000,
    "West": 420000
}
# Total sales
total = sum(sales_by_region.values())
print(f"Total: ₹{total:,}") # Total: ₹1,550,000
# Top region
top_region = max(sales_by_region, key=sales_by_region.get)
print(f"Top: {top_region}") # Top: North
Tuples: Immutable Lists
Tuples are like lists but cannot be changed after creation.
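A short sketch of what immutability means in practice. The names `point`, `changed`, and `city_by_coords` are illustrative only. Because a tuple of numbers cannot change, it is hashable and can be used as a dictionary key, which a list cannot.

```python
point = (19.0760, 72.8777)

# Item assignment on a tuple raises TypeError.
try:
    point[0] = 20
    changed = True
except TypeError:
    changed = False

# Tuples that hold only immutable values are hashable,
# so they can serve as dictionary keys (lists cannot).
city_by_coords = {point: "Mumbai"}

print(changed)                   # False
print(city_by_coords[point])     # Mumbai
```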
# Create tuple
coordinates = (19.0760, 72.8777) # Mumbai lat, lon
# Access
print(coordinates[0]) # 19.076
# Can't modify (this will error)
# coordinates[0] = 20 # TypeError!
# Use case: returning multiple values from function
def get_stats(numbers):
    return min(numbers), max(numbers), sum(numbers)
min_val, max_val, total = get_stats([10, 20, 30])
print(min_val, max_val, total) # 10 30 60
Loops: Automate Repetition
For Loop
# Loop through list
sales = [100, 200, 150]
for sale in sales:
    print(f"Sale amount: ₹{sale}")
# Loop with index
for i, sale in enumerate(sales):
    print(f"Sale {i+1}: ₹{sale}")
# Output:
# Sale 1: ₹100
# Sale 2: ₹200
# Sale 3: ₹150
# Loop through range
for i in range(5):
    print(i) # 0, 1, 2, 3, 4
# Loop through dictionary
customer = {"name": "Rahul", "age": 25}
for key, value in customer.items():
    print(f"{key}: {value}")
While Loop
count = 0
while count < 5:
    print(count)
    count += 1
# Real example: process until condition met
revenue = 0
day = 1
while revenue < 100000:
    daily_sales = 15000 # Simplified
    revenue += daily_sales
    print(f"Day {day}: Total ₹{revenue:,}")
    day += 1
Loop Control
# break - exit loop
for i in range(10):
    if i == 5:
        break
    print(i) # 0, 1, 2, 3, 4
# continue - skip iteration
for i in range(5):
    if i == 2:
        continue
    print(i) # 0, 1, 3, 4
Conditional Logic
revenue = 120000
if revenue > 100000:
    print("High performer")
elif revenue > 50000:
    print("Good performer")
else:
    print("Needs improvement")
Comparison Operators
# == equal
# != not equal
# > greater than
# < less than
# >= greater or equal
# <= less or equal
age = 25
if age >= 18:
    print("Adult")
# Multiple conditions
revenue = 80000
orders = 50
if revenue > 50000 and orders > 40:
    print("Top customer")
if revenue < 10000 or orders == 0:
    print("At-risk customer")
Ternary Operator (One-Liner)
# Traditional
if revenue > 100000:
    tier = "Premium"
else:
    tier = "Standard"
# Ternary (concise)
tier = "Premium" if revenue > 100000 else "Standard"
Functions: Reusable Code
Functions prevent repeating code.
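To make the point concrete, here is a small sketch: the formula is written once and reused for every pair of months. The function `pct_change` and the `monthly` figures are hypothetical.

```python
def pct_change(old, new):
    """Return the percentage change from old to new."""
    return (new - old) / old * 100

# One definition, reused for every pair of consecutive months.
monthly = [1000, 1500, 1200]
changes = [pct_change(a, b) for a, b in zip(monthly, monthly[1:])]
print(changes)  # [50.0, -20.0]
```

If the formula ever changes, only the function body needs editing, not every place it is used.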
def calculate_gst(amount):
    return amount * 1.18
total = calculate_gst(1000)
print(total) # 1180.0
Functions with Multiple Parameters
def calculate_discount(price, discount_pct):
    discount_amount = price * (discount_pct / 100)
    final_price = price - discount_amount
    return final_price
final = calculate_discount(1000, 20)
print(final) # 800.0
Default Parameters
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"
print(greet("Rahul")) # Hello, Rahul!
print(greet("Priya", "Namaste")) # Namaste, Priya!
Real Example: Sales Analysis Function
def analyze_sales(sales_list):
    total = sum(sales_list)
    average = total / len(sales_list)
    highest = max(sales_list)
    lowest = min(sales_list)
    return {
        "total": total,
        "average": average,
        "highest": highest,
        "lowest": lowest
    }
sales = [1200, 1500, 980, 1350, 1100]
stats = analyze_sales(sales)
print(f"Total: ₹{stats['total']:,}")
print(f"Average: ₹{stats['average']:.2f}")
print(f"Highest: ₹{stats['highest']}")
print(f"Lowest: ₹{stats['lowest']}")
Working with Files
Reading Files
# Read entire file
with open('data.txt', 'r') as file:
    content = file.read()
    print(content)
# Read line by line
with open('data.txt', 'r') as file:
    for line in file:
        print(line.strip())
Writing Files
# Write (overwrites file)
# encoding='utf-8' makes sure the ₹ symbol can be written on every platform
with open('output.txt', 'w', encoding='utf-8') as file:
    file.write("Total revenue: ₹500,000\n")
    file.write("Orders: 250\n")
# Append (adds to end)
with open('output.txt', 'a', encoding='utf-8') as file:
    file.write("New data\n")
Real Example: Process CSV
with open('sales.csv', 'r') as file:
    header = file.readline().strip() # Skip header
    total_sales = 0
    for line in file:
        fields = line.strip().split(',')
        amount = float(fields[2]) # Assuming amount is 3rd column
        total_sales += amount
print(f"Total sales: ₹{total_sales:,}")
Importing Modules
Python's power comes from libraries.
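Many useful libraries ship with Python itself. As a sketch, the standard-library `csv`, `io`, and `statistics` modules can parse and summarise CSV text with no extra installation. The in-memory data and its column names (`name`, `city`, `amount`) are assumptions for illustration.

```python
import csv
import io
import statistics

# In-memory stand-in for a small CSV file; the column names are assumptions.
raw = io.StringIO("name,city,amount\nRahul,Mumbai,50000\nPriya,Delhi,65000\n")

reader = csv.DictReader(raw)      # uses the header row as dictionary keys
amounts = [float(row["amount"]) for row in reader]

print(sum(amounts))               # 115000.0
print(statistics.mean(amounts))   # 57500.0
```

Compared with splitting each line on commas by hand, `csv.DictReader` also copes with quoted fields that contain commas.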
# Import entire module
import math
print(math.sqrt(16)) # 4.0
# Import specific function
from math import sqrt, ceil
print(sqrt(16)) # 4.0
print(ceil(3.2)) # 4
# Import with alias
import pandas as pd
import numpy as np
# Common modules for analysts
import datetime
import random
import csv
Error Handling
Handle errors gracefully instead of crashing.
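For analysts, the most common case is messy input data. The sketch below converts text to numbers and falls back to a default instead of crashing. The helper `to_amount` and the sample values are hypothetical.

```python
def to_amount(text, default=None):
    """Convert text such as ' 1,200.50 ' to a float, or return default."""
    try:
        return float(text.replace(",", "").strip())
    except (ValueError, AttributeError):
        # ValueError: not a number; AttributeError: not a string (e.g. None).
        return default

raw_values = ["1,200.50", " 980 ", "N/A", None]
parsed = [to_amount(v) for v in raw_values]
print(parsed)  # [1200.5, 980.0, None, None]
```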
# Without error handling (will crash if error)
# age = int(input("Enter age: "))
# With error handling
try:
    age = int(input("Enter age: "))
    print(f"Age: {age}")
except ValueError:
    print("Please enter a valid number!")
# Real example: Reading file
try:
    with open('data.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print("File not found!")
except Exception as e:
    print(f"Error: {e}")
Real-World Example: Sales Report Generator
def generate_sales_report(sales_data):
    """
    Generate sales report from daily sales data
    Args:
        sales_data: list of dictionaries with 'date' and 'amount'
    Returns:
        Dictionary with summary statistics
    """
    if not sales_data:
        return {"error": "No data provided"}
    # Calculate metrics
    amounts = [sale['amount'] for sale in sales_data]
    total = sum(amounts)
    average = total / len(amounts)
    highest_day = max(sales_data, key=lambda x: x['amount'])
    lowest_day = min(sales_data, key=lambda x: x['amount'])
    # Count days above average
    above_avg = sum(1 for amt in amounts if amt > average)
    report = {
        "total_revenue": total,
        "average_daily": average,
        "highest_day": highest_day,
        "lowest_day": lowest_day,
        "days_above_average": above_avg,
        "total_days": len(sales_data)
    }
    return report
# Sample data
sales_data = [
    {"date": "2026-03-01", "amount": 12000},
    {"date": "2026-03-02", "amount": 15000},
    {"date": "2026-03-03", "amount": 9000},
    {"date": "2026-03-04", "amount": 18000},
    {"date": "2026-03-05", "amount": 11000}
]
# Generate report
report = generate_sales_report(sales_data)
# Print formatted report
print("=" * 40)
print("SALES REPORT")
print("=" * 40)
print(f"Total Revenue: ₹{report['total_revenue']:,}")
print(f"Average Daily: ₹{report['average_daily']:,.2f}")
print(f"Highest Day: {report['highest_day']['date']} (₹{report['highest_day']['amount']:,})")
print(f"Lowest Day: {report['lowest_day']['date']} (₹{report['lowest_day']['amount']:,})")
print(f"Days Above Average: {report['days_above_average']}/{report['total_days']}")
print("=" * 40)
Common Mistakes & Fixes
Mistake 1: Forgetting Indentation
Python uses indentation for code blocks.
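One way to see this rule without crashing a script, as a sketch: the built-in `compile()` parses source text without running it, so it can report which version Python accepts. The helper name `compiles` and the two source strings are made up for this illustration.

```python
bad_source = "if True:\nprint('High')\n"        # body is not indented
good_source = "if True:\n    print('High')\n"   # body indented by 4 spaces

def compiles(source):
    """Return True if Python can compile the source, False on IndentationError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False

print(compiles(bad_source))    # False
print(compiles(good_source))   # True
```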
# Wrong
if revenue > 100000:
print("High") # IndentationError!
# Correct
if revenue > 100000:
    print("High") # 4 spaces or 1 tab
Mistake 2: Using = Instead of ==
# Wrong (assignment, not comparison)
if revenue = 100000: # SyntaxError!
# Correct
if revenue == 100000:
    print("Exactly 100k")
Mistake 3: Modifying List While Looping
# Wrong (can skip items)
sales = [100, 200, 150, 300]
for sale in sales:
    if sale < 200:
        sales.remove(sale) # Don't do this!
# Correct
sales = [s for s in sales if s >= 200]
Mistake 4: Not Closing Files
# Wrong (file stays open)
file = open('data.txt', 'r')
content = file.read()
# ... forgot file.close()
# Correct (auto-closes)
with open('data.txt', 'r') as file:
    content = file.read()
# File automatically closed here
Best Practices for Analysts
- Use meaningful variable names
  # Bad
  x = 50000
  y = 12
  # Good
  monthly_revenue = 50000
  num_orders = 12
- Comment your code
  # Calculate 20% discount for premium customers
  discount = price * 0.20
- Use f-strings for formatting
  # Bad
  print("Revenue: " + str(revenue))
  # Good
  print(f"Revenue: ₹{revenue:,}")
- Keep functions small and focused
  - One function = one task
  - If > 20 lines, consider splitting
- Handle errors gracefully
  - Use try/except for file operations
  - Validate user input
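The last two practices can be combined in one small, focused function. This is a sketch under assumptions: `read_amounts` and the one-amount-per-line file layout are hypothetical, and a temporary file is used only so the example runs anywhere.

```python
import os
import tempfile

def read_amounts(path):
    """Read one amount per line; return (amounts, skipped_lines)."""
    amounts, skipped = [], []
    try:
        with open(path, "r", encoding="utf-8") as file:
            for line in file:
                text = line.strip()
                if not text:
                    continue                # ignore blank lines
                try:
                    amounts.append(float(text))
                except ValueError:
                    skipped.append(text)    # keep bad input for review
    except FileNotFoundError:
        print(f"File not found: {path}")
    return amounts, skipped

# Demo with a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "amounts.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("1200\n\nabc\n980.5\n")
    amounts, skipped = read_amounts(path)
    missing = read_amounts(os.path.join(folder, "missing.txt"))

print(amounts, skipped)   # [1200.0, 980.5] ['abc']
print(missing)            # ([], [])
```

Returning the skipped lines, rather than silently dropping them, makes it possible to report how much of the input was unusable.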
Summary
- Variables & types: int, float, str, bool
- Strings: strip(), split(), replace(), f-strings
- Lists: append(), sort(), sum(), list comprehensions
- Dictionaries: key-value storage for structured data
- Loops: for, while, enumerate(), break, continue
- Conditionals: if/elif/else, comparison operators
- Functions: reusable code, parameters, return values
- Files: read/write with with open()
- Modules: import libraries for advanced features
- Error handling: try/except for robust code
Next Topic: Pandas for data manipulation! 🐼
You now have Python fundamentals. In the next topic, you'll use Pandas to analyze real datasets with millions of rows!