Introduction to Pandas

Learn what Pandas is and why it is essential for data analysis

What is Pandas?

Pandas is Python's most widely used library for working with tabular data. Think of it as Excel, but scriptable: every step is written as code, so it can be automated and repeated.

The name comes from "panel data", an econometrics term for multi-dimensional structured datasets.

Why Pandas is essential:

  • Works with tables of data (like spreadsheets)
  • Cleans and analyzes data with little code
  • Reads and writes CSV files, Excel files, and SQL databases
  • Is a standard tool in data science

What Makes Pandas Special?

Built on NumPy:

  • Fast and efficient
  • Handles millions of rows, as long as they fit in memory
  • Uses less memory than equivalent Python lists for numeric data
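
To make the NumPy connection concrete, here is a minimal sketch that compares the memory used by a plain Python list with a Series holding the same integers. The size `n` and the variable names are illustrative assumptions, and exact byte counts vary by Python version and platform.

```python
import sys

import numpy as np
import pandas as pd

n = 100_000  # illustrative size, chosen arbitrarily

# A Python list stores a pointer to a separate int object for every element
numbers = list(range(n))
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(x) for x in numbers)

# A Series stores the same values in one contiguous NumPy array (8 bytes each)
series = pd.Series(numbers, dtype="int64")
series_bytes = series.memory_usage(index=False)

print(isinstance(series.to_numpy(), np.ndarray))  # the data is backed by a NumPy array
print(series_bytes < list_bytes)                  # the Series needs fewer bytes
```

The gap is largest for numeric columns; text columns stored as Python objects benefit much less.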

Easy to use:

  • Simple, readable syntax
  • Column operations work like spreadsheet formulas
  • Intuitive for beginners

Powerful features:

  • Filter and sort data
  • Group and summarize
  • Handle missing values
  • Merge datasets
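
Two of the features listed above, handling missing values and merging datasets, can be sketched as follows. The `orders` and `prices` tables are made-up data, and filling the gap with 0 is only one possible choice; dropping the row or using an average may suit other cases better.

```python
import pandas as pd

# Hypothetical data for illustration
orders = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet"],
    "Quantity": [5, None, 7],  # one quantity is missing
})
prices = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet"],
    "Price": [999, 599, 399],
})

# Handle missing values: count them, then fill the gap with 0
missing_count = orders["Quantity"].isna().sum()
orders["Quantity"] = orders["Quantity"].fillna(0)

# Merge datasets: match rows on the shared 'Product' column
combined = orders.merge(prices, on="Product")
print(missing_count)
print(combined)
```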

Installing Pandas

pip install pandas

Check installation:

code.py
import pandas as pd
print(pd.__version__)

Pandas Data Structures

Pandas has two main structures:

Series (1D)

A Series is like a single column of data.

code.py
import pandas as pd

prices = pd.Series([100, 200, 300])
print(prices)

Output:

0    100
1    200
2    300
dtype: int64

Think of it as one column from a spreadsheet.
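
A Series can also carry labels instead of the default 0, 1, 2 index, and arithmetic applies to every element at once. The sketch below assumes some made-up labels (`pen`, `book`, `bag`) purely for illustration.

```python
import pandas as pd

# Labels are illustrative; without them the index would be 0, 1, 2
prices = pd.Series([100, 200, 300], index=["pen", "book", "bag"])

book_price = prices["book"]  # look up a value by its label
doubled = prices * 2         # arithmetic applies to every element
average = prices.mean()      # built-in summary statistics

print(book_price)
print(doubled)
print(average)
```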

DataFrame (2D)

A DataFrame is like a full spreadsheet with rows and columns.

code.py
import pandas as pd

data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}

df = pd.DataFrame(data)
print(df)

Output:

  Product  Price
0  Laptop    999
1   Phone    599
2  Tablet    399

This is what you'll use most of the time.
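
To see how the two structures relate, this short sketch inspects the same DataFrame and pulls out one column. Selecting a single column returns a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet"],
    "Price": [999, 599, 399],
})

print(df.shape)             # (number of rows, number of columns)
print(list(df.columns))     # column names

price_column = df["Price"]  # selecting one column returns a Series
print(type(price_column))
```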

Why Use Pandas Instead of Lists?

Python lists:

code.py
names = ['John', 'Sarah', 'Mike']
ages = [25, 30, 28]
cities = ['NYC', 'LA', 'Chicago']

# Hard to work with related data
# Need multiple lists
# No built-in analysis tools

Pandas DataFrame:

code.py
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'City': ['NYC', 'LA', 'Chicago']
})

# All data organized together
# Easy to filter, sort, analyze
# Powerful built-in functions
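
As a rough side-by-side, here is the same question ("who is older than 26?") answered with lists and with a DataFrame. The age threshold is an arbitrary choice for the example.

```python
import pandas as pd

names = ["John", "Sarah", "Mike"]
ages = [25, 30, 28]
cities = ["NYC", "LA", "Chicago"]

# With lists, the three sequences must be kept in step by hand
older_lists = [(n, c) for n, a, c in zip(names, ages, cities) if a > 26]

# With a DataFrame, one condition filters whole rows
df = pd.DataFrame({"Name": names, "Age": ages, "City": cities})
older_df = df[df["Age"] > 26]

print(older_lists)
print(older_df)
```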

Real-World Example

The scenario: Analyze sales data.

code.py
import pandas as pd

sales = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Quantity': [5, 10, 7],
    'Price': [999, 599, 399]
})

print(sales)
print()

sales['Total'] = sales['Quantity'] * sales['Price']
print("With totals:")
print(sales)
print()

print("Total revenue:", sales['Total'].sum())
print("Average price:", sales['Price'].mean())
print("Best selling:", sales.loc[sales['Quantity'].idxmax(), 'Product'])

What this does:

  1. Creates sales data table
  2. Calculates total for each row
  3. Shows total revenue
  4. Calculates average price
  5. Finds best-selling product

All in just a few lines!

Common Pandas Operations

Reading Data

code.py
import pandas as pd

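# Each reader returns a DataFrame; use the one that matches your source
df = pd.read_csv('data.csv')
df = pd.read_excel('data.xlsx')      # needs an Excel engine such as openpyxl
df = pd.read_sql(query, connection)  # query: SQL string, connection: open database connection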
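
The readers above need real files and a database. For experimenting without either, the following sketch uses an in-memory CSV string and an in-memory SQLite database as stand-ins; the table name `products` and the sample rows are assumptions made for the example.

```python
import io
import sqlite3

import pandas as pd

# In-memory text stands in for a CSV file on disk
csv_text = "Product,Price\nLaptop,999\nPhone,599\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# In-memory SQLite database stands in for a real database server
connection = sqlite3.connect(":memory:")
df_csv.to_sql("products", connection, index=False)  # write the table
query = "SELECT * FROM products WHERE Price > 600"
df_sql = pd.read_sql(query, connection)             # read it back with a query
connection.close()

print(df_csv)
print(df_sql)
```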

Quick Look

code.py
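print(df.head())      # first five rows
df.info()             # column types and non-null counts; prints directly and returns None
print(df.describe())  # summary statistics for numeric columns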

Filtering

code.py
expensive = df[df['Price'] > 500]
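
Filters can be combined. A minimal sketch, using made-up products: join conditions with `&` (and) or `|` (or), and wrap each condition in parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet", "Cable"],
    "Price": [999, 599, 399, 15],
    "Quantity": [5, 10, 7, 40],
})

# Both conditions must hold
pricey_and_popular = df[(df["Price"] > 500) & (df["Quantity"] >= 10)]

# Either condition may hold
cheap_or_popular = df[(df["Price"] < 100) | (df["Quantity"] >= 10)]

print(pricey_and_popular)
print(cheap_or_popular)
```

Python's `and` and `or` keywords do not work element by element on columns, which is why the symbols are used here.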

Sorting

code.py
sorted_df = df.sort_values('Price')
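
Sorting defaults to ascending order on one column. The sketch below, on illustrative data, shows descending order and sorting by more than one column. `sort_values` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Computers", "Mobile", "Mobile", "Computers"],
    "Product": ["Laptop", "Phone", "Tablet", "Monitor"],
    "Price": [999, 599, 399, 249],
})

# Highest price first
highest_first = df.sort_values("Price", ascending=False)

# Sort by category, then by price within each category
by_category_then_price = df.sort_values(["Category", "Price"])

print(highest_first)
print(by_category_then_price)
```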

Grouping

code.py
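# Assumes df has a 'Category' column; sums the numeric columns within each category
by_category = df.groupby('Category').sum(numeric_only=True)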
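
Here is a self-contained sketch of grouping, since the line above needs a DataFrame with a `Category` column. The data and the output column names (`total_quantity`, `average_price`) are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Computers", "Mobile", "Mobile", "Computers"],
    "Price": [999, 599, 399, 249],
    "Quantity": [5, 10, 7, 3],
})

# One summary: total quantity per category
totals = df.groupby("Category")["Quantity"].sum()

# Several summaries at once, each with its own output name
summary = df.groupby("Category").agg(
    total_quantity=("Quantity", "sum"),
    average_price=("Price", "mean"),
)

print(totals)
print(summary)
```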

Pandas vs Excel

Feature            Excel                           Pandas
Data size          Up to 1,048,576 rows per sheet  Millions of rows (limited by memory)
Speed              Slow with big data              Fast, vectorized operations
Automation         Manual clicks                   Write once, rerun anytime
Reproducibility    Hard to track changes           Code documents every step
Advanced analysis  Limited built-in tools          Extensive, via Python libraries

Use Excel for:

  • Quick manual tasks
  • Sharing with non-programmers
  • Simple data entry

Use Pandas for:

  • Large datasets
  • Repetitive tasks
  • Complex analysis
  • Automated reports
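
The "repetitive tasks" and "automated reports" cases might look like the sketch below: the analysis is written once as a function and rerun on each new batch of data. The function name `revenue_report` and both monthly tables are hypothetical.

```python
import pandas as pd


def revenue_report(sales: pd.DataFrame) -> pd.Series:
    """Return total revenue per product (hypothetical helper)."""
    revenue = sales["Quantity"] * sales["Price"]
    return revenue.groupby(sales["Product"]).sum()


# Made-up monthly data; in practice these might come from pd.read_csv
january = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Laptop"],
    "Quantity": [5, 10, 2],
    "Price": [999, 599, 999],
})
february = pd.DataFrame({
    "Product": ["Phone", "Tablet"],
    "Quantity": [4, 7],
    "Price": [599, 399],
})

# The same code produces each month's report
january_report = revenue_report(january)
february_report = revenue_report(february)
print(january_report)
print(february_report)
```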

When to Use Pandas

Perfect for:

  • Analyzing CSV/Excel files
  • Cleaning messy data
  • Combining multiple datasets
  • Statistical analysis
  • Preparing data for machine learning

Examples:

  • Sales analysis
  • Survey results
  • Financial data
  • Scientific experiments
  • Web scraping results

Import Convention

By convention, import Pandas as "pd":

code.py
import pandas as pd

Why:

  • Shorter to type
  • Standard convention
  • Matches the official documentation and most tutorials

Key Points to Remember

Pandas is Python's main library for data analysis, built on NumPy for speed.

The DataFrame is the primary structure. It is like a spreadsheet with rows and columns.

A Series is a single column. A DataFrame is a collection of Series that share one index.

Pandas can read CSV, Excel, SQL, and many other formats easily.

Pandas is better suited than Excel to large datasets and automation.

Common Mistakes

Mistake 1: Not importing

code.py
df = DataFrame()  # NameError: name 'DataFrame' is not defined (nothing was imported)

Fix:

code.py
import pandas as pd
df = pd.DataFrame()

Mistake 2: Wrong import name

code.py
import pandas
df = pd.DataFrame()  # NameError: name 'pd' is not defined; use pandas.DataFrame() or import pandas as pd
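
Fix: make the name you use match the name you imported. A short sketch of both options; the column `A` is just placeholder data.

```python
# Option 1: import under the full name and use that name
import pandas
df_full = pandas.DataFrame({"A": [1, 2]})

# Option 2 (the usual convention): import with the pd alias
import pandas as pd
df_alias = pd.DataFrame({"A": [1, 2]})

# Both names refer to the same library, so the results match
print(df_full.equals(df_alias))
```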

Mistake 3: Using lists when DataFrame is better

code.py
names = []
ages = []
# Hard to manage related data

Better:

code.py
import pandas as pd
df = pd.DataFrame({'Name': names, 'Age': ages})  # related columns live in one table

Mistake 4: Not checking data first

code.py
df = pd.read_csv('data.csv')
# Process without looking

Always check:

code.py
print(df.head())  # first five rows
df.info()         # prints column types and non-null counts itself; returns None
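
A few more quick checks are often worth running before any processing. The sketch below uses an in-memory CSV string in place of `data.csv`; its contents are invented, with one age left blank on purpose.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = "Name,Age\nJohn,25\nSarah,\nMike,28\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # number of rows and columns
print(df.dtypes)        # type of each column
print(df.isna().sum())  # missing values per column

missing_ages = df["Age"].isna().sum()
```

Here the blank age is read as a missing value, which also turns the `Age` column into floating-point numbers. Surprises like this are easier to catch before the analysis than after.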

What's Next?

You now understand what Pandas is and why it's important. Next, you'll learn different ways to create DataFrames from various data sources.