
Pandas I: DataFrames

Mastering the DataFrame: Creation, inspection, selection, and filtering

What You'll Learn

  • Creating and inspecting DataFrames
  • Selecting columns and rows (loc/iloc)
  • Conditional filtering
  • Sorting and ranking
  • Basic DataFrame attributes

The DataFrame Object

A DataFrame is a two-dimensional labeled data structure whose columns can hold different types. Think of it as a super-powered Excel sheet or SQL table: rows and columns both carry labels, and each column is a pandas Series.

code.py
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)
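
A dictionary of lists is not the only way in. As a minimal sketch, the same table can be built from a list of row dictionaries, and `dtypes` shows that each column keeps its own type. The name `df_records` is introduced here only for illustration.

```python
import pandas as pd

# Same sample data as above, written row by row instead of column by column
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Paris'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'London'},
]
df_records = pd.DataFrame(records)

# Each column has its own dtype: integers for Age, text for Name and City
print(df_records.dtypes)
```

Both constructions should give the same three-row, three-column table; which one is more convenient usually depends on how the raw data arrives.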

Inspecting Data

code.py
# View first/last rows (5 is the default when no number is given)
df.head(5)
df.tail(5)

# Summary info (types, non-null counts)
df.info()

# Statistical summary (mean, min, max, etc.)
df.describe()

# Dimensions
df.shape    # (rows, columns)
df.columns  # Column names
df.index    # Row labels
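
To make the inspection calls concrete, here is a small sketch that rebuilds the sample table and stores a few of the results in variables. The variable names (`n_rows`, `mean_age`, `preview`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

# shape, columns, and index are attributes, so they take no parentheses
n_rows, n_cols = df.shape
column_names = list(df.columns)

# head() is a method and accepts the number of rows to return
preview = df.head(2)

# describe() reports statistics such as the mean for numeric columns;
# the same figure can be computed directly on a single column
mean_age = df['Age'].mean()

print(n_rows, n_cols)
print(column_names)
print(mean_age)
```

Note the split between attributes (`shape`, `columns`, `index`) and methods (`head()`, `info()`, `describe()`); adding parentheses to an attribute raises an error.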

Selection and Indexing

Selecting Columns:

code.py
# Single column (returns Series)
ages = df['Age']

# Multiple columns (returns DataFrame)
subset = df[['Name', 'City']]

Selecting Rows (loc vs iloc):

  • .loc: Label-based
  • .iloc: Integer position-based

code.py
# Select row by index label
row = df.loc[0]

# Select specific rows and columns by label
val = df.loc[0, 'Name']  # 'Alice'
# Label slices include the end label: this returns rows 0, 1, and 2
subset = df.loc[0:2, ['Name', 'Age']]

# Select by position (integer index)
row = df.iloc[0]      # First row
subset = df.iloc[0:2, 0:2] # First 2 rows, first 2 cols
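
With the default 0, 1, 2 index, labels and positions happen to coincide, which can hide the difference between the two accessors. The sketch below uses string row labels (an assumption made for illustration, not part of the sample data above) so the two are easier to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'Paris', 'London']},
    index=['alice', 'bob', 'charlie'],  # hypothetical string row labels
)

# The same cell, reached by label and by position
by_label = df.loc['bob', 'City']
by_position = df.iloc[1, 1]

# Label slices include the end label; position slices exclude the end
label_slice = df.loc['alice':'bob']   # 2 rows: alice and bob
position_slice = df.iloc[0:1]         # 1 row: alice only

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

The slice behavior is the part that most often surprises people: `.loc` is inclusive on both ends, while `.iloc` follows ordinary Python slicing.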

Filtering Data

code.py
# Simple condition
adults = df[df['Age'] >= 18]

# Multiple conditions (& for AND, | for OR)
# Note: wrap each condition in parentheses, because & and | bind
# more tightly than comparison operators such as > and ==
target = df[(df['Age'] > 25) & (df['City'] == 'Paris')]

# Isin (like SQL IN)
cities = df[df['City'].isin(['New York', 'London'])]

# String filtering (returns no rows for the sample data above,
# since none of its names start with 'J')
j_names = df[df['Name'].str.startswith('J')]
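
A few related filtering patterns are sketched below: negating a condition with `~`, combining conditions with `|`, handling missing values in string filters, and writing a condition as a `query()` string. The sample table here includes a missing name on purpose; that detail is an assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None],   # one missing name, for illustration
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

# NOT: ~ inverts a boolean mask
not_paris = df[~(df['City'] == 'Paris')]

# OR: either condition may hold
over_25_or_ny = df[(df['Age'] > 25) | (df['City'] == 'New York')]

# na=False treats missing strings as non-matches instead of propagating NaN
a_names = df[df['Name'].str.startswith('A', na=False)]

# query() expresses the condition as a string
over_25 = df.query('Age > 25')

print(len(not_paris), len(over_25_or_ny), len(a_names), len(over_25))
```

Passing `na=False` matters whenever a text column may contain missing values, because a mask containing NaN cannot be used for boolean indexing.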

Sorting and Ranking

code.py
# Sort by values
df_sorted = df.sort_values(by='Age', ascending=False)

# Sort by multiple columns
df_sorted = df.sort_values(by=['City', 'Age'])

# Sort by index
df_sorted = df.sort_index()
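
Sorting reorders rows, while ranking assigns each row a position number without moving it. The examples above cover sorting only, so here is a minimal ranking sketch using `Series.rank()`. The `scores` table and its column names are hypothetical.

```python
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Score': [88, 95, 88, 70],
})

# Highest score gets rank 1; method='min' gives tied rows the lowest rank
scores['Rank'] = scores['Score'].rank(ascending=False, method='min')

# The default method ('average') gives tied rows the mean of their ranks
scores['AvgRank'] = scores['Score'].rank(ascending=False)

print(scores)
```

Here the two scores of 88 tie for second place: `method='min'` gives both a rank of 2, while the default gives both 2.5. Ranks come back as floats in either case.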

Modifying Data

code.py
# Adding a new column
df['Age_Next_Year'] = df['Age'] + 1

# Conditional assignment: np.where(condition, value_if_true, value_if_false)
import numpy as np
df['Status'] = np.where(df['Age'] >= 18, 'Adult', 'Minor')

# Dropping columns
df = df.drop(columns=['Age_Next_Year'])
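
Two other common modifications are sketched below: updating only the rows that match a condition with `.loc`, and renaming a column. The `Group` column and the new name `Age_Years` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Start with a default value, then overwrite only the matching rows
df['Group'] = 'Under 30'
df.loc[df['Age'] >= 30, 'Group'] = '30 and over'

# rename() returns a new DataFrame, so assign the result back
df = df.rename(columns={'Age': 'Age_Years'})

print(df)
```

Like `drop()`, `rename()` does not change the DataFrame in place by default, which is why the result is assigned back to `df`.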

Practice Exercise

code.py
import pandas as pd

# Create sample data
df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Monitor', 'Keyboard'],
    'price': [1000, 25, 200, 50],
    'stock': [10, 100, 20, 50]
})

# 1. Select products with price > 100
expensive = df[df['price'] > 100]

# 2. Calculate total value (price * stock)
df['total_value'] = df['price'] * df['stock']

# 3. Sort by total value descending
df_sorted = df.sort_values('total_value', ascending=False)

print(df_sorted)

Next Steps

Now that you can manipulate a single DataFrame, let's learn how to combine multiple datasets!
