
Inspecting DataFrames

Learn to view and understand DataFrame structure and content

Viewing Data

head() - First Rows

code.py
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David'],
    'Age': [25, 30, 28, 32, 27],
    'City': ['NYC', 'LA', 'Chicago', 'Miami', 'Boston']
})

print(df.head())

Shows first 5 rows by default.

Custom number:

code.py
print(df.head(3))

tail() - Last Rows

code.py
print(df.tail())
print(df.tail(2))

Shows last 5 rows by default.

DataFrame Shape

code.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

print("Shape:", df.shape)
print("Rows:", df.shape[0])
print("Columns:", df.shape[1])

Output:

Shape: (3, 2)
Rows: 3
Columns: 2

Column Information

columns

code.py
print("Column names:", df.columns.tolist())

dtypes - Data Types

code.py
print(df.dtypes)

Common types:

  • int64: Integers
  • float64: Decimals
  • object: Strings (or mixed Python objects)
  • bool: True/False
  • datetime64: Dates
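
To see these types side by side, here is a small illustrative DataFrame (the column names are made up for this example) whose columns cover each dtype in the list:

```python
import pandas as pd

df = pd.DataFrame({
    'count': [1, 2, 3],                    # int64
    'price': [9.99, 5.50, 3.25],           # float64
    'name': ['a', 'b', 'c'],               # object (strings)
    'active': [True, False, True],         # bool
    'when': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])  # datetime64[ns]
})

print(df.dtypes)
```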

info() - Overview

code.py
df.info()

Shows:

  • Number of rows
  • Column names
  • Data types
  • Non-null counts
  • Memory usage

Example output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      int64
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes

describe() - Statistics

code.py
print(df.describe())

For numeric columns:

  • count: Number of values
  • mean: Average
  • std: Standard deviation
  • min: Minimum
  • 25%: First quartile
  • 50%: Median
  • 75%: Third quartile
  • max: Maximum

Include all columns:

code.py
print(df.describe(include='all'))

Index Information

code.py
print("Index:", df.index)
print("Index start:", df.index[0])
print("Index end:", df.index[-1])

Unique Values

code.py
print("Unique cities:", df['City'].unique())
print("Count unique:", df['City'].nunique())

Value Counts

code.py
print(df['City'].value_counts())

Shows: How many times each value appears.

Null Values

isnull()

code.py
print(df.isnull())

Returns True/False for each cell.

Count nulls

code.py
print("Null per column:")
print(df.isnull().sum())

print("Total nulls:", df.isnull().sum().sum())

Any nulls?

code.py
print("Has nulls:", df.isnull().values.any())

Memory Usage

code.py
print("Memory:", df.memory_usage())
print("Total:", df.memory_usage(deep=True).sum(), "bytes")

Sample Rows

Get random rows.

code.py
print(df.sample(3))

Useful for: Quick preview of large datasets.
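
By default, sample() returns different rows on every call. Passing pandas' random_state parameter makes the draw reproducible, which is handy when you want the same preview each run:

```python
import pandas as pd

df = pd.DataFrame({'x': range(100)})

# Same seed -> same rows every time
s1 = df.sample(3, random_state=42)
s2 = df.sample(3, random_state=42)
print(s1.equals(s2))  # True
```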

Practice Example

The scenario: Inspect sales dataset.

code.py
import pandas as pd
import numpy as np

sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=100),
    'Product': np.random.choice(['Laptop', 'Phone', 'Tablet'], 100),
    'Quantity': np.random.randint(1, 10, 100),
    'Price': np.random.choice([999, 599, 399], 100)
})

print("=== BASIC INFO ===")
print("Shape:", sales.shape)
print("Columns:", sales.columns.tolist())
print()

print("=== FIRST ROWS ===")
print(sales.head(3))
print()

print("=== DATA TYPES ===")
print(sales.dtypes)
print()

print("=== DETAILED INFO ===")
sales.info()
print()

print("=== STATISTICS ===")
print(sales.describe())
print()

print("=== UNIQUE VALUES ===")
print("Products:", sales['Product'].unique())
print("Product counts:")
print(sales['Product'].value_counts())
print()

print("=== NULL CHECK ===")
print("Any nulls:", sales.isnull().values.any())
print("Nulls per column:")
print(sales.isnull().sum())
print()

print("=== RANDOM SAMPLE ===")
print(sales.sample(5))

Getting Specific Values

At position

code.py
value = df.iloc[0, 0]
print("First cell:", value)

By label

code.py
value = df.at[0, 'Name']
print("Value:", value)

Column Statistics

code.py
print("Max age:", df['Age'].max())
print("Min age:", df['Age'].min())
print("Mean age:", df['Age'].mean())
print("Sum ages:", df['Age'].sum())

Checking Duplicates

code.py
print("Duplicates:", df.duplicated().sum())
print("Duplicate rows:")
print(df[df.duplicated()])

Correlation

For numeric columns.

code.py
print(df.corr(numeric_only=True))  # numeric_only avoids errors on string columns

Shows: How columns relate to each other (-1 to 1).

Quick Functions

code.py
print("Min:", df.min())
print("Max:", df.max())
print("Sum:", df.sum())
print("Mean:", df.mean(numeric_only=True))    # numeric_only needed on mixed
print("Median:", df.median(numeric_only=True))  # DataFrames in recent pandas
print("Mode:", df.mode())

Key Points to Remember

head() shows first rows, tail() shows last rows. Use these to preview data.

info() gives complete overview: types, nulls, memory. Always run this first.

describe() shows statistics for numeric columns. Great for understanding data range.

shape gives (rows, columns). dtypes shows data type of each column.

isnull().sum() counts missing values per column. Critical for data quality check.

Common Mistakes

Mistake 1: Not checking data after loading

code.py
df = pd.read_csv('data.csv')
# Start analyzing without looking!

Always do:

code.py
print(df.head())
df.info()

Mistake 2: Assuming no nulls

code.py
df['Age'].mean()  # Silently skips NaN values - misleading if many are missing

Check first:

code.py
print(df.isnull().sum())

Mistake 3: Wrong shape access

code.py
rows = df.shape  # Wrong: this is the whole (rows, columns) tuple
rows = df.shape[0]  # Correct: number of rows

Mistake 4: Ignoring data types

code.py
df['Price'].mean()  # Error if Price is string!
print(df.dtypes)  # Check first
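
If dtypes reveals that a numeric column was loaded as strings, pd.to_numeric can fix it. A quick sketch (the 'Price' values here are invented for illustration):

```python
import pandas as pd

# 'Price' loaded as strings, including one unparseable value
df = pd.DataFrame({'Price': ['999', '599', 'N/A']})

# errors='coerce' turns unparseable values into NaN instead of raising
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
print(df['Price'].mean())  # NaN is skipped: (999 + 599) / 2 = 799.0
```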

What's Next?

You now know how to inspect DataFrames. Next, you'll learn about selecting columns - how to choose and work with specific columns from your DataFrame.