#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
6 min read min read

Reading Excel Files

Learn to read data from Excel files using Python

Reading Excel Files

What are Excel Files?

Excel files (with .xlsx or .xls extension) are spreadsheet files created by Microsoft Excel. They can have multiple sheets, colors, formulas, and formatting.

Difference from CSV:

  • CSV: Plain text, one sheet, no formatting
  • Excel: Rich format, multiple sheets, colors, formulas

Why read Excel files in Python:

  • Most companies use Excel for data
  • Handle files with multiple sheets
  • Access data from business reports
  • Automate data processing

The openpyxl Library

Python needs a special library to read Excel files. openpyxl is the most popular.

First, install it:

pip install openpyxl

Then import it:

code.py
import openpyxl

Opening an Excel File

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
print("File opened successfully")

What this does: Loads the Excel file. The file is called a workbook.

Important: File must exist or you get an error.

Getting Sheet Names

Excel files can have multiple sheets. First, see what sheets exist.

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet_names = workbook.sheetnames
print("Sheets:", sheet_names)

Shows something like: ['Sheet1', 'Sales', 'Expenses']

Accessing a Sheet

Get Active Sheet

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
print("Active sheet:", sheet.title)

What active means: The sheet that was open when file was last saved.

Get Specific Sheet by Name

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook["Sales"]
print("Working with:", sheet.title)

What this does: Opens the sheet named "Sales".

Reading Cell Values

Reading Single Cell

Cells are identified by column letter and row number, like "A1" or "B5".

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

cell_value = sheet["A1"].value
print("Cell A1:", cell_value)

What this shows: The value in cell A1.

Reading by Row and Column Numbers

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

cell_value = sheet.cell(row=1, column=1).value
print("Cell (1,1):", cell_value)

What this does: Gets cell at row 1, column 1 (same as A1). Rows and columns start at 1, not 0.

Reading Multiple Cells

Reading a Row

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

for cell in sheet[1]:
    print(cell.value)

What this does: Reads all cells in row 1.

Reading a Column

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

for cell in sheet["A"]:
    print(cell.value)

What this does: Reads all cells in column A.

Reading All Data

Loop Through All Rows

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

for row in sheet.iter_rows(values_only=True):
    print(row)

What this shows: Each row as a tuple of values.

Example output:

('Name', 'Age', 'City') ('John', 25, 'New York') ('Sarah', 30, 'London')

What values_only=True means: Just get the values, not cell objects.

Loop Through Specific Range

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

for row in sheet.iter_rows(min_row=2, max_row=5, min_col=1, max_col=3, values_only=True):
    print(row)

What this does: Reads rows 2 to 5, columns 1 to 3 only.

Why this is useful: Skip header rows or read specific sections.

Getting Sheet Dimensions

Find out how much data is in the sheet.

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

max_row = sheet.max_row
max_col = sheet.max_column

print("Total rows:", max_row)
print("Total columns:", max_col)

What this tells you: The last row and column with data.

Converting to List of Dictionaries

Make data easier to work with by converting to dictionaries.

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

rows = list(sheet.iter_rows(values_only=True))
headers = rows[0]
data = []

for row in rows[1:]:
    row_dict = {}
    for i, value in enumerate(row):
        row_dict[headers[i]] = value
    data.append(row_dict)

for item in data:
    print(item)

What this creates: List of dictionaries where keys are column names.

Example: {'Name': 'John', 'Age': 25, 'City': 'New York'}

Reading Specific Cells

code.py
import openpyxl

workbook = openpyxl.load_workbook("sales.xlsx")
sheet = workbook.active

name = sheet["A2"].value
quantity = sheet["B2"].value
price = sheet["C2"].value

total = quantity * price
print("Customer:", name)
print("Total:", total)

What this does: Reads specific cells and uses them in calculations.

Handling None Values

Empty cells return None. Handle them to avoid errors.

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

for row in sheet.iter_rows(min_row=2, values_only=True):
    name = row[0] if row[0] else "Unknown"
    age = row[1] if row[1] else 0
    print(name, age)

What this does: Uses default values when cells are empty.

Practice Example

The scenario: You have an Excel file with product sales data. You want to read it and calculate totals.

Excel file structure (sales.xlsx):

| Product | Quantity | Price | |----------|----------|-------| | Laptop | 5 | 999 | | Phone | 10 | 599 | | Tablet | 7 | 399 |

Python program:

code.py
import openpyxl

workbook = openpyxl.load_workbook("sales.xlsx")
sheet = workbook.active

print("Product Sales Report")
print("=" * 40)

total_revenue = 0
product_count = 0

for row in sheet.iter_rows(min_row=2, values_only=True):
    product = row[0]
    quantity = row[1]
    price = row[2]

    if product and quantity and price:
        revenue = quantity * price
        total_revenue = total_revenue + revenue
        product_count = product_count + 1

        print("Product:", product)
        print("  Sold:", quantity, "units")
        print("  Price:", price)
        print("  Revenue:", revenue)
        print()

print("=" * 40)
print("Total products:", product_count)
print("Total revenue:", total_revenue)

workbook.close()

What this program does:

  1. Opens sales.xlsx file
  2. Skips header row (starts from row 2)
  3. Reads product, quantity, price from each row
  4. Calculates revenue for each product
  5. Keeps running total
  6. Shows detailed report
  7. Closes workbook when done

Reading Multiple Sheets

code.py
import openpyxl

workbook = openpyxl.load_workbook("report.xlsx")

for sheet_name in workbook.sheetnames:
    sheet = workbook[sheet_name]
    print("Sheet:", sheet_name)
    print("Rows:", sheet.max_row)
    print()

What this does: Loops through all sheets and shows information about each.

Checking Cell Data Types

code.py
import openpyxl

workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active

cell = sheet["A1"]
print("Value:", cell.value)
print("Type:", type(cell.value))

What this shows: The value and its Python type (string, number, date, etc.).

Key Points to Remember

Use openpyxl library to read Excel files. Install it first with pip install openpyxl.

Excel files are called workbooks. Each workbook can have multiple sheets. Access sheets by name or use active sheet.

Cells are accessed using column letters and row numbers like "A1" or with cell(row, column) method.

Use iter_rows() to loop through rows efficiently. values_only=True gives just the values, not cell objects.

Empty cells return None. Always check for None to avoid errors in calculations.

Common Mistakes

Mistake 1: Forgetting to install openpyxl

code.py
import openpyxl  # ModuleNotFoundError!

Run: pip install openpyxl first

Mistake 2: Wrong sheet name

code.py
sheet = workbook["Sales"]  # KeyError if sheet doesn't exist

Check workbook.sheetnames first

Mistake 3: Starting from 0

code.py
cell = sheet.cell(row=0, column=0)  # Error!

Rows and columns start at 1, not 0

Mistake 4: Not handling None

code.py
total = sheet["A1"].value + sheet["A2"].value  # Error if any cell is empty!

Check for None first

Mistake 5: Not closing workbook

code.py
workbook = openpyxl.load_workbook("data.xlsx")
# ... work with file ...
# Forgot workbook.close()

What's Next?

You now know how to read Excel files. Next, you'll learn about writing to Excel files - creating new Excel files, adding data, and formatting.

SkillsetMaster - AI, Web Development & Data Analytics Courses