Reading Excel Files
Learn to read data from Excel files using Python
Reading Excel Files
What are Excel Files?
Excel files (with .xlsx or .xls extension) are spreadsheet files created by Microsoft Excel. They can have multiple sheets, colors, formulas, and formatting.
Difference from CSV:
- CSV: Plain text, one sheet, no formatting
- Excel: Rich format, multiple sheets, colors, formulas
Why read Excel files in Python:
- Most companies use Excel for data
- Handle files with multiple sheets
- Access data from business reports
- Automate data processing
The openpyxl Library
Python needs a special library to read Excel files. openpyxl is the most popular.
First, install it:
pip install openpyxl
Then import it:
import openpyxlOpening an Excel File
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
print("File opened successfully")What this does: Loads the Excel file. The file is called a workbook.
Important: File must exist or you get an error.
Getting Sheet Names
Excel files can have multiple sheets. First, see what sheets exist.
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet_names = workbook.sheetnames
print("Sheets:", sheet_names)Shows something like: ['Sheet1', 'Sales', 'Expenses']
Accessing a Sheet
Get Active Sheet
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
print("Active sheet:", sheet.title)What active means: The sheet that was open when file was last saved.
Get Specific Sheet by Name
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook["Sales"]
print("Working with:", sheet.title)What this does: Opens the sheet named "Sales".
Reading Cell Values
Reading Single Cell
Cells are identified by column letter and row number, like "A1" or "B5".
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
cell_value = sheet["A1"].value
print("Cell A1:", cell_value)What this shows: The value in cell A1.
Reading by Row and Column Numbers
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
cell_value = sheet.cell(row=1, column=1).value
print("Cell (1,1):", cell_value)What this does: Gets cell at row 1, column 1 (same as A1). Rows and columns start at 1, not 0.
Reading Multiple Cells
Reading a Row
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
for cell in sheet[1]:
print(cell.value)What this does: Reads all cells in row 1.
Reading a Column
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
for cell in sheet["A"]:
print(cell.value)What this does: Reads all cells in column A.
Reading All Data
Loop Through All Rows
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
for row in sheet.iter_rows(values_only=True):
print(row)What this shows: Each row as a tuple of values.
Example output:
('Name', 'Age', 'City')
('John', 25, 'New York')
('Sarah', 30, 'London')
What values_only=True means: Just get the values, not cell objects.
Loop Through Specific Range
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
for row in sheet.iter_rows(min_row=2, max_row=5, min_col=1, max_col=3, values_only=True):
print(row)What this does: Reads rows 2 to 5, columns 1 to 3 only.
Why this is useful: Skip header rows or read specific sections.
Getting Sheet Dimensions
Find out how much data is in the sheet.
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
max_row = sheet.max_row
max_col = sheet.max_column
print("Total rows:", max_row)
print("Total columns:", max_col)What this tells you: The last row and column with data.
Converting to List of Dictionaries
Make data easier to work with by converting to dictionaries.
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
rows = list(sheet.iter_rows(values_only=True))
headers = rows[0]
data = []
for row in rows[1:]:
row_dict = {}
for i, value in enumerate(row):
row_dict[headers[i]] = value
data.append(row_dict)
for item in data:
print(item)What this creates: List of dictionaries where keys are column names.
Example: {'Name': 'John', 'Age': 25, 'City': 'New York'}
Reading Specific Cells
import openpyxl
workbook = openpyxl.load_workbook("sales.xlsx")
sheet = workbook.active
name = sheet["A2"].value
quantity = sheet["B2"].value
price = sheet["C2"].value
total = quantity * price
print("Customer:", name)
print("Total:", total)What this does: Reads specific cells and uses them in calculations.
Handling None Values
Empty cells return None. Handle them to avoid errors.
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
for row in sheet.iter_rows(min_row=2, values_only=True):
name = row[0] if row[0] else "Unknown"
age = row[1] if row[1] else 0
print(name, age)What this does: Uses default values when cells are empty.
Practice Example
The scenario: You have an Excel file with product sales data. You want to read it and calculate totals.
Excel file structure (sales.xlsx):
| Product | Quantity | Price |
|----------|----------|-------|
| Laptop | 5 | 999 |
| Phone | 10 | 599 |
| Tablet | 7 | 399 |
Python program:
import openpyxl
workbook = openpyxl.load_workbook("sales.xlsx")
sheet = workbook.active
print("Product Sales Report")
print("=" * 40)
total_revenue = 0
product_count = 0
for row in sheet.iter_rows(min_row=2, values_only=True):
product = row[0]
quantity = row[1]
price = row[2]
if product and quantity and price:
revenue = quantity * price
total_revenue = total_revenue + revenue
product_count = product_count + 1
print("Product:", product)
print(" Sold:", quantity, "units")
print(" Price:", price)
print(" Revenue:", revenue)
print()
print("=" * 40)
print("Total products:", product_count)
print("Total revenue:", total_revenue)
workbook.close()What this program does:
- Opens sales.xlsx file
- Skips header row (starts from row 2)
- Reads product, quantity, price from each row
- Calculates revenue for each product
- Keeps running total
- Shows detailed report
- Closes workbook when done
Reading Multiple Sheets
import openpyxl
workbook = openpyxl.load_workbook("report.xlsx")
for sheet_name in workbook.sheetnames:
sheet = workbook[sheet_name]
print("Sheet:", sheet_name)
print("Rows:", sheet.max_row)
print()What this does: Loops through all sheets and shows information about each.
Checking Cell Data Types
import openpyxl
workbook = openpyxl.load_workbook("data.xlsx")
sheet = workbook.active
cell = sheet["A1"]
print("Value:", cell.value)
print("Type:", type(cell.value))What this shows: The value and its Python type (string, number, date, etc.).
Key Points to Remember
Use openpyxl library to read Excel files. Install it first with pip install openpyxl.
Excel files are called workbooks. Each workbook can have multiple sheets. Access sheets by name or use active sheet.
Cells are accessed using column letters and row numbers like "A1" or with cell(row, column) method.
Use iter_rows() to loop through rows efficiently. values_only=True gives just the values, not cell objects.
Empty cells return None. Always check for None to avoid errors in calculations.
Common Mistakes
Mistake 1: Forgetting to install openpyxl
import openpyxl # ModuleNotFoundError!Run: pip install openpyxl first
Mistake 2: Wrong sheet name
sheet = workbook["Sales"] # KeyError if sheet doesn't existCheck workbook.sheetnames first
Mistake 3: Starting from 0
cell = sheet.cell(row=0, column=0) # Error!Rows and columns start at 1, not 0
Mistake 4: Not handling None
total = sheet["A1"].value + sheet["A2"].value # Error if any cell is empty!Check for None first
Mistake 5: Not closing workbook
workbook = openpyxl.load_workbook("data.xlsx")
# ... work with file ...
# Forgot workbook.close()What's Next?
You now know how to read Excel files. Next, you'll learn about writing to Excel files - creating new Excel files, adding data, and formatting.