12 min read
Importing/Exporting Data
Reading and writing CSV, Excel, and SQL data, plus the basics of the requests library for APIs
What You'll Learn
- Reading and writing CSV files
- Working with Excel files
- Connecting to SQL databases
- Making API requests
- Data import/export best practices
Working with CSV Files
Reading CSV with pandas:
code.py
import pandas as pd
# Read CSV file
df = pd.read_csv('data.csv')
# Read with specific options
df = pd.read_csv('data.csv',
                 sep=',',                   # Delimiter
                 header=0,                  # Row number to use for column names
                 index_col=0,               # Column to use as the index
                 na_values=['NA', 'null'])  # Additional strings to treat as NaN

Writing CSV:
code.py
# Write to CSV
df.to_csv('output.csv', index=False)
# Write with options
df.to_csv('output.csv',
          index=False,
          sep=',',
          encoding='utf-8')

Working with Excel Files
Reading Excel:
code.py
# Read Excel file (first sheet)
df = pd.read_excel('data.xlsx')
# Read specific sheet
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
# Read multiple sheets
excel_file = pd.ExcelFile('data.xlsx')
df1 = excel_file.parse('Sheet1')
df2 = excel_file.parse('Sheet2')
# Read all sheets
all_sheets = pd.read_excel('data.xlsx', sheet_name=None)  # Returns a dict of {sheet_name: DataFrame}

Writing Excel:
code.py
# Write to Excel
df.to_excel('output.xlsx', index=False)
# Write multiple sheets
with pd.ExcelWriter('output.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sales', index=False)
    df2.to_excel(writer, sheet_name='Returns', index=False)

Working with SQL Databases
Connecting to database:
code.py
import sqlite3
import pandas as pd
# SQLite connection
conn = sqlite3.connect('database.db')
# Read SQL query
df = pd.read_sql_query("SELECT * FROM customers", conn)
# Read an entire table (note: read_sql_table requires a SQLAlchemy engine,
# covered below, rather than a raw sqlite3 connection)
# df = pd.read_sql_table('customers', engine)
# Close connection
conn.close()

Writing to database:
code.py
# Write DataFrame to SQL table
df.to_sql('customers',
          conn,
          if_exists='replace',  # 'fail', 'replace', or 'append'
          index=False)

Using SQLAlchemy (recommended):
code.py
from sqlalchemy import create_engine
# Create engine
engine = create_engine('postgresql://user:password@localhost:5432/dbname')
# Read data
df = pd.read_sql("SELECT * FROM sales WHERE amount > 1000", engine)
# Write data
df.to_sql('sales_data', engine, if_exists='append', index=False)
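When query values come from user input or variables, it is safer to pass them as bind parameters than to build the SQL string by hand. A minimal sketch, assuming the same engine as above and a hypothetical min_amount filter:
code.py
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('postgresql://user:password@localhost:5432/dbname')

# Bind parameters keep values out of the SQL string (avoids injection issues)
query = text("SELECT * FROM sales WHERE amount > :min_amount")
df = pd.read_sql(query, engine, params={'min_amount': 1000})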

Working with APIs
Basic API request:
code.py
import requests
import pandas as pd
# GET request
response = requests.get('https://api.example.com/data')
# Check if successful
if response.status_code == 200:
    data = response.json()
    df = pd.DataFrame(data)
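Checking status_code by hand works, but requests can also raise on HTTP errors for you, and a timeout keeps the call from hanging on a slow server. A minimal sketch (the URL is a placeholder):
code.py
import requests
import pandas as pd

# Give up after 10 seconds and raise an exception on any 4xx/5xx response
response = requests.get('https://api.example.com/data', timeout=10)
response.raise_for_status()
df = pd.DataFrame(response.json())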

API with parameters:
code.py
# Request with parameters
params = {
    'start_date': '2024-01-01',
    'end_date': '2024-12-31',
    'limit': 100
}
response = requests.get('https://api.example.com/sales', params=params)
data = response.json()

API with authentication:
code.py
# API key in header
headers = {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
}
response = requests.get('https://api.example.com/data', headers=headers)
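Hard-coding an API key in a script makes it easy to leak. One common pattern is to read it from an environment variable instead; a minimal sketch, where EXAMPLE_API_KEY is a hypothetical variable name:
code.py
import os
import requests

# Read the key from the environment instead of committing it to code
api_key = os.environ['EXAMPLE_API_KEY']  # hypothetical variable name
headers = {'Authorization': f'Bearer {api_key}'}
response = requests.get('https://api.example.com/data', headers=headers)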

Handling pagination:
code.py
all_data = []
page = 1
while True:
    response = requests.get(f'https://api.example.com/data?page={page}')
    data = response.json()
    if not data:
        break
    all_data.extend(data)
    page += 1

df = pd.DataFrame(all_data)

Other File Formats
JSON:
code.py
# Read JSON
df = pd.read_json('data.json')
# Write JSON
df.to_json('output.json', orient='records', indent=2)
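API and JSON payloads are often nested; pd.json_normalize flattens nested records into columns. A minimal sketch with a made-up record:
code.py
import pandas as pd

# Nested fields become columns like 'customer.name' and 'customer.city'
records = [{'id': 1, 'customer': {'name': 'Ada', 'city': 'Paris'}}]
df = pd.json_normalize(records)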

Parquet (efficient for large datasets):
code.py
# Read Parquet
df = pd.read_parquet('data.parquet')
# Write Parquet
df.to_parquet('output.parquet', compression='snappy')
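Part of what makes Parquet efficient is its columnar layout, so you can load just the columns you need. A minimal sketch (the column names are placeholders):
code.py
import pandas as pd

# Load only two columns instead of the whole file
df = pd.read_parquet('data.parquet', columns=['date', 'amount'])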

Best Practices
Error handling:
code.py
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    print("File not found!")
except pd.errors.EmptyDataError:
    print("File is empty!")
except Exception as e:
    print(f"Error: {e}")

Data validation:
code.py
# Check data after loading
print(df.shape) # Rows and columns
print(df.dtypes) # Data types
print(df.head()) # First few rows
df.info()          # Summary info (info() prints its output directly)

Performance tips:
- Use chunksize for large files
- Specify dtypes when reading (see the sketch after this list)
- Use compression for storage
- Consider Parquet for big data
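A minimal sketch of the dtype and compression tips, using placeholder column names:
code.py
import pandas as pd

# Declaring dtypes up front avoids type inference and saves memory
df = pd.read_csv('data.csv',
                 dtype={'customer_id': 'int32', 'amount': 'float32'},
                 parse_dates=['date'])

# Write a gzip-compressed CSV (the .gz extension alone would also trigger compression)
df.to_csv('output.csv.gz', index=False, compression='gzip')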
Reading large CSV in chunks:
code.py
chunk_size = 10000
chunks = []
for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    # Process each chunk, keeping only the rows we need
    processed = chunk[chunk['value'] > 100]
    chunks.append(processed)
df = pd.concat(chunks, ignore_index=True)

Practice Exercise
code.py
import pandas as pd
import requests
# 1. Read CSV
sales = pd.read_csv('sales.csv')
# 2. Filter data
recent_sales = sales[sales['date'] >= '2024-01-01']
# 3. Export to Excel
recent_sales.to_excel('recent_sales.xlsx', index=False)
# 4. Fetch from API
response = requests.get('https://api.example.com/products')
products = pd.DataFrame(response.json())
# 5. Merge and export
merged = recent_sales.merge(products, on='product_id')
merged.to_csv('sales_with_products.csv', index=False)

Next Steps
Let's learn data cleaning techniques!