
Importing/Exporting Data

Reading and writing CSV, Excel, and SQL data; basics of the requests library for APIs

What You'll Learn

  • Reading and writing CSV files
  • Working with Excel files
  • Connecting to SQL databases
  • Making API requests
  • Data import/export best practices

Working with CSV Files

Reading CSV with pandas:

code.py
import pandas as pd

# Read CSV file
df = pd.read_csv('data.csv')

# Read with specific options
df = pd.read_csv('data.csv',
                 sep=',',           # Delimiter
                 header=0,          # Row number for column names
                 index_col=0,       # Column to use as index
                 na_values=['NA', 'null'])  # Additional NA values
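
Two other read_csv options worth knowing are usecols (load only the columns you need) and parse_dates (convert date columns while reading); the column names below are hypothetical:

code.py
# Load selected columns and parse the date column while reading
# (column names 'date' and 'amount' are placeholders)
df = pd.read_csv('data.csv',
                 usecols=['date', 'amount'],
                 parse_dates=['date'])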

Writing CSV:

code.py
# Write to CSV
df.to_csv('output.csv', index=False)

# Write with options
df.to_csv('output.csv',
          index=False,
          sep=',',
          encoding='utf-8')

Working with Excel Files

Reading Excel:

code.py
# Read Excel file (first sheet)
df = pd.read_excel('data.xlsx')

# Read specific sheet
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

# Read multiple sheets
excel_file = pd.ExcelFile('data.xlsx')
df1 = excel_file.parse('Sheet1')
df2 = excel_file.parse('Sheet2')

# Read all sheets
all_sheets = pd.read_excel('data.xlsx', sheet_name=None)
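
With sheet_name=None, pandas returns a dictionary mapping sheet names to DataFrames. A minimal sketch of stacking them into one DataFrame (assuming every sheet shares the same columns):

code.py
# Combine all sheets into a single DataFrame
# (assumes every sheet has the same column layout)
all_sheets = pd.read_excel('data.xlsx', sheet_name=None)
combined = pd.concat(all_sheets.values(), ignore_index=True)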

Writing Excel:

code.py
# Write to Excel
df.to_excel('output.xlsx', index=False)

# Write multiple sheets
with pd.ExcelWriter('output.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sales', index=False)
    df2.to_excel(writer, sheet_name='Returns', index=False)

Working with SQL Databases

Connecting to database:

code.py
import sqlite3
import pandas as pd

# SQLite connection
conn = sqlite3.connect('database.db')

# Read SQL query
df = pd.read_sql_query("SELECT * FROM customers", conn)

# Read an entire table with read_sql_table
# (requires a SQLAlchemy engine, not a raw sqlite3 connection;
#  see the SQLAlchemy section below for a working example)

# Close connection
conn.close()

Writing to database:

code.py
# Write DataFrame to SQL table
df.to_sql('customers',
          conn,
          if_exists='replace',  # 'fail', 'replace', 'append'
          index=False)

Using SQLAlchemy (recommended):

code.py
from sqlalchemy import create_engine

# Create engine
engine = create_engine('postgresql://user:password@localhost:5432/dbname')

# Read data with a query
df = pd.read_sql("SELECT * FROM sales WHERE amount > 1000", engine)

# Read an entire table (read_sql_table works with a SQLAlchemy engine)
df = pd.read_sql_table('sales', engine)

# Write data
df.to_sql('sales_data', engine, if_exists='append', index=False)

Working with APIs

Basic API request:

code.py
import requests
import pandas as pd

# GET request
response = requests.get('https://api.example.com/data')

# Check if successful
if response.status_code == 200:
    data = response.json()
    df = pd.DataFrame(data)
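
An alternative to checking the status code manually is response.raise_for_status(), which raises an exception for any 4xx/5xx response and pairs well with try/except:

code.py
import requests

response = requests.get('https://api.example.com/data')
try:
    response.raise_for_status()   # Raises requests.HTTPError for 4xx/5xx responses
    data = response.json()
except requests.HTTPError as err:
    print(f"Request failed: {err}")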

API with parameters:

code.py
# Request with parameters
params = {
    'start_date': '2024-01-01',
    'end_date': '2024-12-31',
    'limit': 100
}

response = requests.get('https://api.example.com/sales', params=params)
data = response.json()

API with authentication:

code.py
# API key in header
headers = {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
}

response = requests.get('https://api.example.com/data', headers=headers)
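
Hardcoding keys in scripts is risky; here is a minimal sketch that reads the key from an environment variable instead (the variable name API_KEY is an assumption):

code.py
import os
import requests

# Read the key from an environment variable rather than hardcoding it
# (returns None if the variable is not set)
api_key = os.environ.get('API_KEY')

headers = {'Authorization': f'Bearer {api_key}'}
response = requests.get('https://api.example.com/data', headers=headers)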

Handling pagination:

code.py
all_data = []
page = 1
while True:
    response = requests.get(f'https://api.example.com/data?page={page}')
    data = response.json()

    # Stop when the API returns an empty page
    if not data:
        break

    all_data.extend(data)
    page += 1

df = pd.DataFrame(all_data)

Other File Formats

JSON:

code.py
# Read JSON
df = pd.read_json('data.json')

# Write JSON
df.to_json('output.json', orient='records', indent=2)
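
APIs often return nested JSON; pd.json_normalize flattens nested fields into columns. A small sketch with hypothetical field names:

code.py
# Flatten nested records into columns (field names are hypothetical)
records = [{'id': 1, 'customer': {'name': 'Asha', 'city': 'Pune'}}]
df = pd.json_normalize(records)
# Resulting columns: id, customer.name, customer.city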

Parquet (efficient for large datasets; requires the pyarrow or fastparquet package):

code.py
# Read Parquet
df = pd.read_parquet('data.parquet')

# Write Parquet
df.to_parquet('output.parquet', compression='snappy')
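
Because Parquet is a columnar format, you can read only the columns you need, which is often much faster for wide datasets (the column names below are hypothetical):

code.py
# Read only the required columns from a Parquet file
df = pd.read_parquet('data.parquet', columns=['order_id', 'amount'])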

Best Practices

Error handling:

code.py
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    print("File not found!")
except pd.errors.EmptyDataError:
    print("File is empty!")
except Exception as e:
    print(f"Error: {e}")

Data validation:

code.py
# Check data after loading
print(df.shape)  # Rows and columns
print(df.dtypes)  # Data types
print(df.head())  # First few rows
df.info()         # Summary info (info() prints directly and returns None)

Performance tips:

  • Use chunksize for large files
  • Specify dtypes when reading (see the sketch below)
  • Use compression for storage
  • Consider Parquet for big data
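
The dtype and compression tips can be applied directly in read_csv and to_csv. A minimal sketch (column names and dtypes are hypothetical):

code.py
# Declare column dtypes up front to save memory and skip type inference
df = pd.read_csv('data.csv', dtype={'customer_id': 'int32', 'city': 'category'})

# Write a gzip-compressed CSV, then read it back
# (pandas also infers compression from the .gz extension by default)
df.to_csv('output.csv.gz', index=False, compression='gzip')
df = pd.read_csv('output.csv.gz')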

Reading large CSV in chunks:

code.py
chunk_size = 10000
chunks = []

for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    # Process each chunk
    processed = chunk[chunk['value'] > 100]
    chunks.append(processed)

df = pd.concat(chunks, ignore_index=True)

Practice Exercise

code.py
import pandas as pd
import requests

# 1. Read CSV (parse the date column so the filter below compares dates, not strings)
sales = pd.read_csv('sales.csv', parse_dates=['date'])

# 2. Filter data
recent_sales = sales[sales['date'] >= '2024-01-01']

# 3. Export to Excel
recent_sales.to_excel('recent_sales.xlsx', index=False)

# 4. Fetch from API
response = requests.get('https://api.example.com/products')
products = pd.DataFrame(response.json())

# 5. Merge and export
merged = recent_sales.merge(products, on='product_id')
merged.to_csv('sales_with_products.csv', index=False)

Next Steps

Let's learn data cleaning techniques!

