API Authentication & Pagination

Why Authentication?

APIs need to know who is making requests. Authentication proves your identity.

Reasons for authentication:

Limit how many requests you can make
Track usage
Protect private data
Charge for usage (paid APIs)

Common Authentication Methods

1. API Key

Simplest method. You get a key when you sign up.

code.py

import requests

API_KEY = "your_api_key_here"

headers = {
    "X-API-Key": API_KEY
}

response = requests.get(
    "https://api.example.com/data",
    headers=headers
)

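Or as a query parameter in the URL (some APIs require this, but URLs are often written to logs, so prefer a header when the API allows it):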

code.py

params = {
    "api_key": API_KEY
}

response = requests.get(
    "https://api.example.com/data",
    params=params
)
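
A key placed in the URL travels with that URL wherever it is printed or logged. As an illustration, here is a minimal sketch that masks the key before a URL is shown. The helper name redact_query_param is hypothetical (it is not part of requests), and the sketch uses only the standard library.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def redact_query_param(url, name, mask="***"):
    """Return url with the value of query parameter `name` replaced by mask."""
    parts = urlsplit(url)
    # parse_qsl keeps the parameters in order as (key, value) pairs
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    redacted = [(k, mask if k == name else v) for k, v in pairs]
    # safe="*" keeps the mask readable instead of percent-encoding it
    new_query = urlencode(redacted, safe="*")
    return urlunsplit(parts._replace(query=new_query))


full_url = "https://api.example.com/data?api_key=abc123&page=2"
print(redact_query_param(full_url, "api_key"))
```

Masking helps with your own log output only. The full URL is still sent to the server, so the header form remains the safer default.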

2. Bearer Token

Token-based authentication (OAuth, JWT).

code.py

import requests

ACCESS_TOKEN = "your_access_token_here"

headers = {
    "Authorization": "Bearer " + ACCESS_TOKEN
}

response = requests.get(
    "https://api.example.com/data",
    headers=headers
)

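What "Bearer" means: Whoever holds ("bears") the token is granted access, so no further proof of identity is sent. It is the standard Authorization scheme for OAuth 2.0 access tokens.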

3. Basic Authentication

Username and password (less common now).

code.py

import requests

response = requests.get(
    "https://api.example.com/data",
    auth=("username", "password")
)

What this does: Base64-encodes username:password and sends it in the Authorization header. Base64 is an encoding, not encryption, so use Basic authentication only over HTTPS.
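
To make the encoding step visible, here is a minimal sketch of roughly what the auth parameter produces. The function name basic_auth_header is hypothetical, and the sketch assumes plain ASCII credentials.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic authentication."""
    credentials = (username + ":" + password).encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return "Basic " + encoded


header_value = basic_auth_header("username", "password")
print(header_value)

# Anyone who sees the header can reverse the encoding
print(base64.b64decode(header_value.split(" ")[1]).decode("utf-8"))
```

In practice you would still pass auth=("username", "password") and let requests build the header for you.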

Environment Variables for API Keys

Never hardcode API keys in your code!

Create a .env file (keep it out of version control, for example by adding it to .gitignore):

API_KEY=your_actual_key_here
API_SECRET=your_secret_here

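Use in Python (this reads variables already set in your shell or system environment; Python does not load a .env file on its own):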

code.py

import os
import requests

API_KEY = os.environ.get("API_KEY")

if not API_KEY:
    print("API key not found!")
else:
    headers = {"X-API-Key": API_KEY}
    response = requests.get(
        "https://api.example.com/data",
        headers=headers
    )

Or use python-dotenv, which reads the .env file and loads its values into the environment:

code.py

from dotenv import load_dotenv
import os
import requests

load_dotenv()

API_KEY = os.getenv("API_KEY")

Install python-dotenv:

pip install python-dotenv
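
When a script cannot do anything useful without its key, it can be simpler to fail early than to print a message and carry on. A minimal sketch, assuming a hypothetical helper named require_env and a made-up variable name DEMO_API_KEY:

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError("Missing required environment variable: " + name)
    return value


# Demo only: set a fake value so the example runs on its own
os.environ["DEMO_API_KEY"] = "demo-value"
print(require_env("DEMO_API_KEY"))
```

The same helper works whether the value came from the shell or was loaded by python-dotenv, because both end up in os.environ.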

What is Pagination?

When data has thousands of items, APIs don't send everything at once. They break it into pages.

Why pagination:

Faster responses
Less data transfer
Prevents server overload

Common pagination styles:

Page number (page 1, 2, 3...)
Cursor-based (next token)
Offset-based (skip first N items)
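
Offset-based pagination works like the other two styles, except that you tell the API how many items to skip. Here is a minimal sketch that runs without a network connection. The function fetch_slice is a hypothetical stand-in for a real request, and the parameter names offset and limit are assumptions that vary between APIs.

```python
# Fake dataset standing in for what a server would hold
DATASET = list(range(1, 26))  # 25 items


def fetch_slice(offset, limit):
    """Hypothetical stand-in for an API call such as GET /data?offset=..&limit=.."""
    return DATASET[offset:offset + limit]


def fetch_all_by_offset(limit=10):
    all_items = []
    offset = 0
    while True:
        items = fetch_slice(offset, limit)
        if not items:
            break
        all_items.extend(items)
        # Skip past everything received so far
        offset += len(items)
    return all_items


result = fetch_all_by_offset(limit=10)
print("Total items:", len(result))
```

Advancing by len(items) rather than by limit keeps the offset correct even if the server returns fewer items than requested.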

Page Number Pagination

Most common style. Request specific page numbers.

code.py

import requests

page = 1
per_page = 10

params = {
    "page": page,
    "per_page": per_page
}

response = requests.get(
    "https://api.example.com/posts",
    params=params
)

data = response.json()
print("Page", page, "items:", len(data))

What per_page means: How many items per page.
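
If an API also reports the total number of items (not all do, and the field name differs between APIs), you can work out how many pages to expect. A small sketch of that arithmetic, with a hypothetical helper name:

```python
import math


def total_pages(total_items, per_page):
    """Number of pages needed to hold total_items at per_page items each."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total_items / per_page)


print(total_pages(95, 10))   # 10 (nine full pages plus one page of 5)
print(total_pages(100, 10))  # 10
print(total_pages(0, 10))    # 0
```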

Getting All Pages

Loop through the pages until one comes back empty.

code.py

import requests

all_items = []
page = 1

while True:
    params = {"page": page, "per_page": 10}

    response = requests.get(
        "https://api.example.com/posts",
        params=params
    )

    if response.status_code != 200:
        break

    items = response.json()

    if not items:
        break

    all_items.extend(items)
    print("Got page", page, ":", len(items), "items")

    page = page + 1

print("Total items:", len(all_items))

What this does:

Starts at page 1
Gets items from each page
Adds to all_items list
Stops when no more items
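
The same loop can be wrapped in a generator with a safety cap, so a misbehaving API cannot keep it running forever. This is a sketch under assumptions: iter_pages and max_pages are hypothetical names, and fake_fetch_page stands in for the requests.get(...).json() call so the example runs offline.

```python
def iter_pages(fetch_page, per_page=10, max_pages=1000):
    """Yield each non-empty page from fetch_page(page, per_page).

    Stops at the first empty page, or after max_pages as a safety cap.
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(page, per_page)
        if not items:
            return
        yield items


# Fake data and fetcher standing in for a real API
FAKE_POSTS = ["post-" + str(n) for n in range(1, 24)]  # 23 posts


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return FAKE_POSTS[start:start + per_page]


all_items = []
for items in iter_pages(fake_fetch_page, per_page=10):
    all_items.extend(items)

print("Total items:", len(all_items))
```

To use it against a real API, pass a function that makes the request and returns the parsed list for one page.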

Pagination with Link Headers

Some APIs provide the URL of the next page in the Link response header.

code.py

import requests

url = "https://api.github.com/users/octocat/repos"
all_repos = []

while url:
    response = requests.get(url)

    if response.status_code != 200:
        break

    repos = response.json()
    all_repos.extend(repos)

    if "next" in response.links:
        url = response.links["next"]["url"]
    else:
        url = None

print("Total repos:", len(all_repos))

What response.links does: Parses the Link response header into a dictionary keyed by rel value, such as "next" and "last".
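
To see what is being parsed, here is a simplified sketch that pulls the URLs out of a Link header by hand. It is not how requests implements response.links. It handles only the rel attribute, returns plain URLs rather than nested dictionaries, and the header value shown is a made-up example.

```python
import re

# Matches entries like: <https://example.com/?page=2>; rel="next"
LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Return a dict mapping each rel name to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


example_header = (
    '<https://api.example.com/repos?page=2>; rel="next", '
    '<https://api.example.com/repos?page=5>; rel="last"'
)
links = parse_link_header(example_header)
print(links.get("next"))
print(links.get("last"))
```

On the last page the header has no "next" entry, which is why the loop checks for that key before continuing.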

Cursor-Based Pagination

Uses a cursor token from each response to request the next set of results.

code.py

import requests

all_items = []
cursor = None

while True:
    params = {"limit": 20}

    if cursor:
        params["cursor"] = cursor

    response = requests.get(
        "https://api.example.com/data",
        params=params
    )

    if response.status_code != 200:
        break

    data = response.json()
    items = data["items"]
    all_items.extend(items)

    cursor = data.get("next_cursor")

    if not cursor:
        break

print("Total items:", len(all_items))

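What cursor does: Marks your position in the dataset. Unlike page numbers, a cursor keeps results from being skipped or repeated when items are added or removed while you are paging.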
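
The difference shows up when the data changes between two requests. The sketch below simulates that with an in-memory list. It assumes the cursor is simply the id of the last item seen, whereas real APIs usually hand back an opaque token, and both function names are hypothetical.

```python
def get_by_page(items, page, per_page):
    """Page-number style: slice by position in the current list."""
    start = (page - 1) * per_page
    return items[start:start + per_page]


def get_after_cursor(items, cursor, limit):
    """Cursor style: return items with an id lower than the last one seen."""
    remaining = [i for i in items if cursor is None or i < cursor]
    return remaining[:limit]


# Newest-first feed of item ids
feed = [5, 4, 3, 2, 1]

page_1 = get_by_page(feed, page=1, per_page=2)              # [5, 4]
first_batch = get_after_cursor(feed, cursor=None, limit=2)  # [5, 4]

# A new item arrives between the two requests
feed.insert(0, 6)

page_2 = get_by_page(feed, page=2, per_page=2)
second_batch = get_after_cursor(feed, cursor=first_batch[-1], limit=2)

print("Page 2:", page_2)                # [4, 3], item 4 is repeated
print("Cursor batch 2:", second_batch)  # [3, 2], no repeat
```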

Rate Limiting

APIs limit how many requests you can make.

Common limits:

100 requests per hour
1000 requests per day
10 requests per second

Check headers for rate limit info:

code.py

import requests

response = requests.get(
    "https://api.example.com/data",
    headers={"Authorization": "Bearer token"}
)

print("Limit:", response.headers.get("X-RateLimit-Limit"))
print("Remaining:", response.headers.get("X-RateLimit-Remaining"))
print("Reset:", response.headers.get("X-RateLimit-Reset"))

What these mean:

Limit: Maximum requests allowed
Remaining: How many left
Reset: When the limit resets (usually a Unix timestamp)
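
These values can be turned into a wait time. A minimal sketch, assuming the Reset header is a Unix timestamp in seconds (some APIs send a number of seconds instead) and using a plain dictionary in place of response.headers. The function name is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds to wait before the rate limit window resets, or 0 if not needed."""
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0  # API did not send the headers; nothing to base a wait on
    if int(remaining) > 0:
        return 0  # still have requests left in this window
    if now is None:
        now = time.time()
    # Assumes Reset is a Unix timestamp in seconds
    return max(0, int(reset) - int(now))


example_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1700000300",
}
print(seconds_until_reset(example_headers, now=1700000000))  # 300
```

A script could call time.sleep() with the result before sending its next request.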

Handling Rate Limits

code.py

import requests
import time

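def make_request_with_rate_limit(url, headers, max_retries=3):
    # Retry a limited number of times so a persistent 429 cannot loop forever
    for attempt in range(max_retries + 1):
        response = requests.get(url, headers=headers)

        # Return on success, on other errors, or when out of retries
        if response.status_code != 429 or attempt == max_retries:
            return response

        # Assumes Retry-After is given in seconds; default to 60 if absent
        retry_after = int(response.headers.get("Retry-After", 60))
        print("Rate limit hit. Waiting", retry_after, "seconds...")
        time.sleep(retry_after)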

response = make_request_with_rate_limit(
    "https://api.example.com/data",
    {"Authorization": "Bearer token"}
)

What status 429 means: Too many requests. Need to wait.
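
The Retry-After header can hold either a number of seconds or an HTTP date, so calling int() on it directly can fail. Here is a minimal sketch that handles both forms using only the standard library. The function name retry_after_seconds and the 60-second default are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=60, now=None):
    """Convert a Retry-After header value to a wait time in whole seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return int(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unrecognized format
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past
    return max(0, int((when - now).total_seconds()))


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))
```

The result can be passed straight to time.sleep() before retrying the request.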

Practice Example

The scenario: Get all repositories for a GitHub user, using authentication and pagination.

code.py

import requests
import os
import time

GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")

if not GITHUB_TOKEN:
    # SystemExit stops the script with a message and a non-zero exit code
    raise SystemExit("Please set GITHUB_TOKEN environment variable")

headers = {
    "Authorization": "Bearer " + GITHUB_TOKEN,
    "Accept": "application/vnd.github+json"
}

username = "octocat"
url = "https://api.github.com/users/" + username + "/repos"

all_repos = []
page = 1

while True:
    params = {
        "per_page": 30,
        "page": page
    }

    print("Fetching page", str(page) + "...")

    response = requests.get(url, headers=headers, params=params)

    if response.status_code != 200:
        print("Error:", response.status_code)
        break

    remaining = response.headers.get("X-RateLimit-Remaining")
    print("Rate limit remaining:", remaining)

    repos = response.json()

    if not repos:
        break

    all_repos.extend(repos)

    page = page + 1

    time.sleep(1)

print()
print("Total repositories:", len(all_repos))
print()
print("Repository names:")
for repo in all_repos:
    print("-", repo["name"], "(" + str(repo["stargazers_count"]) + " stars)")

What this program does:

Gets GitHub token from environment
Sets up authentication headers
Loops through all pages
Checks rate limit remaining
Adds small delay between requests
Shows all repos with star counts

OAuth 2.0 Flow

For APIs that use OAuth (like Google, Facebook).

Basic flow:

Direct user to authorization URL
User logs in and approves
Get authorization code
Exchange code for access token
Use access token in requests
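
The first three steps can be sketched without a network connection. Everything specific here is hypothetical: the authorize endpoint, the redirect URI, the scope name, and both function names. The state value is a random string that the app checks when the user comes back, which is a common protection against forged callbacks.

```python
import secrets
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values; a real provider documents its own endpoint and scopes
AUTHORIZE_URL = "https://oauth.example.com/authorize"
CLIENT_ID = "your_client_id"
REDIRECT_URI = "https://yourapp.example.com/callback"


def build_authorization_url(state):
    """Step 1: the URL the user is sent to in order to log in and approve."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "read",
        "state": state,
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def extract_code(callback_url, expected_state):
    """Step 3: read the authorization code from the redirect back to the app."""
    query = parse_qs(urlsplit(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; do not trust this callback")
    return query["code"][0]


state = secrets.token_urlsafe(16)
print(build_authorization_url(state))

# Simulated redirect after the user approves
callback = REDIRECT_URI + "?code=authorization_code_here&state=" + state
print(extract_code(callback, state))
```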

Getting access token:

code.py

import requests

token_url = "https://oauth.example.com/token"

data = {
    "grant_type": "authorization_code",
    "code": "authorization_code_here",
    "client_id": "your_client_id",
    "client_secret": "your_client_secret"
}

response = requests.post(token_url, data=data)

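if response.status_code == 200:
    tokens = response.json()
    access_token = tokens["access_token"]
    # Not every provider returns a refresh token, so avoid a KeyError
    refresh_token = tokens.get("refresh_token")
    print("Access token obtained")
else:
    raise SystemExit("Token request failed: " + str(response.status_code))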

Using access token:

code.py

headers = {"Authorization": "Bearer " + access_token}
response = requests.get("https://api.example.com/data", headers=headers)
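
Access tokens usually expire, and the refresh token exists so you can get a new one without sending the user through the login step again. The sketch below assumes the token response included an expires_in value in seconds, which is common but not guaranteed. Both helper names are hypothetical, and whether client_secret is required depends on the provider.

```python
import time


def is_expired(obtained_at, expires_in, now=None, leeway=30):
    """True if the access token is expired or within `leeway` seconds of expiry."""
    if now is None:
        now = time.time()
    return now >= obtained_at + expires_in - leeway


def build_refresh_payload(refresh_token, client_id, client_secret):
    """Form fields for a refresh_token grant, to send with requests.post(token_url, data=...)."""
    return {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }


# Token obtained at t=1000 and valid for 3600 seconds
print(is_expired(obtained_at=1000, expires_in=3600, now=2000))  # False
print(is_expired(obtained_at=1000, expires_in=3600, now=4590))  # True

payload = build_refresh_payload("your_refresh_token", "your_client_id", "your_client_secret")
print(payload["grant_type"])
```

Checking slightly before the real expiry time (the leeway) avoids sending a token that expires while the request is in flight.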

Key Points to Remember

Use environment variables for API keys and secrets. Never hardcode them in your code.

Add authentication to requests using headers (most common) or auth parameter.

Pagination breaks large datasets into pages. Loop through pages to get all data.

Check the response headers or body for pagination info (next page URL, cursor, etc.).

Respect rate limits. Check headers for limit info and handle 429 status code.

Common Mistakes

Mistake 1: Hardcoding API keys

code.py

API_KEY = "abc123"  # DON'T DO THIS!

Use environment variables.

Mistake 2: Not handling rate limits

code.py

while True:
    response = requests.get(url)  # Will hit rate limit!

Add delays or check rate limit headers.

Mistake 3: Assuming all data in one response

code.py

response = requests.get(url)
data = response.json()  # Might be only first page!

Check for pagination.

Mistake 4: Infinite pagination loop

code.py

while True:
    response = requests.get(url, params={"page": page})
    # Forgot to check if empty or add break condition!

Stop when a page comes back empty, and increase page on every pass through the loop.

What's Next?

Congratulations! You've completed Module 3: Data Import/Export. You now know how to work with CSV, Excel, SQL databases, JSON, XML, and APIs. These skills let you get data from anywhere and use it in your Python programs.