Reproducible Analysis

Best practices for code organization, documentation, and version control

What You'll Learn

  • Why reproducibility matters
  • Project structure
  • Requirements files
  • Documentation (Docstrings & Markdown)
  • Version control basics (Git)

Why Reproducibility?

"It works on my machine" is not good enough.

  • Collaboration: Others need to run your code.
  • Future You: You will forget what you did in 6 months.
  • Trust: Science requires verification.

Project Structure

A standard structure helps everyone navigate.

my_project/
├── data/
│   ├── raw/              # Immutable original data
│   └── processed/        # Cleaned data
├── notebooks/            # Jupyter notebooks for exploration
├── src/                  # Reusable Python scripts
│   ├── __init__.py
│   ├── data_cleaning.py
│   └── modeling.py
├── requirements.txt      # Dependencies
├── README.md             # Project overview
└── .gitignore            # Files to ignore
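If you would rather not create these folders by hand, the layout can be scaffolded with a few lines of Python. This is only a minimal sketch: the helper name create_project is hypothetical, the files are created empty, and the demo runs in a throwaway temporary directory so nothing in your working tree is touched.

```python
import tempfile
from pathlib import Path

# Directories and (empty) files taken from the layout in this section.
DIRECTORIES = ["data/raw", "data/processed", "notebooks", "src"]
FILES = [
    "src/__init__.py",
    "src/data_cleaning.py",
    "src/modeling.py",
    "requirements.txt",
    "README.md",
    ".gitignore",
]


def create_project(root):
    """Create the project skeleton under `root` (hypothetical helper).

    Existing directories and files are left in place; file contents
    are never overwritten.
    """
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file_name in FILES:
        (root / file_name).touch(exist_ok=True)
    return root


# Demo in a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(Path(tmp) / "my_project")
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project).as_posix())
```

To use it for a real project, call create_project("my_project") from the directory where the project should live.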

Managing Dependencies

Always record the libraries your project depends on, ideally with pinned versions, so others can recreate your environment.

Creating requirements.txt:

$ pip freeze > requirements.txt

Installing from requirements:

$ pip install -r requirements.txt
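After installing, it can be useful to confirm that the environment really matches the pinned versions. The sketch below is one possible way to do that with the standard library's importlib.metadata. The function name check_requirements and the example package name are hypothetical, and the parser only understands exact name==version pins, so any other kind of line in a requirements file is skipped.

```python
from importlib.metadata import PackageNotFoundError, version


def check_requirements(lines):
    """Compare `name==version` pins against installed packages.

    Returns a dict mapping package name to a short description of the
    problem. An empty dict means every pinned package matched.
    """
    problems = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # this sketch only handles exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != pinned:
            problems[name] = f"installed {installed}, pinned {pinned}"
    return problems


# Hypothetical pin for a package that should not exist anywhere.
example = ["# example pins", "surely-not-a-real-package-xyz==1.0"]
print(check_requirements(example))
```

For a real project you would pass the lines of your own file, for example check_requirements(open("requirements.txt").read().splitlines()).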

Documentation

  • Code comments: Explain why, not what.
  • Docstrings: Explain what a function does, what it takes, and what it returns.

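import numpy as np


def calculate_metrics(y_true, y_pred):
    """
    Calculate the mean squared error (MSE) and R2 score.

    Args:
        y_true (array): Actual values
        y_pred (array): Predicted values

    Returns:
        dict: Dictionary with keys "mse" and "r2"
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    # R2 is undefined when y_true is constant (ss_tot == 0).
    return {"mse": ss_res / y_true.size, "r2": 1 - ss_res / ss_tot}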
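To illustrate both rules together, here is a small sketch with a hypothetical function, celsius_to_kelvin, that is not part of the project above. The docstring says what the function does, while the inline comment explains why a check exists. Because docstrings are stored on the function object, tools such as help() and inspect.getdoc can display them.

```python
import inspect


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin.

    Args:
        temp_c (float): Temperature in degrees Celsius.

    Returns:
        float: Temperature in kelvin.
    """
    # Why: values below absolute zero are physically impossible, so
    # failing early keeps bad readings out of later analysis steps.
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15


# The docstring is available at runtime, with indentation cleaned up.
doc = inspect.getdoc(celsius_to_kelvin)
print(doc.splitlines()[0])
```

A comment such as "# add 273.15" would only repeat the code; the comment above records a decision that a reader could not recover from the code alone.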

README.md:

  • Project Title
  • Description
  • Installation instructions
  • Usage examples
  • Credits

Version Control (Git)

  1. git init: Start tracking.
  2. git add .: Stage changes.
  3. git commit -m "message": Save snapshot.
  4. git push: Upload your commits to a remote such as GitHub or GitLab.

Important: Add data/ and .env to your .gitignore file! Never commit large data or passwords.
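A quick check before the first commit can catch a forgotten entry. The following is a minimal sketch, assuming the required entries are exactly data/ and .env as named above. The function name missing_ignore_entries is hypothetical, and the comparison is a literal line match, not real gitignore pattern matching, so an equivalent pattern written differently (for example /data/) would be reported as missing.

```python
def missing_ignore_entries(gitignore_text, required=("data/", ".env")):
    """Return the required entries not listed literally in the text."""
    entries = set()
    for line in gitignore_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blanks and comments
            entries.add(line)
    return [entry for entry in required if entry not in entries]


# Sample .gitignore contents that forget the data/ directory.
sample = "# secrets\n.env\n__pycache__/\n"
print(missing_ignore_entries(sample))  # prints ['data/']
```

In a project you would pass the contents of the real file, for example missing_ignore_entries(open(".gitignore").read()).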

Next Steps

Let's make our insights pop with advanced visualizations!