10 min read
Reproducible Analysis
Best practices for code organization, documentation, and version control
What You'll Learn
- Why reproducibility matters
- Project structure
- Requirements files
- Documentation (Docstrings & Markdown)
- Version control basics (Git)
Why Reproducibility?
"It works on my machine" is not good enough.
- Collaboration: Others need to run your code.
- Future You: In six months, you will have forgotten what you did and why.
- Trust: Science requires verification.
Project Structure
A standard structure helps everyone navigate.
my_project/
├── data/
│   ├── raw/            # Immutable original data
│   └── processed/      # Cleaned data
├── notebooks/          # Jupyter notebooks for exploration
├── src/                # Reusable Python scripts
│   ├── __init__.py
│   ├── data_cleaning.py
│   └── modeling.py
├── requirements.txt    # Dependencies
├── README.md           # Project overview
└── .gitignore          # Files to ignore
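The layout above can also be created programmatically. The following is a minimal sketch using only the standard library; the function name create_project_skeleton is hypothetical, and the folder and file names are simply copied from the tree shown above.

```python
from pathlib import Path

# Directory and file names mirror the project layout described in this section.
DIRECTORIES = ["data/raw", "data/processed", "notebooks", "src"]
EMPTY_FILES = [
    "src/__init__.py",
    "src/data_cleaning.py",
    "src/modeling.py",
    "requirements.txt",
    "README.md",
    ".gitignore",
]


def create_project_skeleton(root):
    """Create the folders and empty placeholder files under `root`.

    Hypothetical helper for illustration; safe to run more than once.
    """
    root = Path(root)
    for directory in DIRECTORIES:
        # parents=True also creates `root` and `data/` when they are missing.
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file_name in EMPTY_FILES:
        # touch() creates the file if needed and never changes existing content.
        (root / file_name).touch(exist_ok=True)
    return root
```

Calling create_project_skeleton("my_project") from an empty working directory would produce the folders and placeholder files; the files still need to be filled in by hand.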
Managing Dependencies
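Always list your libraries and their versions so that others can recreate your environment. Working inside a virtual environment keeps the list limited to this project's packages.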
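To illustrate why recorded versions matter, here is a minimal sketch that compares pinned name==version lines from a requirements file against what is actually installed. The function name find_mismatches and its get_version parameter are hypothetical; the sketch handles only simple == pins and skips every other kind of line.

```python
from importlib import metadata


def find_mismatches(requirement_lines, get_version=metadata.version):
    """Compare pinned 'name==version' lines against installed versions.

    Returns a dict mapping package name to (pinned, installed), where
    installed is None if the package is not installed. Hypothetical
    helper: lines without a simple '==' pin are skipped.
    """
    mismatches = {}
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # blank line, or a specifier this sketch does not handle
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Because an open file yields its lines when iterated, the result of open("requirements.txt") can be passed directly as requirement_lines. An empty dict means every pinned package is installed at the recorded version.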
Creating requirements.txt:
pip freeze > requirements.txt
Installing from requirements:
pip install -r requirements.txt
Documentation
Code comments: Explain why, not what.
Docstrings: Explain what a function does, what it takes, and what it returns.
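As a small illustration of "why, not what", the sketch below contrasts two comments on the same line of code. The function drop_sensor_errors and the -999 sentinel value are invented for this example.

```python
def drop_sensor_errors(readings):
    """Return the readings without the logger's error sentinel.

    Hypothetical example: -999 stands in for an invented error code.
    """
    # Weak comment (restates the code): remove values equal to -999.
    # Better comment (explains why): the logger writes -999 when the
    # probe is disconnected, so those values are not real measurements.
    return [value for value in readings if value != -999]
```

The first comment tells the reader nothing the code does not already say. The second records a fact about the data that cannot be recovered from the code alone.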
def calculate_metrics(y_true, y_pred):
    """
    Calculates MSE and R2 score.

    Args:
        y_true (array): Actual values
        y_pred (array): Predicted values

    Returns:
        dict: Dictionary containing MSE and R2
    """
    pass
README.md:
- Project Title
- Description
- Installation instructions
- Usage examples
- Credits
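Returning to the calculate_metrics docstring from earlier in this section: it describes a contract but leaves the body empty. Below is a minimal sketch of one possible body in plain Python. The dictionary keys "mse" and "r2", the ValueError, and the NaN result for constant inputs are assumptions, not part of the original example.

```python
def calculate_metrics(y_true, y_pred):
    """Calculate MSE and R2 score for two equal-length sequences.

    Args:
        y_true (sequence of float): Actual values.
        y_pred (sequence of float): Predicted values.

    Returns:
        dict: {"mse": float, "r2": float}. The key names are an assumption.

    Raises:
        ValueError: If the inputs are empty or differ in length.
    """
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation of y_true from its mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    mse = ss_res / n
    # R2 is undefined when y_true is constant; return NaN in that case.
    r2 = 1 - ss_res / ss_tot if ss_tot != 0 else float("nan")
    return {"mse": mse, "r2": r2}
```

With a docstring like this in place, help(calculate_metrics) shows the contract without the reader having to open the source file.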
Version Control (Git)
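- git init: Create a repository in the current folder and start tracking.
- git add .: Stage all changes in the current folder.
- git commit -m "message": Save a snapshot of the staged changes.
- git push: Upload your commits to a remote such as GitHub or GitLab.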
Important: Add data/ and .env to your .gitignore file! Never commit large data or passwords.
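A quick check for the rule above can be sketched in a few lines. The function name missing_ignore_patterns is hypothetical, and the check is deliberately naive: it looks for exact lines only and does not implement Git's pattern-matching rules, so an equivalent entry written differently (for example /data/) would be reported as missing.

```python
REQUIRED_PATTERNS = ("data/", ".env")


def missing_ignore_patterns(gitignore_text, required=REQUIRED_PATTERNS):
    """Return the required patterns that have no exact matching line.

    Hypothetical helper: compares stripped, non-comment lines literally
    and does not evaluate Git's wildcard or negation syntax.
    """
    present = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [pattern for pattern in required if pattern not in present]
```

Passing the contents of .gitignore (for example via pathlib.Path(".gitignore").read_text()) returns an empty list when both entries are present.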
Next Steps
Let's make our insights pop with advanced visualizations!