Python for Data Analysis
Get started with Python, pandas, and data analysis workflows
120 minutes
Learn to analyze data with Python, pandas, and Claude as your coding partner.
What You'll Build
A complete data analysis project that:
- Loads and explores a real dataset
- Cleans and transforms data
- Creates visualizations
- Generates insights
- Is version-controlled with Git
Prerequisites
- Python installed (Mac setup or Windows setup)
- Basic terminal/command line knowledge
- Git configured
Project Setup
1. Create Your Project
Bash
# Create project folder
mkdir sales-analysis
cd sales-analysis

# Initialize Git
git init

# Create virtual environment
python -m venv venv

# Activate it
# Mac/Linux:
source venv/bin/activate
# Windows:
.\venv\Scripts\Activate.ps1
2. Install Packages
Bash
pip install pandas numpy matplotlib jupyter
3. Create Project Structure
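The commands in this step assume a Unix-like shell; `touch`, for example, is not a built-in command in Windows PowerShell. As a cross-platform alternative, here is a minimal sketch using only Python's standard library. The function name `create_project_structure` is hypothetical; the folder and file names are the ones used in this step.

```python
from pathlib import Path


def create_project_structure(root: str = ".") -> None:
    """Create the tutorial's folders and empty starter files under root."""
    base = Path(root)

    # Folders: exist_ok avoids an error if they are already there
    for folder in ("data", "notebooks", "scripts", "outputs"):
        (base / folder).mkdir(parents=True, exist_ok=True)

    # Empty files: touch() leaves existing content untouched
    for file in ("README.md", "scripts/analysis.py", ".gitignore"):
        (base / file).touch(exist_ok=True)


if __name__ == "__main__":
    create_project_structure()
```

Run it once from inside the project folder, or use the shell commands that follow.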
Bash
# Create folders
mkdir data notebooks scripts outputs

# Create files
touch README.md
touch scripts/analysis.py
touch .gitignore
4. Set Up Git Ignore
Add to .gitignore:
Bash
venv/
__pycache__/
*.pyc
.ipynb_checkpoints/
*.csv
!data/sample.csv
outputs/*.png
Working with Claude
Ask Claude to Create CLAUDE.md
Open VS Code and ask Claude:
Bash
Create a CLAUDE.md file for this data analysis project.

Project: Sales data analysis
Language: Python
Libraries: pandas, matplotlib
Goal: Analyze monthly sales trends

Include:
- Project context
- Common commands
- Example prompts for data analysis
Claude will generate a customized CLAUDE.md!
Your First Analysis
Load Data
Create scripts/load_data.py:
Ask Claude:
Bash
Write a Python script to:
1. Load data/sales.csv using pandas
2. Display basic info (shape, columns, data types)
3. Show first few rows
4. Check for missing values

Include error handling.
Claude will generate something like:
Python
import pandas as pd


def load_and_inspect_data(filepath):
    """Load and inspect CSV data"""
    try:
        df = pd.read_csv(filepath)

        print(f"Dataset shape: {df.shape}")
        print(f"\nColumns: {df.columns.tolist()}")
        print(f"\nData types:\n{df.dtypes}")
        print(f"\nFirst 5 rows:\n{df.head()}")
        print(f"\nMissing values:\n{df.isnull().sum()}")

        return df
    except FileNotFoundError:
        print(f"Error: {filepath} not found")
        return None


if __name__ == "__main__":
    df = load_and_inspect_data("data/sales.csv")
Clean Data
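Cleaning usually means deciding, column by column, what to do with missing, mistyped, or impossible values. As a rough illustration of what a cleaning function for this tutorial's sales data might look like, here is a minimal sketch. The column names `date`, `revenue`, and `quantity` and the function name `clean_sales_data` are assumptions, and filling missing revenue with the median is only one possible policy; Claude's answer to the prompt in this section may differ.

```python
import pandas as pd


def clean_sales_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a sales DataFrame.

    Assumes hypothetical columns 'date', 'revenue', and 'quantity'.
    """
    cleaned = df.copy()

    # Parse date strings; values that cannot be parsed become NaT
    cleaned["date"] = pd.to_datetime(cleaned["date"], errors="coerce")

    # Keep rows with a valid date and a non-negative quantity
    valid = cleaned["date"].notna() & (cleaned["quantity"] >= 0)
    cleaned = cleaned.loc[valid].copy()

    # Fill missing revenue with the median of the remaining rows
    # (an assumption; dropping the rows is another reasonable choice)
    cleaned["revenue"] = cleaned["revenue"].fillna(cleaned["revenue"].median())

    return cleaned.reset_index(drop=True)


if __name__ == "__main__":
    # Made-up rows that exercise each cleaning rule
    raw = pd.DataFrame({
        "date": ["2024-01-05", "2024-01-12", "not a date", "2024-02-03"],
        "revenue": [100.0, None, 50.0, 300.0],
        "quantity": [2, 1, 1, -4],
    })
    print(clean_sales_data(raw))
```

The function works on a copy, so the raw DataFrame stays available for comparison after cleaning.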
Ask Claude:
Bash
I have sales data with these issues:
- Some missing values in 'revenue' column
- Date column is string, needs to be datetime
- Some negative quantities (data errors)

Write a function to clean this data.
Create Visualizations
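The three charts requested in this section's prompt can be produced with pandas and matplotlib alone. Here is a minimal sketch, assuming a DataFrame with a datetime `date` column plus `category` and `revenue` columns (hypothetical names). The helper names `plot_sales` and `_save`, the output file names, and the reading of "sales distribution" as revenue per sale are also assumptions.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: write files without opening windows
import matplotlib.pyplot as plt
import pandas as pd


def _save(fig, path: Path) -> Path:
    """Write the figure to disk and release its memory."""
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


def plot_sales(df: pd.DataFrame, out_dir: str = "outputs") -> list:
    """Save three basic sales plots and return their file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # 1. Monthly revenue trend (line plot)
    monthly = df.groupby(df["date"].dt.to_period("M"))["revenue"].sum()
    fig, ax = plt.subplots()
    ax.plot(list(monthly.index.astype(str)), monthly.to_numpy(), marker="o")
    ax.set(title="Monthly Revenue", xlabel="Month", ylabel="Revenue")
    trend = _save(fig, out / "monthly_revenue.png")

    # 2. Revenue by product category (bar chart), largest first
    by_category = df.groupby("category")["revenue"].sum().sort_values(ascending=False)
    fig, ax = plt.subplots()
    ax.bar(list(by_category.index.astype(str)), by_category.to_numpy())
    ax.set(title="Revenue by Category", xlabel="Category", ylabel="Revenue")
    bars = _save(fig, out / "revenue_by_category.png")

    # 3. Sales distribution (histogram of revenue per sale)
    fig, ax = plt.subplots()
    ax.hist(df["revenue"].dropna(), bins=20)
    ax.set(title="Distribution of Sale Revenue",
           xlabel="Revenue per sale", ylabel="Number of sales")
    hist = _save(fig, out / "revenue_distribution.png")

    return [trend, bars, hist]
```

Calling `plot_sales(df)` on a cleaned DataFrame writes the three PNG files into `outputs/`.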
Ask Claude:
Bash
Create visualizations for sales data:
1. Monthly revenue trend (line plot)
2. Revenue by product category (bar chart)
3. Sales distribution (histogram)

Save plots to outputs/ folder.
Use clear labels and titles.
Common Data Analysis Patterns
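The snippets in this section operate on a DataFrame called `df` with columns such as `category`, `product`, `revenue`, `cost`, `date`, and `month`. To try them without a real dataset, a small stand-in table can be built by hand. The rows below are made up purely for illustration, and deriving `month` from `date` is an assumption about how that column is meant to be produced.

```python
import pandas as pd

# Made-up rows purely for illustration; replace with your own dataset
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-11", "2024-02-25"]),
    "product": ["Widget", "Gadget", "Widget", "Gizmo"],
    "category": ["Hardware", "Hardware", "Hardware", "Toys"],
    "quantity": [10, 3, 7, 5],
    "revenue": [1200.0, 450.0, 840.0, 300.0],
    "cost": [700.0, 300.0, 500.0, 180.0],
})

# Derive a 'month' label (for example "2024-01") to group on later
df["month"] = df["date"].dt.to_period("M").astype(str)

print(df.dtypes)
print(df.groupby("month")["revenue"].sum())
```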
Explore Data
Python
# Summary statistics
df.describe()

# Value counts
df['category'].value_counts()

# Group by analysis
df.groupby('product')['revenue'].sum()

# Correlation (numeric columns only; text columns raise an error in pandas 2.x)
df.corr(numeric_only=True)
Transform Data
Python
# Create new columns
df['profit'] = df['revenue'] - df['cost']

# Filter rows
high_value = df[df['revenue'] > 1000]

# Sort
df.sort_values('date', ascending=False)

# Aggregate
monthly_sales = df.groupby('month').agg({
    'revenue': 'sum',
    'quantity': 'sum'
})
Using Jupyter Notebooks
Start Jupyter
Bash
jupyter notebook
Ask Claude for Notebook Structure
Bash
Create a Jupyter notebook structure for analyzing:
- Customer purchase patterns
- Seasonal trends
- Product performance

Include markdown sections and code cell placeholders.
Git Workflow for Analysis
Commit Your Progress
Bash
# After loading data
git add scripts/load_data.py
git commit -m "feat: add data loading script"

# After cleaning
git add scripts/clean_data.py
git commit -m "feat: add data cleaning pipeline"

# After visualization
git add scripts/visualize.py outputs/
git commit -m "feat: add sales visualizations"
Using Claude for Commits
Bash
I made these changes to my analysis:
[describe changes]

Suggest a good commit message.
Example: Complete Analysis
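To make the goal of this example concrete, here is a minimal sketch of the calculation step such an analysis might end up with. It assumes the columns named in the planning prompt in this section (`date`, `product`, `quantity`, `price`) and that `price` is a per-unit price, which the source does not state. The function name `summarize_sales` is hypothetical.

```python
import pandas as pd


def summarize_sales(df: pd.DataFrame) -> dict:
    """Compute best sellers, a monthly revenue trend, and a seasonal profile.

    Assumes columns 'date', 'product', 'quantity', 'price',
    with 'price' being the price of a single unit (an assumption).
    """
    data = df.copy()
    data["date"] = pd.to_datetime(data["date"])
    data["revenue"] = data["quantity"] * data["price"]

    # Best-selling products: total units sold, largest first
    best_sellers = data.groupby("product")["quantity"].sum().sort_values(ascending=False)

    # Revenue trend: total revenue per calendar month
    monthly_revenue = data.groupby(data["date"].dt.to_period("M"))["revenue"].sum()

    # Seasonal pattern: average monthly revenue for each month of the year
    # (only meaningful when the data covers more than one year)
    seasonal = monthly_revenue.groupby(monthly_revenue.index.month).mean()

    return {
        "best_sellers": best_sellers,
        "monthly_revenue": monthly_revenue,
        "seasonal": seasonal,
    }


if __name__ == "__main__":
    # Made-up rows spanning two years so the seasonal average has something to average
    sales = pd.DataFrame({
        "date": ["2023-01-10", "2023-02-10", "2024-01-15", "2024-02-20"],
        "product": ["A", "B", "A", "B"],
        "quantity": [2, 1, 3, 5],
        "price": [10.0, 20.0, 10.0, 20.0],
    })
    for name, result in summarize_sales(sales).items():
        print(f"\n{name}:\n{result}")
```

Returning the three results in a dictionary keeps the calculation separate from plotting and reporting, so each piece can be reviewed on its own.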
Full Workflow with Claude
- Plan (ask Claude):
Bash
I have sales data with: date, product, category, quantity, price.

Help me plan an analysis to find:
- Best-selling products
- Revenue trends
- Seasonal patterns

What steps should I take?
- Implement (with Claude's guidance):
  - Write loading script
  - Clean data
  - Perform calculations
  - Create visualizations
  - Document findings
- Review (ask Claude):
Bash
Review my analysis code:
[paste code]

Check for:
- Correctness
- Efficiency
- Best practices
- Missing edge cases
- Document (ask Claude):
Bash
Write a README for this analysis project.

Include:
- What the project does
- How to run it
- Key findings
- Required dependencies
Best Practices
Code Organization
Bash
sales-analysis/
├── data/              # Raw data (gitignored)
├── notebooks/         # Jupyter notebooks
├── scripts/           # Python scripts
│   ├── load.py
│   ├── clean.py
│   └── analyze.py
├── outputs/           # Generated plots/reports
├── README.md
├── CLAUDE.md
├── .gitignore
└── requirements.txt
Requirements File
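Pinned versions are most useful when they match what is actually installed in your virtual environment. As a minimal sketch, the standard library's `importlib.metadata` can report installed versions in `name==version` form. The function name `pinned_requirements` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for the installed packages in the list."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in the active environment; report it rather than guess
            print(f"# {name} is not installed")
    return lines


if __name__ == "__main__":
    for line in pinned_requirements(["pandas", "numpy", "matplotlib", "jupyter"]):
        print(line)
```

The printed lines can be pasted into requirements.txt in place of the example versions.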
Create requirements.txt:
Bash
pandas==2.1.0
numpy==1.25.0
matplotlib==3.7.0
jupyter==1.0.0
Install from it:
Bash
pip install -r requirements.txt
Document Your Analysis
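A docstring covering purpose, parameters, return value, and example usage might look like the following sketch. The function `monthly_revenue` and its column names are hypothetical, and the NumPy-style section headings are one common convention rather than a requirement.

```python
import pandas as pd


def monthly_revenue(df: pd.DataFrame, date_col: str = "date",
                    revenue_col: str = "revenue") -> pd.Series:
    """Sum revenue per calendar month.

    Parameters
    ----------
    df : pd.DataFrame
        Sales records, one row per sale.
    date_col : str
        Column holding sale dates (datetime values or parseable strings).
    revenue_col : str
        Column holding numeric revenue.

    Returns
    -------
    pd.Series
        Total revenue indexed by month (pandas Period, monthly frequency).

    Example
    -------
    >>> sales = pd.DataFrame({"date": ["2024-01-05", "2024-01-20"],
    ...                       "revenue": [100.0, 50.0]})
    >>> float(monthly_revenue(sales).iloc[0])
    150.0
    """
    dates = pd.to_datetime(df[date_col])
    return df.groupby(dates.dt.to_period("M"))[revenue_col].sum()
```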
Ask Claude:
Bash
Document this analysis function:
[paste function]

Write a docstring explaining:
- Purpose
- Parameters
- Returns
- Example usage
Troubleshooting
Virtual Environment Issues
Bash
# Can't activate venv
# Mac: Add to ~/.zshrc
export PATH="$HOME/.local/bin:$PATH"

# Windows: Change execution policy
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
Import Errors
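Import errors often mean the script is running under a different interpreter than the one the packages were installed into. Since `which` is specific to Unix-like shells, a check from inside Python works on any platform. Here is a minimal sketch using only the standard library; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv, while sys.base_prefix
    # points at the Python installation the venv was created from
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```

If the interpreter path is not inside your project's venv folder, activate the environment and run the script again.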
Bash
# Wrong Python/pip
which python
which pip

# Reinstall in venv
deactivate
rm -rf venv
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Pandas Issues
Ask Claude:
Bash
I'm getting this pandas error:
[paste error]

From this code:
[paste code]

How do I fix it?
Next Steps
Expand Your Skills
- More Complex Analysis
  - Time series forecasting
  - Statistical testing
  - Machine learning basics
- Better Visualizations
  - Seaborn for statistical plots
  - Plotly for interactive charts
  - Dashboards with Streamlit
- Automation
  - Schedule reports
  - Process multiple files
  - See Automation track
Sample Projects
Try analyzing:
- COVID-19 data
- Stock prices
- Sports statistics
- Weather patterns
- Your own data!
Resources
- pandas documentation
- Python Data Science Handbook
- Kaggle Learn - Free courses
- Real Python - Tutorials
Ready to analyze! Start with a simple dataset and let Claude guide you through the process.