Research Case Studies
Real-world examples of Claude Code in academic research workflows - from raw data to publication
See Claude Code in action with complete academic workflows. These aren't toy examples—they're the kinds of tasks that eat up your research time.
Case Study 1: From Raw Data to Publication Figure
Scenario: You have messy field data and need a publication-ready figure for a journal submission deadline.
The Problem
- Raw data: 847 observations across 12 sites
- Issues: Mixed date formats, missing values coded differently, outliers
- Deadline: Journal submission due in 3 days

The Workflow
Step 1: Assess the mess
```
claude "Read data/field_observations.csv and give me a data quality report:
- Missing values by column
- Date format inconsistencies
- Potential outliers
- Any obvious data entry errors"
```

Claude examines your data and returns:
```
Data Quality Report for field_observations.csv

MISSING VALUES:
- temperature: 23 missing (2.7%)
- salinity: 8 missing (0.9%)
- depth: 0 missing
- species_count: 45 missing (5.3%) ← mostly from Site_07

DATE ISSUES:
- Rows 1-234: MM/DD/YYYY format
- Rows 235-847: YYYY-MM-DD format
- 3 rows have "unknown" as date

OUTLIERS (>3 SD):
- temperature: 2 values (89°C, 91°C) ← likely Fahrenheit not Celsius
- species_count: 1 value (9,847) ← likely typo, check field notes

RECOMMENDATIONS:
1. Convert temperature outliers: (F-32)*5/9
2. Investigate Site_07 missing species counts
3. Standardize dates to ISO format
```

Step 2: Clean with audit trail
```
claude "Create a cleaning script that:
1. Fixes the issues you identified
2. Logs every change made (row, column, old value, new value)
3. Saves cleaned data with '_cleaned' suffix
4. Generates a cleaning report I can include in supplementary materials

Use pandas. Make it reproducible."
```

Step 3: Generate the figure
```
claude "Create a publication-ready figure showing:
- Species count by site (x-axis) vs temperature (y-axis)
- Color by depth category (shallow/medium/deep)
- Include regression line with 95% CI
- Error bars showing standard error

Style requirements:
- 300 DPI for print
- Sans-serif fonts (Nature style)
- Colorblind-friendly palette
- Panel dimensions: 180mm x 120mm

Save as both PNG and PDF."
```

Time Comparison
| Task | Traditional | With Claude Code |
|---|---|---|
| Data quality assessment | 45 min | 3 min |
| Cleaning script | 2-3 hours | 15 min |
| Publication figure | 1-2 hours | 10 min |
| Total | 4-5 hours | ~30 min |
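The audit-trail pattern from Step 2 can be sketched in a few lines of pandas. This is a minimal illustration, not Claude's actual output; the column names (`temperature`, `date`) and the 60 °C plausibility cutoff are assumptions:

```python
import pandas as pd

def clean_with_log(df: pd.DataFrame):
    """Clean in place and return (cleaned_df, change_log) for the audit trail."""
    log = []  # one entry per change: row index, column, old value, new value, reason

    def record(row, col, old, new, reason):
        log.append({"row": row, "column": col, "old": old, "new": new, "reason": reason})

    # Temperatures above 60 are implausible in Celsius for field sites,
    # so treat them as Fahrenheit and convert: (F - 32) * 5/9.
    for idx in df.index[df["temperature"] > 60]:
        old = df.at[idx, "temperature"]
        new = round((old - 32) * 5 / 9, 1)
        record(idx, "temperature", old, new, "Fahrenheit-to-Celsius conversion")
        df.at[idx, "temperature"] = new

    # Standardize dates to ISO format; unparseable entries (e.g. "unknown")
    # become None and are logged so they can be checked against field notes.
    for idx in df.index:
        old = df.at[idx, "date"]
        ts = pd.to_datetime(old, errors="coerce")
        new = ts.date().isoformat() if pd.notna(ts) else None
        if new != old:
            record(idx, "date", old, new, "standardized to ISO date")
            df.at[idx, "date"] = new

    return df, pd.DataFrame(log)

# The log can be saved next to the cleaned data for supplementary materials:
# cleaned.to_csv("field_observations_cleaned.csv", index=False)
# log.to_csv("cleaning_log.csv", index=False)
```

Every change lands in the log as a (row, column, old, new, reason) record, which is exactly what a supplementary-materials cleaning report needs.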
Case Study 2: Inheriting a Grad Student's Code
Scenario: Your grad student defended and left. Now you need to re-run their analysis for a follow-up paper. The code "worked on their laptop."
The Problem
- Code references absolute paths (`/Users/alex/Desktop/thesis/...`)
- Missing dependency versions ("it just needs pandas")
- Cryptic error: `KeyError: 'treatment_group'`
- No documentation
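The hardcoded-path problem in particular has a standard fix: resolve everything relative to the project root instead of one person's laptop. A minimal sketch (the directory names are assumptions, not the grad student's actual layout):

```python
from pathlib import Path

# Resolve paths relative to the project root rather than an absolute
# path like /Users/alex/Desktop/thesis/, so the code runs on any machine.
PROJECT_ROOT = Path(__file__).resolve().parent

DATA_DIR = PROJECT_ROOT / "data"
OUTPUT_DIR = PROJECT_ROOT / "output"

def data_path(name: str) -> Path:
    """Locate a data file relative to the project, not a specific laptop."""
    return DATA_DIR / name

# Usage: pd.read_csv(data_path("survey.csv")) instead of
# pd.read_csv("/Users/alex/Desktop/thesis/data/survey.csv")
```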
The Workflow
Step 1: Forensic analysis
```
claude "Analyze the code in legacy_analysis/:
1. List all Python files and their apparent purpose
2. Identify all dependencies being imported
3. Find all hardcoded paths that need updating
4. Identify where 'treatment_group' should come from
5. Create a dependency list with likely version requirements

Be thorough—I need to get this running without the original author."
```

Step 2: Create reproducible environment
```
claude "Based on your analysis:
1. Create a requirements.txt with pinned versions
2. Create a setup script that:
   - Sets up a virtual environment
   - Installs dependencies
   - Creates necessary directories
3. Update all hardcoded paths to use relative paths or a config file
4. Add the missing 'treatment_group' column based on the data patterns you see"
```

Step 3: Document for posterity
```
claude "Create documentation for this project:
1. README.md explaining what the analysis does
2. Add docstrings to all functions
3. Create a flowchart of the analysis pipeline
4. Add inline comments for non-obvious code

This needs to be understandable by the next grad student."
```

Case Study 3: Multi-Author Analysis Integration
Scenario: Three co-authors analyzed different aspects of the same dataset. One used R, one used Python, one used Stata. You need to integrate their results.
The Problem
- Author 1 (R/tidyverse): Descriptive statistics, demographic analysis
- Author 2 (Python/statsmodels): Main regression models
- Author 3 (Stata): Robustness checks, instrumental variables
- All analyzing: survey_data_2024.csv
- Output formats: .rds, .pkl, .dta

The Workflow
Step 1: Standardize outputs
```
claude "I have analysis outputs in three formats:
- author1_results/ contains .rds files (R)
- author2_results/ contains .pkl files (Python)
- author3_results/ contains .dta files (Stata)

Create a script that:
1. Reads all result files
2. Extracts key statistics (coefficients, SEs, p-values, CIs)
3. Standardizes to a common format
4. Outputs a combined results table

Handle the different statistical object structures appropriately."
```

Step 2: Create master results table
```
claude "Using the standardized results, create:
1. A master regression table (Table 2 format) combining all models
2. Export as:
   - LaTeX (for manuscript)
   - Word-compatible format (for co-authors who don't use LaTeX)
   - CSV (for supplementary materials)

Use APA formatting for statistics."
```

Step 3: Verify consistency
```
claude "Cross-check the three analyses:
1. Do sample sizes match where they should?
2. Are baseline descriptives consistent?
3. Flag any discrepancies between authors' results
4. Create a verification report I can share with co-authors"
```

Case Study 4: BCO-DMO Data Documentation
Scenario: NSF requires you to submit your data to BCO-DMO. You need comprehensive metadata documentation.
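Much of the data dictionary can be generated mechanically before Claude fills in the rest; a sketch of that first pass (the `TODO` fields are placeholders for units and methods that only the researcher knows):

```python
import pandas as pd

# Mechanical first pass at a BCO-DMO-style data dictionary: types,
# ranges, and missing counts come from the file; units and collection
# methods are left as TODO for the researcher to fill in.

def data_dictionary(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for col in df.columns:
        s = df[col]
        numeric = pd.api.types.is_numeric_dtype(s)
        rows.append({
            "variable": col,
            "dtype": str(s.dtype),
            "n_missing": int(s.isna().sum()),
            "min": s.min() if numeric else "",
            "max": s.max() if numeric else "",
            "units": "TODO",   # e.g. meters or degrees C, per BCO-DMO guidelines
            "method": "TODO",  # collection method, filled in by the researcher
        })
    return pd.DataFrame(rows)

# data_dictionary(pd.read_csv("data/coral_survey_2024.csv")).to_csv(
#     "data_dictionary.csv", index=False)
```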
The Workflow
```
claude "I need to prepare my dataset for BCO-DMO submission. Read data/coral_survey_2024.csv and create:

1. A data dictionary with:
   - Variable names
   - Data types
   - Units
   - Allowed values/ranges
   - Missing value codes
   - Collection methods (I'll fill these in)

2. Methods documentation covering:
   - Sampling design
   - Data collection protocols
   - Quality control procedures
   - Known limitations

3. A README following BCO-DMO guidelines

Format everything in their required template structure."
```

Quick Reference: Research Prompts That Work
Data Cleaning
```
"Read [file] and identify data quality issues. For each issue, suggest a fix and explain the tradeoff."
```

Statistical Analysis
```
"Run [specific test] on [variables] and interpret the results as I would for a Methods section. Include effect sizes."
```

Visualization
```
"Create a [plot type] suitable for [journal]. Use [specific style guide] formatting. Make it colorblind-accessible."
```

Code Documentation
```
"Add documentation to [file] that would let a new grad student understand and modify it. Include example usage."
```

What These Cases Have in Common
- Real deadlines — Not "exploring data for fun" but "paper submission due Friday"
- Messy inputs — Data and code are never as clean as tutorials assume
- Audit trails — Academic work requires documentation of every decision
- Multiple formats — Journals want LaTeX, co-authors want Word, repos want Markdown
- Reproducibility — Someone else needs to run this code in 5 years
Next Steps
- Research Limitations & When Not to Use Claude Code — Honest assessment for skeptical PIs
- Collaboration Workflows — Working with co-authors and students
- First 30 Minutes Exercise — Try it with your own data