R for Data Analysis - Introduction
Learn R programming for data analysis with hands-on examples and Claude Code assistance
Learn R programming for data analysis! This hands-on tutorial will teach you R fundamentals through practical examples with Claude Code as your coding partner.
What You'll Learn
- R programming basics (variables, functions, data structures)
- Data manipulation with dplyr and tidyverse
- Data visualization with ggplot2
- Statistical analysis fundamentals
- Working with real datasets
- How to effectively use Claude Code for R development
Prerequisites
Before starting:
- Complete macOS Setup or Windows Setup
- R and RStudio installed
- Basic understanding of programming concepts (helpful but not required)
Time Required
~2 hours to complete all sections
1. Setting Up Your R Environment
Installing R and RStudio
macOS:
Bash
# Install R
brew install r

# Install RStudio (download from RStudio.com or use Homebrew)
brew install --cask rstudio

Windows:
Bash
# Download and install from:
# R: https://cran.r-project.org/bin/windows/base/
# RStudio: https://posit.co/download/rstudio-desktop/

Installing Essential Packages
Open RStudio and run:
R
# Install tidyverse (includes dplyr, ggplot2, and more)
install.packages("tidyverse")

# readr, lubridate, and stringr are already installed with tidyverse;
# this line is only needed if you install packages individually
install.packages(c("readr", "lubridate", "stringr"))

# Load tidyverse
library(tidyverse)

VS Code Setup for R
Install the R extension in VS Code:
Bash
# In VS Code, install the R extension
# Press Cmd+Shift+X (Mac) or Ctrl+Shift+X (Windows)
# Search for "R" by REditorSupport and install

2. R Basics
Variables and Data Types
Create a new R script:
R
# Numeric
age <- 25
height <- 5.9
temperature <- 98.6

# Character (strings)
name <- "Alice"
city <- "San Francisco"

# Logical (boolean)
is_student <- TRUE
has_experience <- FALSE

# Print values
print(paste("Name:", name))
print(paste("Age:", age))

Ask Claude Code:
Bash
Explain the difference between <- and = in R.
When should I use each one?

Vectors
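Most R operators and functions work on whole vectors at once, which is why the examples in this section need no loops. A minimal sketch, using made-up Fahrenheit readings (the `temps_f` name is hypothetical and not part of the tutorial's data):

```r
# Hypothetical readings, used only for this sketch
temps_f <- c(68, 77, 86)

# Arithmetic is applied element by element
temps_c <- (temps_f - 32) * 5 / 9

# A comparison returns a logical vector...
is_warm <- temps_f > 70

# ...which can then be used to pick elements
warm_temps <- temps_f[is_warm]

print(temps_c)
print(warm_temps)
```

The same pattern (compute a logical vector, then index with it) is what the data frame filtering later in this section relies on.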
R
# Numeric vector
temperatures <- c(72, 75, 68, 71, 73, 76, 74)
print(temperatures)

# Character vector
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
print(days)

# Vector operations
mean_temp <- mean(temperatures)
max_temp <- max(temperatures)

print(paste("Average temperature:", mean_temp))
print(paste("Maximum temperature:", max_temp))

# Indexing (R uses 1-based indexing!)
first_day <- days[1]   # "Mon"
weekend <- days[6:7]   # "Sat", "Sun"

Data Frames
R
# Create a data frame
weather_data <- data.frame(
  day = days,
  temperature = temperatures,
  sunny = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# View the data
print(weather_data)
head(weather_data)  # First 6 rows
str(weather_data)   # Structure

# Access columns
weather_data$temperature
weather_data$day

# Filter rows
sunny_days <- weather_data[weather_data$sunny == TRUE, ]
print(sunny_days)

Ask Claude Code:
Bash
Create a data frame with student information including:
- names (5 students)
- ages
- test scores
- pass/fail status

Then calculate the average score and find students who passed.

3. Data Manipulation with dplyr
Loading and Exploring Data
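The sample dataset in this section is generated with sample(), so its values change every time the code runs and your output will not match anyone else's. If you want reproducible numbers, one common option is to call set.seed() before sampling. A minimal sketch in base R (the seed value 123 is an arbitrary choice, not something the tutorial prescribes):

```r
# Without a seed, repeated draws will usually differ
draw_a <- sample(1:10, 5, replace = TRUE)

# Resetting the same seed before each draw gives identical results
set.seed(123)
draw_b <- sample(1:10, 5, replace = TRUE)

set.seed(123)
draw_c <- sample(1:10, 5, replace = TRUE)

print(identical(draw_b, draw_c))
```

Placing a single set.seed() call directly before the tibble() call should be enough to make the whole dataset repeatable.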
R
library(tidyverse)

# Create sample dataset
sales_data <- tibble(
  date = seq(as.Date("2025-01-01"), as.Date("2025-01-31"), by = "day"),
  product = sample(c("Laptop", "Phone", "Tablet"), 31, replace = TRUE),
  quantity = sample(1:10, 31, replace = TRUE),
  price = sample(c(999, 699, 399), 31, replace = TRUE),
  region = sample(c("West", "East", "North", "South"), 31, replace = TRUE)
)

# View the data
glimpse(sales_data)
head(sales_data, 10)

The Pipe Operator
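The pipe takes the value on its left and passes it as the first argument of the function on its right, so `x %>% f(y)` means `f(x, y)`. A minimal sketch with a made-up numeric vector (`scores` is hypothetical); the last assignment uses the native `|>` pipe, which assumes R 4.1 or later:

```r
library(dplyr)  # makes %>% available (re-exported from magrittr)

# Hypothetical values, used only for this sketch
scores <- c(82, 95, 67, 74)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(scores)), 2)

# Piped calls: read from left to right
piped <- scores %>% sqrt() %>% mean() %>% round(2)

# Native pipe (assumes R >= 4.1)
native <- scores |> sqrt() |> mean() |> round(2)

print(identical(nested, piped))
print(identical(nested, native))
```

All three forms compute the same value; the piped versions simply list the steps in the order they happen.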
R
# Traditional way (hard to read)
result1 <- filter(sales_data, product == "Laptop")
result2 <- select(result1, date, quantity, price)
result3 <- arrange(result2, desc(quantity))

# With pipe operator (easy to read)
result <- sales_data %>%
  filter(product == "Laptop") %>%
  select(date, quantity, price) %>%
  arrange(desc(quantity))

Filtering with filter()
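filter() keeps the rows for which its conditions are TRUE. Conditions separated by commas must all hold (AND), `|` means either may hold (OR), and `%in%` tests membership in a set. A minimal sketch on a small made-up table (`orders` is hypothetical, chosen so the row counts are easy to check by eye):

```r
library(dplyr)

# Hypothetical data, used only for this sketch
orders <- tibble(
  product  = c("Laptop", "Phone", "Tablet", "Laptop", "Phone"),
  quantity = c(2, 7, 4, 9, 1)
)

# AND: laptops with quantity above 5
both <- orders %>% filter(product == "Laptop", quantity > 5)

# OR: laptops, or anything with quantity above 5
either <- orders %>% filter(product == "Laptop" | quantity > 5)

# Membership: phones or tablets
listed <- orders %>% filter(product %in% c("Phone", "Tablet"))

print(both)
print(either)
print(listed)
```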
R
# Filter laptops
laptops <- sales_data %>%
  filter(product == "Laptop")

# Multiple conditions
high_value_sales <- sales_data %>%
  filter(product == "Laptop", quantity > 5)

# Filter by date range
january_first_week <- sales_data %>%
  filter(date <= as.Date("2025-01-07"))

print(high_value_sales)

Selecting Columns with select()
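Besides naming columns one by one, select() accepts helper functions that pick columns by name pattern or by type, and it can rename columns as it selects them. A minimal sketch on a made-up table (`orders` and its column names are hypothetical):

```r
library(dplyr)

# Hypothetical data, used only for this sketch
orders <- tibble(
  order_id   = 1:3,
  order_date = as.Date(c("2025-01-01", "2025-01-02", "2025-01-03")),
  quantity   = c(2, 7, 4),
  region     = c("West", "East", "West")
)

# Columns whose names start with "order"
by_prefix <- orders %>% select(starts_with("order"))

# Columns that hold numbers (dates do not count as numeric)
numeric_only <- orders %>% select(where(is.numeric))

# Select and rename in one step: new_name = old_name
renamed <- orders %>% select(id = order_id, qty = quantity)

print(names(by_prefix))
print(names(numeric_only))
print(names(renamed))
```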
R
# Select specific columns
sales_summary <- sales_data %>%
  select(date, product, quantity)

# Select range of columns
sales_details <- sales_data %>%
  select(product:price)

# Drop columns
without_region <- sales_data %>%
  select(-region)

head(sales_summary)

Creating New Columns with mutate()
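mutate() is not limited to arithmetic: a new column can also hold the result of a comparison or a label chosen by rule. A minimal sketch using case_when(), which checks its conditions in order and uses the first one that matches (the `orders` table and the size thresholds are made up for illustration):

```r
library(dplyr)

# Hypothetical data, used only for this sketch
orders <- tibble(quantity = c(1, 4, 9))

labelled <- orders %>%
  mutate(
    # Logical column from a comparison
    is_bulk = quantity >= 5,
    # Label column: first matching condition wins
    size = case_when(
      quantity >= 8 ~ "large",
      quantity >= 3 ~ "medium",
      TRUE          ~ "small"   # fallback when nothing above matched
    )
  )

print(labelled)
```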
R
# Add calculated column
sales_with_total <- sales_data %>%
  mutate(total_revenue = quantity * price)

# Multiple new columns
sales_enhanced <- sales_data %>%
  mutate(
    total_revenue = quantity * price,
    weekday = weekdays(date),
    month = lubridate::month(date)  # month() comes from the lubridate package
  )

head(sales_enhanced)

Grouping and Summarizing
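group_by() does not change the data by itself; it changes how the next verb behaves. After grouping, summarize() returns one row per group, while mutate() keeps every row and computes its values within each group. A minimal sketch on a made-up table (`orders` is hypothetical):

```r
library(dplyr)

# Hypothetical data, used only for this sketch
orders <- tibble(
  region   = c("West", "West", "East"),
  quantity = c(2, 4, 10)
)

# summarize(): collapses each group to a single row
per_region <- orders %>%
  group_by(region) %>%
  summarize(total = sum(quantity), .groups = "drop")

# mutate() on grouped data: same rows, each with its share of the group total
with_share <- orders %>%
  group_by(region) %>%
  mutate(share = quantity / sum(quantity)) %>%
  ungroup()

print(per_region)
print(with_share)
```

Calling ungroup() (or passing `.groups = "drop"`) at the end avoids carrying the grouping into later steps by accident.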
R
# Summary statistics by product
product_summary <- sales_data %>%
  mutate(revenue = quantity * price) %>%
  group_by(product) %>%
  summarize(
    total_sales = n(),
    total_quantity = sum(quantity),
    total_revenue = sum(revenue),
    avg_quantity = mean(quantity)
  )

print(product_summary)

# Multiple grouping
region_product_summary <- sales_data %>%
  mutate(revenue = quantity * price) %>%
  group_by(region, product) %>%
  summarize(
    total_revenue = sum(revenue),
    avg_price = mean(price),
    .groups = "drop"  # return an ungrouped result and silence the grouping message
  )

print(region_product_summary)

Ask Claude Code:
Bash
Using the sales_data:
1. Find the total revenue per region
2. Identify the top 5 highest revenue days
3. Calculate average quantity sold per product per region

4. Data Visualization with ggplot2
Understanding ggplot2 Syntax
R
library(ggplot2)

# Basic structure:
# ggplot(data, aes(x = ..., y = ...)) + geom_*()

# Scatter plot
ggplot(sales_data, aes(x = date, y = quantity)) +
  geom_point() +
  labs(title = "Sales Quantity Over Time",
       x = "Date",
       y = "Quantity Sold")

# Add color by product
ggplot(sales_data, aes(x = date, y = quantity, color = product)) +
  geom_point(size = 3) +
  labs(title = "Sales by Product",
       x = "Date",
       y = "Quantity")

Bar Charts
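ggplot2 has two bar geoms, and the choice depends on whether your data is already summarized. geom_bar() counts the rows for each x value, while geom_col() uses a y value you supply, which is why the revenue chart in this section uses geom_col(). A minimal sketch on a made-up table (`orders` is hypothetical):

```r
library(ggplot2)

# Hypothetical data, used only for this sketch
orders <- data.frame(
  product  = c("Laptop", "Laptop", "Phone", "Phone", "Tablet"),
  quantity = c(2, 9, 7, 1, 4)
)

# geom_bar(): bar height is the number of rows per product
p_count <- ggplot(orders, aes(x = product)) +
  geom_bar()

# geom_col(): bar height comes from y; rows sharing an x value are stacked
p_total <- ggplot(orders, aes(x = product, y = quantity)) +
  geom_col()

print(p_count)
print(p_total)
```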
R
# Revenue by product
sales_data %>%
  mutate(revenue = quantity * price) %>%
  group_by(product) %>%
  summarize(total_revenue = sum(revenue)) %>%
  ggplot(aes(x = product, y = total_revenue, fill = product)) +
  geom_col() +
  labs(title = "Total Revenue by Product",
       x = "Product",
       y = "Revenue ($)") +
  theme_minimal()

Line Charts
R
# Daily revenue trend
sales_data %>%
  mutate(revenue = quantity * price) %>%
  group_by(date) %>%
  summarize(daily_revenue = sum(revenue)) %>%
  ggplot(aes(x = date, y = daily_revenue)) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_line(color = "blue", linewidth = 1) +
  geom_point(color = "darkblue", size = 2) +
  labs(title = "Daily Revenue Trend") +
  theme_minimal()

Box Plots
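A box plot is a picture of a few summary numbers. With its default settings, geom_boxplot() draws the box from the first to the third quartile with a line at the median, extends the whiskers to the most extreme values within 1.5 times the interquartile range (IQR) of the box, and plots anything beyond that as individual points. The same numbers can be computed by hand; a minimal sketch in base R with made-up values (`quantities` is hypothetical):

```r
# Hypothetical values, used only for this sketch
quantities <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Quartiles: the edges of the box and the median line
q <- quantile(quantities, c(0.25, 0.5, 0.75))
iqr <- IQR(quantities)

# Points beyond these limits are drawn individually
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

outliers <- quantities[quantities < lower_fence | quantities > upper_fence]

print(q)
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```

Here only the value 30 falls outside the limits, so it is the single point a box plot of these values would show separately.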
R
# Quantity by product
ggplot(sales_data, aes(x = product, y = quantity, fill = product)) +
  geom_boxplot() +
  labs(title = "Quantity Distribution by Product") +
  theme_minimal()

Ask Claude Code:
Bash
Create visualizations showing:
1. A stacked bar chart of revenue by region
2. A line chart showing cumulative revenue
3. A faceted plot for each product

5. Reading and Writing Data
CSV Files
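read_csv() guesses each column's type from the file contents. That is convenient, but it means a column's type can change when the file's contents change. If you want the types fixed, you can declare them with the col_types argument. A minimal sketch that writes a small made-up table to a temporary file and reads it back (`orders` and the file path are hypothetical):

```r
library(readr)

# Hypothetical data written to a temporary file for this sketch
orders <- data.frame(
  date     = as.Date(c("2025-01-01", "2025-01-02")),
  product  = c("Laptop", "Phone"),
  quantity = c(2L, 7L)
)
path <- tempfile(fileext = ".csv")
write_csv(orders, path)

# Declare the column types instead of relying on guessing
orders_typed <- read_csv(
  path,
  col_types = cols(
    date     = col_date(),
    product  = col_character(),
    quantity = col_integer()
  )
)

str(orders_typed)
```

If a value in the file cannot be parsed as the declared type, readr reports a parsing problem rather than silently switching the column to another type.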
R
# Write to CSV
write_csv(sales_data, "sales_january_2025.csv")

# Read from CSV
sales_imported <- read_csv("sales_january_2025.csv")

Excel Files
R
# Install packages if needed
install.packages("readxl")
install.packages("writexl")

library(readxl)
library(writexl)

# Write to Excel
write_xlsx(sales_data, "sales_january_2025.xlsx")

# Read from Excel
excel_data <- read_excel("sales_january_2025.xlsx")

6. Next Steps
Practice Projects
- Customer Analysis - Segment customers and analyze behavior
- Time Series - Forecast sales trends
- Statistical Modeling - Build predictive models
Advanced Topics
- Shiny: Build interactive web apps
- tidymodels: Machine learning in R
- RMarkdown: Automated reporting
Continue Learning
- Automation Track: Automate R reports
- App Builder Track: Build Shiny dashboards
- Git & GitHub: Version control
Resources
Congratulations! You now have a solid foundation in R for data analysis. Keep practicing and use Claude Code to help you learn!