---
title: "Assignment 4"
subtitle: "STAT3373"
author: "Isaac Shoebottom"
date: "Oct 16th, 2025"
output:
  html_document:
    df_print: paged
  pdf_document: default
---

```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(knitr)
```

# Question 1

## a)

```{r}
# Create the dataset (one row per farm, one column per fertilizer)
data <- tibble(
  Farm = factor(1:4),
  Fert1 = c(48, 45, 52, 44),
  Fert2 = c(55, 50, 58, 49),
  Fert3 = c(52, 49, 55, 47)
)

# Convert to long format (one row per farm-fertilizer observation)
long_data <- data %>%
  pivot_longer(
    cols = starts_with("Fert"),
    names_to = "Fertilizer",
    values_to = "Yield"
  ) %>%
  mutate(Fertilizer = factor(Fertilizer))

kable(long_data, caption = "Yield Data (Bushels per Acre)")
```

## b)

Model:

$$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$$

where $\mu$ is the overall mean, $\tau_i$ is the effect of fertilizer $i$, $\beta_j$ is the effect of farm (block) $j$, and $\varepsilon_{ij}$ is the random error.

```{r}
anova_model <- aov(Yield ~ Fertilizer + Farm, data = long_data)
anova_table <- summary(anova_model)
anova_table
```

Conclusions:

- Fertilizer effect is significant (p \< 0.05)
- Farm (block) effect is also significant

## c)

```{r}
tukey_results <- TukeyHSD(anova_model, "Fertilizer")
tukey_results
```

Results:

- Fertilizer 2 produces the highest yields
- All fertilizer pairs differ significantly
- Ordering of mean yields: Fert 2 \> Fert 3 \> Fert 1

Final Conclusion (alpha = 0.05)

- There is strong statistical evidence that fertilizer type affects yield.
- Blocking by farm was appropriate and reduced error variability.
- Fertilizer 2 is the most effective option based on yield.

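The ANOVA table in part b) can be cross-checked by hand. The chunk below is a minimal sketch that recomputes the randomized-block sums of squares and F-tests with base R only. The object names (`yield_mat`, `ss_fert`, `ss_farm`, and so on) are illustrative and not part of the original analysis.

```{r}
# Yields from part a): rows = farms (blocks), columns = fertilizers (treatments)
# NOTE: `yield_mat` and the other names below are hypothetical helper objects.
yield_mat <- matrix(
  c(48, 55, 52,
    45, 50, 49,
    52, 58, 55,
    44, 49, 47),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Farm", 1:4), c("Fert1", "Fert2", "Fert3"))
)

b <- nrow(yield_mat)           # number of blocks (farms)
k <- ncol(yield_mat)           # number of treatments (fertilizers)
grand_mean <- mean(yield_mat)

# Sums of squares for the additive model Y_ij = mu + tau_i + beta_j + e_ij
ss_fert  <- b * sum((colMeans(yield_mat) - grand_mean)^2)
ss_farm  <- k * sum((rowMeans(yield_mat) - grand_mean)^2)
ss_total <- sum((yield_mat - grand_mean)^2)
ss_error <- ss_total - ss_fert - ss_farm

# Error degrees of freedom and mean square
df_error <- (b - 1) * (k - 1)
ms_error <- ss_error / df_error

# F statistics and upper-tail p-values for treatments and blocks
f_fert <- (ss_fert / (k - 1)) / ms_error
f_farm <- (ss_farm / (b - 1)) / ms_error
p_fert <- pf(f_fert, k - 1, df_error, lower.tail = FALSE)
p_farm <- pf(f_farm, b - 1, df_error, lower.tail = FALSE)

data.frame(
  Source = c("Fertilizer", "Farm", "Residuals"),
  Df     = c(k - 1, b - 1, df_error),
  SumSq  = c(ss_fert, ss_farm, ss_error),
  F      = c(f_fert, f_farm, NA),
  p      = c(p_fert, p_farm, NA)
)
```

If the arithmetic is right, these values should agree with the `Sum Sq`, `F value`, and `Pr(>F)` columns printed by `summary(anova_model)` in part b).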
# Question 2

## a)

```{r}
# One row per patient-drug observation; patients are the blocks
drug_data <- data.frame(
  patient = factor(rep(1:5, each = 3)),
  drug = factor(rep(c("A", "B", "C"), times = 5)),
  response_time = c(
    12, 10, 15,  # Patient 1
    14, 11, 16,  # Patient 2
    10,  8, 13,  # Patient 3
    13, 10, 14,  # Patient 4
    11,  9, 14   # Patient 5
  )
)

kable(drug_data, caption = "Drug Trial Response Times (seconds)")
```

## b)

Model:

$$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$$

where $\mu$ is the overall mean, $\tau_i$ is the effect of drug $i$, $\beta_j$ is the effect of patient (block) $j$, and $\varepsilon_{ij}$ is the random error.

```{r}
anova_model <- aov(response_time ~ drug + patient, data = drug_data)
summary(anova_model)
```

Decision (alpha = 0.05):

- Drug effect is significant
- Patient (block) effect is significant

## c)

Residual Diagnostics

```{r}
par(mfrow = c(1, 2))
plot(anova_model, which = 1)  # Residuals vs Fitted
plot(anova_model, which = 2)  # Normal Q-Q
par(mfrow = c(1, 1))
```

Formal Tests

```{r}
# Normality of residuals
shapiro.test(residuals(anova_model))

# Homogeneity of variance
bartlett.test(response_time ~ drug, data = drug_data)
```

Results:

- Residuals are approximately normally distributed
- Variances across drug groups are homogeneous

## d)

Multiple Comparisons

```{r}
tukey_results <- TukeyHSD(anova_model, "drug")
tukey_results
```

Results:

- All drug pairs differ significantly
- Ordering of mean response times: Drug B < Drug A < Drug C

## e)

Mean Response Times by Drug

```{r}
drug_data %>%
  group_by(drug) %>%
  summarise(mean_time = mean(response_time)) %>%
  ggplot(aes(x = drug, y = mean_time)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Mean Response Time by Drug",
    x = "Drug",
    y = "Mean Response Time (seconds)"
  ) +
  theme_minimal()
```

Boxplot by Drug

```{r}
ggplot(drug_data, aes(x = drug, y = response_time)) +
  geom_boxplot(fill = "lightgray") +
  labs(
    title = "Response Time Distribution by Drug",
    x = "Drug",
    y = "Response Time (seconds)"
  ) +
  theme_minimal()
```

## f)

Conclusion: At the 5% significance level, there is strong evidence that drug formulation affects patient response time.
Blocking by patient was effective and significantly reduced unexplained variability. Post-hoc analysis using Tukey’s HSD showed that all three drugs differ significantly, with Drug B producing the fastest (best) response times, followed by Drug A, and then Drug C.
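The pairwise conclusion from part d) can be illustrated with a hand calculation of the Tukey critical difference. The chunk below is a sketch using base R only. It assumes the same additive block model as part b), and the names `time_mat`, `hsd`, and `pair_diffs` are hypothetical helpers, not objects from the original analysis.

```{r}
# Response times from part a): rows = patients (blocks), columns = drugs
# NOTE: all object names in this chunk are illustrative.
time_mat <- matrix(
  c(12, 10, 15,
    14, 11, 16,
    10,  8, 13,
    13, 10, 14,
    11,  9, 14),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("P", 1:5), c("A", "B", "C"))
)

b <- nrow(time_mat)            # number of blocks (patients)
k <- ncol(time_mat)            # number of treatments (drugs)
grand_mean <- mean(time_mat)
drug_means <- colMeans(time_mat)

# Residual mean square from the randomized-block decomposition
ss_error <- sum((time_mat - grand_mean)^2) -
  b * sum((drug_means - grand_mean)^2) -
  k * sum((rowMeans(time_mat) - grand_mean)^2)
df_error <- (b - 1) * (k - 1)
ms_error <- ss_error / df_error

# Tukey critical difference at alpha = 0.05 (each drug mean averages b values)
hsd <- qtukey(0.95, nmeans = k, df = df_error) * sqrt(ms_error / b)

# Pairwise differences in mean response time, labelled like TukeyHSD() output
pairs <- combn(names(drug_means), 2)
pair_diffs <- drug_means[pairs[2, ]] - drug_means[pairs[1, ]]
names(pair_diffs) <- paste(pairs[2, ], pairs[1, ], sep = "-")

data.frame(
  difference  = pair_diffs,
  critical    = hsd,
  significant = abs(pair_diffs) > hsd
)
```

Any pair whose absolute difference exceeds `hsd` is declared different at the 5% level, which should match the confidence intervals reported by `TukeyHSD()` in part d).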