---
title: "Assignment 4"
subtitle: "STAT3373"
author: "Isaac Shoebottom"
date: "Oct 16th, 2025"
output:
  html_document:
    df_print: paged
  pdf_document: default
---
```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(knitr)
```
# Question 1
## a)
```{r}
# Create the dataset
data <- tibble(
  Farm = factor(1:4),
  Fert1 = c(48, 45, 52, 44),
  Fert2 = c(55, 50, 58, 49),
  Fert3 = c(52, 49, 55, 47)
)

# Convert to long format
long_data <- data %>%
  pivot_longer(
    cols = starts_with("Fert"),
    names_to = "Fertilizer",
    values_to = "Yield"
  ) %>%
  mutate(Fertilizer = factor(Fertilizer))

kable(long_data, caption = "Yield Data (Bushels per Acre)")
```
## b)
Model: $$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$$
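Assuming $\tau_i$ denotes the effect of fertilizer $i$ ($i = 1, 2, 3$) and $\beta_j$ the effect of farm $j$ ($j = 1, \dots, 4$), which the assignment does not state explicitly, the test for a fertilizer effect can be written as:

$$
\begin{aligned}
H_0 &: \tau_1 = \tau_2 = \tau_3 = 0 \\
H_1 &: \tau_i \neq 0 \ \text{for at least one } i \\
F &= \frac{MS_{\text{Fertilizer}}}{MS_E} \sim F_{2,\,6} \ \text{under } H_0
\end{aligned}
$$

The degrees of freedom follow from $3 - 1 = 2$ for fertilizers and $(3 - 1)(4 - 1) = 6$ for error.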
```{r}
anova_model <- aov(Yield ~ Fertilizer + Farm, data = long_data)
anova_table <- summary(anova_model)
anova_table
```
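Conclusions:

- The fertilizer effect is significant (p \< 0.05), so the hypothesis of equal mean yields across fertilizers is rejected.
- The farm (block) effect is also significant, which indicates that mean yields differ between farms.

## c)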
```{r}
tukey_results <- TukeyHSD(anova_model, "Fertilizer")
tukey_results
```
Results:

- Fertilizer 2 produces the highest yields
- All fertilizer pairs differ significantly
- Ordering of mean yields: Fert 2 \> Fert 3 \> Fert 1

Final Conclusion (alpha = 0.05):

- There is strong statistical evidence that fertilizer type affects yield.
- Blocking by farm was appropriate and reduced error variability.
- Fertilizer 2 is the most effective option based on yield.

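A rough way to check the claim that blocking reduced error variability is to compare the residual mean square with and without the farm term. The chunk below is a minimal sketch, not part of the original analysis. The object names (`q1_yield`, `q1_fert`, `q1_farm`, `fit_blocked`, `fit_unblocked`) are introduced here for illustration, and the data are re-entered so the chunk runs on its own.

```{r}
# Re-enter the Question 1 yields: Fert1 for farms 1-4, then Fert2, then Fert3
q1_yield <- c(48, 45, 52, 44, 55, 50, 58, 49, 52, 49, 55, 47)
q1_fert <- factor(rep(c("Fert1", "Fert2", "Fert3"), each = 4))
q1_farm <- factor(rep(1:4, times = 3))

# Fit the model with and without the block (farm) term
fit_blocked <- aov(q1_yield ~ q1_fert + q1_farm)
fit_unblocked <- aov(q1_yield ~ q1_fert)

# Residual mean square = residual sum of squares / residual degrees of freedom
mse_blocked <- deviance(fit_blocked) / df.residual(fit_blocked)
mse_unblocked <- deviance(fit_unblocked) / df.residual(fit_unblocked)

c(blocked = mse_blocked, unblocked = mse_unblocked)
```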
# Question 2
## a)
```{r}
drug_data <- data.frame(
  patient = factor(rep(1:5, each = 3)),
  drug = factor(rep(c("A", "B", "C"), times = 5)),
  response_time = c(
    12, 10, 15, # Patient 1
    14, 11, 16, # Patient 2
    10, 8, 13,  # Patient 3
    13, 10, 14, # Patient 4
    11, 9, 14   # Patient 5
  )
)

kable(drug_data, caption = "Drug Trial Response Times (seconds)")
```
## b)
Model: $$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$$
```{r}
anova_model <- aov(response_time ~ drug + patient, data = drug_data)
summary(anova_model)
```
Decision (alpha = 0.05):

- Drug effect is significant
- Patient (block) effect is significant

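The F statistics reported by `aov()` come from partitioning the total sum of squares into drug, patient, and error components. As a hand check, that partition can be sketched in base R. The names below (`rt`, `drug_f`, `patient_f`, and the `ss_*` objects) are illustrative, and the data are re-entered from part a) so the chunk runs on its own.

```{r}
# Response times from part a), patient by patient (drugs A, B, C within each)
rt <- c(12, 10, 15, 14, 11, 16, 10, 8, 13, 13, 10, 14, 11, 9, 14)
patient_f <- factor(rep(1:5, each = 3))
drug_f <- factor(rep(c("A", "B", "C"), times = 5))

grand_mean <- mean(rt)
drug_means <- tapply(rt, drug_f, mean)
patient_means <- tapply(rt, patient_f, mean)

# Each drug mean is based on 5 patients, each patient mean on 3 drugs
ss_total <- sum((rt - grand_mean)^2)
ss_drug <- 5 * sum((drug_means - grand_mean)^2)
ss_patient <- 3 * sum((patient_means - grand_mean)^2)
ss_error <- ss_total - ss_drug - ss_patient

# F statistic for the drug effect on (2, 8) degrees of freedom
f_drug <- (ss_drug / 2) / (ss_error / 8)
p_drug <- pf(f_drug, df1 = 2, df2 = 8, lower.tail = FALSE)

c(SS_drug = ss_drug, SS_patient = ss_patient, SS_error = ss_error,
  F_drug = f_drug, p_drug = p_drug)
```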
## c)
Residual Diagnostics
```{r}
par(mfrow = c(1, 2))
plot(anova_model, which = 1) # Residuals vs Fitted
plot(anova_model, which = 2) # Normal Q-Q
par(mfrow = c(1, 1))
```
Formal Tests
```{r}
# Normality of residuals
shapiro.test(residuals(anova_model))
# Homogeneity of variance
bartlett.test(response_time ~ drug, data = drug_data)
```
Results:

- Residuals are approximately normally distributed
- Variances across drug groups are homogeneous

## d)
Multiple Comparisons
```{r}
tukey_results <- TukeyHSD(anova_model, "drug")
tukey_results
```
Results:

- All drug pairs differ significantly
- Ordering of mean response times: Drug B < Drug A < Drug C

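The pairwise conclusions can be cross-checked against Tukey's honestly significant difference, computed from the error mean square of the block model. This is a hand-check sketch in base R rather than the method used above. `q2_fit`, `hsd`, and `mean_diffs` are names introduced here, and the data are re-entered from part a) so the chunk stands alone.

```{r}
# Response times from part a), patient by patient (drugs A, B, C within each)
rt <- c(12, 10, 15, 14, 11, 16, 10, 8, 13, 13, 10, 14, 11, 9, 14)
patient_f <- factor(rep(1:5, each = 3))
drug_f <- factor(rep(c("A", "B", "C"), times = 5))

# Error mean square from the randomized block model
q2_fit <- aov(rt ~ drug_f + patient_f)
mse <- deviance(q2_fit) / df.residual(q2_fit)

# Critical difference: studentized range quantile * sqrt(MSE / n per drug)
n_per_drug <- 5
hsd <- qtukey(0.95, nmeans = 3, df = df.residual(q2_fit)) * sqrt(mse / n_per_drug)

# Absolute differences between every pair of drug means
drug_means <- tapply(rt, drug_f, mean)
mean_diffs <- abs(outer(drug_means, drug_means, "-"))

# TRUE off the diagonal where a pair of means differs by more than the HSD
mean_diffs > hsd
```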
## e)
Mean Response Times by Drug
```{r}
drug_data %>%
  group_by(drug) %>%
  summarise(mean_time = mean(response_time)) %>%
  ggplot(aes(x = drug, y = mean_time)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Mean Response Time by Drug",
    x = "Drug",
    y = "Mean Response Time (seconds)"
  ) +
  theme_minimal()
```
Boxplot by Drug
```{r}
ggplot(drug_data, aes(x = drug, y = response_time)) +
  geom_boxplot(fill = "lightgray") +
  labs(
    title = "Response Time Distribution by Drug",
    x = "Drug",
    y = "Response Time (seconds)"
  ) +
  theme_minimal()
```
## f)
Conclusion:
At the 5% significance level, there is strong evidence that drug formulation affects patient response time. Blocking by patient was effective and significantly reduced unexplained variability. Post-hoc analysis using Tukey's HSD showed that all three drugs differ significantly, with Drug B producing the fastest (best) response times, followed by Drug A, and then Drug C.