---
title: "Assignment 4"
subtitle: "STAT3373"
author: "Isaac Shoebottom"
date: "Oct 16th, 2025"
output:
  html_document:
    df_print: paged
  pdf_document: default
---

```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(knitr)
```

# Question 1

## a)

```{r}
# Create the dataset
data <- tibble(
  Farm = factor(1:4),
  Fert1 = c(48, 45, 52, 44),
  Fert2 = c(55, 50, 58, 49),
  Fert3 = c(52, 49, 55, 47)
)

# Convert to long format
long_data <- data %>%
  pivot_longer(
    cols = starts_with("Fert"),
    names_to = "Fertilizer",
    values_to = "Yield"
  ) %>%
  mutate(Fertilizer = factor(Fertilizer))

kable(long_data, caption = "Yield Data (Bushels per Acre)")
```

## b)

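Model (randomized complete block design): $$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$$

Here $\tau_i$ is the effect of fertilizer $i$, $\beta_j$ is the effect of farm (block) $j$, and $\varepsilon_{ij}$ is the random error.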

```{r}
anova_model <- aov(Yield ~ Fertilizer + Farm, data = long_data)

anova_table <- summary(anova_model)
anova_table
```

Conclusions (alpha = 0.05):

-   The fertilizer effect is significant ($F_{2,6} \approx 93.0$, p \< 0.05)

-   The farm (block) effect is also significant ($F_{3,6} \approx 117.5$, p \< 0.05)
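
As a cross-check on the ANOVA table, the sums of squares can be rebuilt by hand from the row and column means. This is only a sketch in base R; `yield_mat` and the other object names are introduced here for illustration and are not part of the original analysis.

```{r}
# Yield matrix: rows are farms (blocks), columns are fertilizers (treatments)
yield_mat <- matrix(
  c(48, 55, 52,
    45, 50, 49,
    52, 58, 55,
    44, 49, 47),
  nrow = 4, byrow = TRUE,
  dimnames = list(Farm = 1:4, Fertilizer = c("Fert1", "Fert2", "Fert3"))
)

n_blocks <- nrow(yield_mat)
n_trt    <- ncol(yield_mat)
grand_mean <- mean(yield_mat)

# Sums of squares for treatments, blocks, total, and error
ss_trt   <- n_blocks * sum((colMeans(yield_mat) - grand_mean)^2)
ss_block <- n_trt * sum((rowMeans(yield_mat) - grand_mean)^2)
ss_total <- sum((yield_mat - grand_mean)^2)
ss_error <- ss_total - ss_trt - ss_block

# Mean squares, F statistics, and p-values
df_error <- (n_blocks - 1) * (n_trt - 1)
ms_error <- ss_error / df_error
f_trt    <- (ss_trt / (n_trt - 1)) / ms_error
f_block  <- (ss_block / (n_blocks - 1)) / ms_error
p_trt    <- pf(f_trt, n_trt - 1, df_error, lower.tail = FALSE)
p_block  <- pf(f_block, n_blocks - 1, df_error, lower.tail = FALSE)

round(c(SS_fert = ss_trt, SS_farm = ss_block, SS_error = ss_error,
        F_fert = f_trt, F_farm = f_block), 3)
```

The values should agree with the `Sum Sq` and `F value` columns printed by `summary(anova_model)`.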

## c)

```{r}
tukey_results <- TukeyHSD(anova_model, "Fertilizer")
tukey_results
```

Results:

- Fertilizer 2 produces the highest mean yield (53.00 bushels per acre)

- All fertilizer pairs differ significantly at the 5% level

- Ordering of mean yields: Fert 2 (53.00) \> Fert 3 (50.75) \> Fert 1 (47.25)
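
The pairwise conclusions can also be checked against a single Tukey HSD margin: two fertilizer means differ at the 5% level when their absolute difference exceeds $q_{0.05}(3, 6)\sqrt{MS_E / 4}$. The sketch below rebuilds the data under the hypothetical name `fert_check` so that it runs on its own; it is a cross-check, not a replacement for `TukeyHSD()`.

```{r}
# Same yield data in long form (object names are illustrative)
fert_check <- data.frame(
  Farm = factor(rep(1:4, each = 3)),
  Fertilizer = factor(rep(c("Fert1", "Fert2", "Fert3"), times = 4)),
  Yield = c(48, 55, 52, 45, 50, 49, 52, 58, 55, 44, 49, 47)
)
fit_check <- aov(Yield ~ Fertilizer + Farm, data = fert_check)

# Residual mean square from the blocked model
ms_error <- deviance(fit_check) / df.residual(fit_check)

# HSD margin: studentized range quantile times the standard error of a mean
n_per_trt <- 4
hsd <- qtukey(0.95, nmeans = 3, df = df.residual(fit_check)) *
  sqrt(ms_error / n_per_trt)

# Absolute pairwise differences between fertilizer means
fert_means <- tapply(fert_check$Yield, fert_check$Fertilizer, mean)
pair_diffs <- abs(outer(fert_means, fert_means, "-"))

hsd
pair_diffs
```

Every off-diagonal entry of `pair_diffs` larger than `hsd` corresponds to a pair that Tukey's procedure declares different.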

Final conclusion (alpha = 0.05):

- There is strong statistical evidence that fertilizer type affects yield.

- Blocking by farm was appropriate: the farm effect is significant, so blocking removed substantial farm-to-farm variation from the error term.

- Fertilizer 2 is the most effective option based on mean yield.
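
One way to see how much the blocking helped is to refit the model without the `Farm` term and compare the residual mean squares. This is an illustrative sketch only; `fert_check`, `blocked`, and `unblocked` are names made up for this comparison.

```{r}
# Same yield data in long form (object names are illustrative)
fert_check <- data.frame(
  Farm = factor(rep(1:4, each = 3)),
  Fertilizer = factor(rep(c("Fert1", "Fert2", "Fert3"), times = 4)),
  Yield = c(48, 55, 52, 45, 50, 49, 52, 58, 55, 44, 49, 47)
)

blocked   <- aov(Yield ~ Fertilizer + Farm, data = fert_check)
unblocked <- aov(Yield ~ Fertilizer, data = fert_check)

# Residual mean square with and without the block term
mse_blocked   <- deviance(blocked) / df.residual(blocked)
mse_unblocked <- deviance(unblocked) / df.residual(unblocked)

# p-value for the fertilizer effect when farms are ignored
p_unblocked <- summary(unblocked)[[1]][["Pr(>F)"]][1]

c(mse_blocked = mse_blocked, mse_unblocked = mse_unblocked,
  p_fert_unblocked = p_unblocked)
```

Without the farm term, the farm-to-farm variation is absorbed into the error, so the residual mean square is far larger and the fertilizer effect no longer reaches significance at the 5% level.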

# Question 2

## a)
```{r}
drug_data <- data.frame(
  patient = factor(rep(1:5, each = 3)),
  drug = factor(rep(c("A", "B", "C"), times = 5)),
  response_time = c(
    12, 10, 15,  # Patient 1
    14, 11, 16,  # Patient 2
    10, 8, 13,   # Patient 3
    13, 10, 14,  # Patient 4
    11, 9, 14    # Patient 5
  )
)

kable(drug_data, caption = "Drug Trial Response Times (seconds)")
```

## b)

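Model (randomized complete block design): $$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$$

Here $\tau_i$ is the effect of drug $i$, $\beta_j$ is the effect of patient (block) $j$, and $\varepsilon_{ij}$ is the random error.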

```{r}
anova_model <- aov(response_time ~ drug + patient, data = drug_data)
summary(anova_model)
```

Decision (alpha = 0.05):

- The drug effect is significant ($F_{2,8} \approx 132.9$, p \< 0.05)

- The patient (block) effect is significant ($F_{4,8} \approx 21.5$, p \< 0.05)
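
For reference, the two F tests behind this decision can be written out explicitly. This assumes $\tau_i$ denotes the drug effects and $\beta_j$ the patient (block) effects in the model above, with 3 drugs and 5 patients, so the error term has $(3 - 1)(5 - 1) = 8$ degrees of freedom:

$$
\begin{aligned}
H_0^{\text{drug}} &: \tau_A = \tau_B = \tau_C = 0, & F_{\text{drug}} &= \frac{MS_{\text{drug}}}{MS_E} \sim F_{2,\,8} \text{ under } H_0 \\
H_0^{\text{patient}} &: \beta_1 = \dots = \beta_5 = 0, & F_{\text{patient}} &= \frac{MS_{\text{patient}}}{MS_E} \sim F_{4,\,8} \text{ under } H_0
\end{aligned}
$$

Each null hypothesis is rejected when the corresponding p-value in the ANOVA table falls below 0.05.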

## c)

Residual Diagnostics

```{r}
par(mfrow = c(1, 2))
plot(anova_model, which = 1)  # Residuals vs Fitted
plot(anova_model, which = 2)  # Normal Q-Q
par(mfrow = c(1, 1))
```

Formal Tests
```{r}
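# Normality of residuals (Shapiro-Wilk test)
shapiro.test(residuals(anova_model))

# Homogeneity of variance across drugs (Bartlett's test on the raw responses,
# which still include patient-to-patient variation)
bartlett.test(response_time ~ drug, data = drug_data)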
```

Results:

- Residuals are approximately normally distributed (Q-Q plot and Shapiro-Wilk test)

- Bartlett's test does not reject equal variances across drug groups at alpha = 0.05
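
Because the raw responses still contain patient-to-patient differences, an alternative check is to compare the spread of the model residuals by drug, which removes the block effect first. The sketch below is one possible variant, not the method used above; `drug_check`, `fit_check`, and `bt_resid` are hypothetical names. Residuals from a fitted model are not independent, so the result is best read as a rough diagnostic.

```{r}
# Same drug trial data (object names are illustrative)
drug_check <- data.frame(
  patient = factor(rep(1:5, each = 3)),
  drug = factor(rep(c("A", "B", "C"), times = 5)),
  response_time = c(12, 10, 15, 14, 11, 16, 10, 8, 13, 13, 10, 14, 11, 9, 14)
)
fit_check <- aov(response_time ~ drug + patient, data = drug_check)

# Residuals after removing both drug and patient effects
drug_check$resid <- residuals(fit_check)

# Residual variance within each drug group
resid_var <- tapply(drug_check$resid, drug_check$drug, var)

# Bartlett's test applied to the block-adjusted residuals
bt_resid <- bartlett.test(resid ~ drug, data = drug_check)

resid_var
bt_resid
```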

## d)

Multiple Comparisons

```{r}
tukey_results <- TukeyHSD(anova_model, "drug")
tukey_results
```

Results:

- All drug pairs differ significantly at the 5% level

- Ordering of mean response times: Drug B (9.6 s) \< Drug A (12.0 s) \< Drug C (14.4 s)
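
The Tukey output can also be turned into a small table that flags which comparisons are significant, which is easier to cite in a write-up. This is a minimal sketch, assuming the same model as above; `drug_check`, `tk_check`, and `tk_table` are illustrative names.

```{r}
# Same drug trial data (object names are illustrative)
drug_check <- data.frame(
  patient = factor(rep(1:5, each = 3)),
  drug = factor(rep(c("A", "B", "C"), times = 5)),
  response_time = c(12, 10, 15, 14, 11, 16, 10, 8, 13, 13, 10, 14, 11, 9, 14)
)
fit_check <- aov(response_time ~ drug + patient, data = drug_check)

# TukeyHSD() returns a list with one matrix per term: diff, lwr, upr, p adj
tk_check <- TukeyHSD(fit_check, "drug")
tk_table <- as.data.frame(tk_check$drug)

# Flag comparisons whose adjusted p-value is below 0.05
tk_table$significant <- tk_table[["p adj"]] < 0.05
tk_table
```

Each row is labelled by the pair being compared (for example `B-A`), and `diff` is the difference in mean response time in seconds.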

## e)

Mean Response Times by Drug

```{r}
drug_data %>%
  group_by(drug) %>%
  summarise(mean_time = mean(response_time)) %>%
  ggplot(aes(x = drug, y = mean_time)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Mean Response Time by Drug",
    x = "Drug",
    y = "Mean Response Time (seconds)"
  ) +
  theme_minimal()
```


Boxplot by Drug
```{r}
ggplot(drug_data, aes(x = drug, y = response_time)) +
  geom_boxplot(fill = "lightgray") +
  labs(
    title = "Response Time Distribution by Drug",
    x = "Drug",
    y = "Response Time (seconds)"
  ) +
  theme_minimal()
```

## f)

Conclusion:

At the 5% significance level, there is strong evidence that drug formulation affects patient response time. Blocking by patient was effective and significantly reduced unexplained variability. Post-hoc analysis using Tukey’s HSD showed that all three drugs differ significantly, with Drug B producing the fastest (best) response times, followed by Drug A, and then Drug C.