Hands-on_Ex04_b

ggplot2
ggiraph
plotly
DT
patchwork
Visual Statistical Analysis
Author

xu xinyi

Modified

May 13, 2025

Learning Outcome

  • Use the ggstatsplot package to create visual graphics with rich statistical information

  • Use the performance package to visualise model diagnostics

  • Use the parameters package to visualise model parameters

Getting Started

Installing and launching R packages

In this exercise, ggstatsplot and tidyverse will be used.

pacman::p_load(ggstatsplot, tidyverse)

Importing data

exam <- read_csv("Exam_data.csv")

One-sample test: gghistostats() method

In the code chunk below, gghistostats() is used to build a visual of a one-sample test on English scores.

set.seed(1234)

gghistostats(
  data = exam,
  x = ENGLISH,
  type = "bayes",
  test.value = 60,
  xlab = "English scores"
)

Unpacking the Bayes Factor

  • A Bayes factor is the ratio of the likelihood of one particular hypothesis to the likelihood of another. It can be interpreted as a measure of the strength of evidence in favor of one theory among two competing theories.

  • The Bayes factor gives us a way to evaluate the data in favor of a null hypothesis, and to use external (prior) information to do so. It tells us what the weight of the evidence is in favor of a given hypothesis.

  • When we are comparing two hypotheses, H1 (the alternative hypothesis) and H0 (the null hypothesis), the Bayes Factor is often written as B10. Null Hypothesis (H0): the true mean of the English scores is equal to the test value of 60. Alternative Hypothesis (H1): the true mean of the English scores is not equal to 60.

  • The Schwarz criterion (BIC), k log(n) - 2 log(L(θ̂)), is one of the easiest ways to obtain a rough approximation of the Bayes Factor. Here k is the number of free parameters, n is the sample size, and L(θ̂) is the likelihood of the model tested, given your data, evaluated at the maximum likelihood value θ̂.
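As a rough illustration of the Schwarz criterion idea above, the sketch below approximates BF_01 from the BIC of two nested models using BF_01 ≈ exp((BIC_1 - BIC_0) / 2). The simulated `scores` vector is an assumption that stands in for exam$ENGLISH. The result is only an approximation and will generally differ from the Bayes Factor printed by gghistostats(), which relies on a JZS prior.

```r
# Simulated scores stand in for exam$ENGLISH (assumption for illustration only)
set.seed(1234)
scores <- rnorm(100, mean = 62, sd = 10)

# H0: mean fixed at the test value 60 (no free mean parameter)
m0 <- lm(I(scores - 60) ~ 0)
# H1: mean estimated freely from the data (one extra parameter)
m1 <- lm(I(scores - 60) ~ 1)

# BIC = k * log(n) - 2 * log(L(theta_hat)); lower BIC means better support
bf01 <- exp((BIC(m1) - BIC(m0)) / 2)  # evidence for H0 relative to H1
bf10 <- 1 / bf01                      # evidence for H1 relative to H0

c(BF01 = bf01, BF10 = bf10)
```

Because the two quantities are reciprocals, reporting either one is sufficient as long as the direction (01 or 10) is stated.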

How to interpret Bayes Factor

A Bayes Factor can be any positive number. One of the most common interpretation schemes was first proposed by Harold Jeffreys (1961) and slightly modified by Lee and Wagenmakers in 2013.
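The commonly cited Lee and Wagenmakers thresholds (1, 3, 10, 30, 100 and their reciprocals) can be sketched as a small lookup function. The function name `interpret_bf10()` is hypothetical and not part of ggstatsplot; the cut-offs are conventions, not hard rules.

```r
# Hypothetical helper: label a Bayes Factor BF10 using the
# Lee and Wagenmakers (2013) evidence categories
interpret_bf10 <- function(bf10) {
  stopifnot(is.numeric(bf10), all(bf10 > 0))
  # The scale is symmetric on the log axis, so classify |log(BF10)|
  strength <- cut(
    abs(log(bf10)),
    breaks = log(c(1, 3, 10, 30, 100, Inf)),
    labels = c("anecdotal", "moderate", "strong", "very strong", "extreme"),
    right = FALSE
  )
  direction <- ifelse(bf10 > 1, "H1", "H0")
  ifelse(bf10 == 1, "no evidence", paste(strength, "evidence for", direction))
}

interpret_bf10(c(5, 1 / 50, 1))
```

Values above 1 are read as support for H1 and values below 1 as support for H0, with the strength label depending only on how far the ratio is from 1.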

Statistical Annotations:

  • log_e(BF_01) = 2.12: This is the natural logarithm of the Bayes Factor (BF) comparing the null hypothesis (mean English score = 60) to the alternative hypothesis. BF_01 places the null in the numerator, so a positive log value means BF_01 > 1 and the data favor H0 rather than count against it. Here BF_01 = e^2.12 ≈ 8.3, meaning the data are about 8 times more likely under H0 than under H1, which is moderate evidence for the null on the Lee and Wagenmakers scale.
  • Δ_posterior mean = 1.12: This is the posterior estimate of the difference between the mean score and the test value (60). The positive value suggests the average score is higher than the test value.
  • 95% CI: In a Bayesian analysis this is a credible interval (ggstatsplot reports a highest density interval). Given the prior and the data, the estimated difference lies within this range with 95% posterior probability.
  • JZS = 0.71: This is the scale (width) of the Jeffreys-Zellner-Siow Cauchy prior placed on the effect size. The value 0.71 (about √2/2) is a common default and describes the prior, not the size of the observed difference.
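The reported log value can be converted back to the Bayes Factor scale with a one-line calculation. The sketch below uses only the value 2.12 quoted above; the variable names are illustrative.

```r
# Value read from the plot subtitle
log_bf01 <- 2.12

bf01 <- exp(log_bf01)  # odds of the data under H0 relative to H1, about 8.33
bf10 <- 1 / bf01       # odds of the data under H1 relative to H0, about 0.12

round(c(BF01 = bf01, BF10 = bf10), 2)
```

Reading BF_01 and BF_10 side by side makes the direction of the evidence explicit: a BF_01 above 1 and a BF_10 below 1 both point toward the null.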

Two-sample mean test: ggbetweenstats()

In the code chunk below, ggbetweenstats() is used to build a visual for a two-sample test of Maths scores by gender. Setting type = "np" requests the non-parametric (Mann-Whitney U) test rather than a t-test.

ggbetweenstats(
  data = exam,
  x = GENDER,   # grouping variable
  y = MATHS,    # outcome variable
  type = "np"   # non-parametric test
)
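For two groups, the non-parametric option corresponds to a Mann-Whitney U (Wilcoxon rank-sum) test. A minimal sketch of the same kind of test in base R is shown below; the `demo` data frame is simulated and only mimics the GENDER and MATHS columns, so its numbers say nothing about the real exam data.

```r
# Simulated stand-in for the exam data (assumption for illustration only)
set.seed(1234)
demo <- data.frame(
  GENDER = rep(c("Female", "Male"), each = 50),
  MATHS  = c(rnorm(50, mean = 70, sd = 15), rnorm(50, mean = 68, sd = 15))
)

# Wilcoxon rank-sum test, also known as the Mann-Whitney U test
res <- wilcox.test(MATHS ~ GENDER, data = demo)

res$statistic  # the W statistic
res$p.value    # two-sided p-value
```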

Oneway ANOVA Test: ggbetweenstats() method

In the code chunk below, ggbetweenstats() is used to build a visual for a one-way ANOVA test on English scores by race.

ggbetweenstats(
  data = exam,
  x = RACE, 
  y = ENGLISH,
  type = "p",
  mean.ci = TRUE, 
  pairwise.comparisons = TRUE, 
  pairwise.display = "s",
  p.adjust.method = "fdr",
  messages = FALSE
)

The pairwise.display argument controls which pairwise comparisons are shown on the plot:

  • "ns" → only non-significant
  • "s" → only significant
  • "all" → everything
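The effect of p.adjust.method = "fdr" combined with showing only significant comparisons can be sketched with base R's p.adjust(). The raw p-values and the pair labels below are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Hypothetical raw p-values for four pairwise comparisons
raw_p <- c("A-B" = 0.01, "A-C" = 0.02, "B-C" = 0.03, "B-D" = 0.50)

# "fdr" is the Benjamini-Hochberg adjustment
adj_p <- p.adjust(raw_p, method = "fdr")
adj_p

# Comparisons that would be shown with pairwise.display = "s" at the 0.05 level
significant <- names(adj_p)[adj_p < 0.05]
significant
```

Here the three smallest p-values are all adjusted to 0.04 and remain significant, while the fourth stays at 0.50.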

ggbetweenstats - Summary of tests

Significant Test of Correlation: ggscatterstats()

In the code chunk below, ggscatterstats() is used to build a visual for Significant Test of Correlation between Maths scores and English scores.

ggscatterstats(
  data = exam,
  x = MATHS,
  y = ENGLISH,
  marginal = FALSE
)
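With its default parametric setting, the subtitle of such a plot reports a Pearson correlation test. A comparable test can be sketched in base R with cor.test(); the two vectors below are simulated stand-ins for the MATHS and ENGLISH columns and are an assumption.

```r
# Simulated stand-ins for exam$MATHS and exam$ENGLISH (assumption)
set.seed(1234)
maths   <- rnorm(100, mean = 65, sd = 15)
english <- 0.6 * maths + rnorm(100, mean = 25, sd = 10)

# Pearson product-moment correlation test
res <- cor.test(maths, english, method = "pearson")

res$estimate   # correlation coefficient r
res$parameter  # degrees of freedom (n - 2)
res$conf.int   # 95% confidence interval for r
res$p.value
```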

Significant Test of Association (Dependence): ggbarstats() method

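In the code chunk below, the Maths scores are binned into a 4-class variable by using cut(). With the default settings the intervals are closed on the right: (0,60], (60,75], (75,85] and (85,100].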

exam1 <- exam %>% 
  mutate(MATHS_bins = 
           cut(MATHS, 
               breaks = c(0,60,75,85,100))
)
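One detail of cut() worth checking is how boundary values are handled. The sketch below applies the same breaks to a few hand-picked scores; the `scores` vector is illustrative and not taken from the exam data.

```r
# Hand-picked scores to probe the bin boundaries (illustrative values)
scores <- c(0, 60, 61, 100)
breaks <- c(0, 60, 75, 85, 100)

# Default: right-closed intervals, so a score of exactly 0 falls outside (0,60]
binned <- cut(scores, breaks = breaks)
binned

# include.lowest = TRUE closes the first interval on the left as well
binned_incl <- cut(scores, breaks = breaks, include.lowest = TRUE)
binned_incl
```

If any student scored exactly 0, the default call would produce NA for that row, so include.lowest = TRUE may be worth considering.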

In the code chunk below, ggbarstats() is used to build a visual for the Significant Test of Association between the binned Maths scores and gender.

ggbarstats(exam1, 
           x = MATHS_bins, 
           y = GENDER)
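For two unpaired categorical variables, the association test behind this kind of plot is a Pearson chi-squared test of independence. A minimal base R sketch is shown below; the counts in `tab` are hypothetical and do not come from the exam data.

```r
# Hypothetical contingency table of score bins by gender (made-up counts)
tab <- matrix(
  c(30, 25, 20, 15,
    28, 27, 22, 13),
  nrow = 4,
  dimnames = list(
    MATHS_bins = c("(0,60]", "(60,75]", "(75,85]", "(85,100]"),
    GENDER     = c("Female", "Male")
  )
)

# Pearson chi-squared test of independence
res <- chisq.test(tab)

res$statistic  # chi-squared statistic
res$parameter  # degrees of freedom: (4 - 1) * (2 - 1) = 3
res$p.value
```

With the real data, the same table can be built with table(exam1$MATHS_bins, exam1$GENDER) before passing it to chisq.test().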