Take-home_Ex_feedback

Author

Xu Xnyi

Modified

May 13, 2025

Selecting the work from

Ng Jin Yao

Three good design principles:

  1. Clear objective and data processing: Data wrangling ensures accuracy and relevance in your visualizations. In addition, stating at the outset which data will be analyzed makes the later visualizations easier to follow. The data preparation covers:
  • Removing duplicates using distinct()

  • Fixing data types (e.g., converting Age from chr to dbl)

  • Creating new features like Age_group and Region using mutate() and case_when()

  • Joining with geographic metadata using left_join() after transforming the PA column

  2. Visual diversity: There are three different visualization types, each tailored to the data’s characteristics.
  • A population pyramid to illustrate age-sex structure (a demographic standard)

  • A bar chart comparing regions by population and sex composition

  • A half-eye + boxplot to examine age distribution patterns by region

  3. More concise labels in the data visualization: In the Population Pyramid Plot Visualisation, “Female/Male” is placed right above the graph, which makes the visualization more concise. There is no need to repeat it in a separate “gender” legend.

Three areas for further improvement:

  1. Geospatial methods are not necessary for visualization or data processing in this take-home exercise.

  2. Reproducibility and code commenting: While your narrative is strong, some code logic is missing or only described in words. Without the code, others cannot reproduce the visuals.

  3. The Half-eye + Box Plot Visualisation of Age vs Region should be changed to a more readable and straightforward version.
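
To illustrate the first point, the Region column could in principle come from a plain lookup table instead of a parsed GeoJSON file. The following is only a minimal sketch under stated assumptions: region_lookup and toy_pop are hypothetical names, the lookup lists only planning areas whose regions are visible in the printed outputs on this page, and the toy population rows are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical lookup table: planning area -> region.
# Only pairs that appear in the outputs on this page are listed;
# a complete table would need every planning area.
region_lookup <- tribble(
  ~PA,           ~Region,
  "ANG MO KIO",  "NORTH-EAST REGION",
  "BISHAN",      "CENTRAL REGION",
  "BUKIT MERAH", "CENTRAL REGION",
  "GEYLANG",     "CENTRAL REGION"
)

# Toy stand-in for sgResData24 (illustrative values, not real data)
toy_pop <- tibble(
  PA  = c("Ang Mo Kio", "Bishan", "Geylang"),
  Age = c("0", "35", "90_and_Over"),
  Pop = c(10, 20, 30)
)

toy_joined <- toy_pop %>%
  mutate(PA = toupper(PA)) %>%         # match the upper-case keys
  left_join(region_lookup, by = "PA")  # no sf or xml2 required
```

A lookup table like this has to be maintained by hand, so it trades the geospatial dependency for a small amount of manual upkeep.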

Make-over version of the data visualization: Half-eye + Box Plot Visualisation of Age vs Region

Ridgeline plot with inside plot and annotations: a ridgeline plot is a type of chart that displays the distribution of a numeric variable for several groups.

Plot

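# Load (and install if missing) the required packages; tidyverse already includes ggplot2 and dplyr
pacman::p_load(tidyverse, ggrepel, patchwork,
               ggthemes, xml2, sf, scales)
# Read the 2024 resident population data
sgResData24 <- read_csv("respopagesex2024.csv")
# Assign the result so the de-duplicated data is kept, then print it
sgResData24 <- distinct(sgResData24)
sgResData24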
# A tibble: 60,424 × 6
   PA         SZ                     Age   Sex       Pop  Time
   <chr>      <chr>                  <chr> <chr>   <dbl> <dbl>
 1 Ang Mo Kio Ang Mo Kio Town Centre 0     Males      10  2024
 2 Ang Mo Kio Ang Mo Kio Town Centre 0     Females    10  2024
 3 Ang Mo Kio Ang Mo Kio Town Centre 1     Males      10  2024
 4 Ang Mo Kio Ang Mo Kio Town Centre 1     Females    10  2024
 5 Ang Mo Kio Ang Mo Kio Town Centre 2     Males      10  2024
 6 Ang Mo Kio Ang Mo Kio Town Centre 2     Females    10  2024
 7 Ang Mo Kio Ang Mo Kio Town Centre 3     Males      10  2024
 8 Ang Mo Kio Ang Mo Kio Town Centre 3     Females    10  2024
 9 Ang Mo Kio Ang Mo Kio Town Centre 4     Males      30  2024
10 Ang Mo Kio Ang Mo Kio Town Centre 4     Females    10  2024
# ℹ 60,414 more rows
sgResData24 <- sgResData24 %>%
  mutate(
    # Convert age to numeric, handle "90_and_Over"
    Age_num = case_when(
      Age == "90_and_Over" ~ 90,
      TRUE ~ suppressWarnings(as.numeric(Age))  # Avoid warnings from "90_and_Over"
    ),
    
    # Group into age bands
    Age_group = case_when(
      Age_num >= 0 & Age_num <= 9   ~ "0-9",
      Age_num >= 10 & Age_num <= 19 ~ "10-19",
      Age_num >= 20 & Age_num <= 29 ~ "20-29",
      Age_num >= 30 & Age_num <= 39 ~ "30-39",
      Age_num >= 40 & Age_num <= 49 ~ "40-49",
      Age_num >= 50 & Age_num <= 59 ~ "50-59",
      Age_num >= 60 & Age_num <= 69 ~ "60-69",
      Age_num >= 70 & Age_num <= 79 ~ "70-79",
      Age_num >= 80 & Age_num <= 89 ~ "80-89",
      Age_num >= 90                ~ "90+",
      TRUE                         ~ NA_character_
    )
  )
# Load the GeoJSON file
geo_data <- st_read("MasterPlan2019PlanningAreaBoundaryNoSea.geojson")
Reading layer `MasterPlan2019PlanningAreaBoundaryNoSea' from data source 
  `C:\xinyi-ux\ISSS608-VAA\Take-home_Exercise\Take-home_Ex01\MasterPlan2019PlanningAreaBoundaryNoSea.geojson' 
  using driver `GeoJSON'
Simple feature collection with 55 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
# Function to parse HTML and extract PLN_AREA_N and REGION_N
extract_info <- function(html_str) {
  doc <- read_html(html_str)
  rows <- xml_find_all(doc, ".//tr")

  # Loop through rows and extract key-value pairs
  data <- lapply(rows, function(row) {
    th <- xml_text(xml_find_first(row, ".//th"))
    td <- xml_text(xml_find_first(row, ".//td"))
    if (!is.na(th) && !is.na(td)) {
      return(setNames(list(td), th))
    } else {
      return(NULL)
    }
  })

  # Combine and extract specific fields
  info <- do.call(c, data)
  list(
    Town = info[["PLN_AREA_N"]],
    Region = info[["REGION_N"]]
  )
}


# Apply the extraction function to each row
info_list <- lapply(geo_data$Description, extract_info)

# Combine results into a data frame
info_df <- bind_rows(info_list) %>% distinct() %>% arrange(Region, Town)

# View result
print(info_df)
# A tibble: 55 × 2
   Town          Region        
   <chr>         <chr>         
 1 BISHAN        CENTRAL REGION
 2 BUKIT MERAH   CENTRAL REGION
 3 BUKIT TIMAH   CENTRAL REGION
 4 DOWNTOWN CORE CENTRAL REGION
 5 GEYLANG       CENTRAL REGION
 6 KALLANG       CENTRAL REGION
 7 MARINA EAST   CENTRAL REGION
 8 MARINA SOUTH  CENTRAL REGION
 9 MARINE PARADE CENTRAL REGION
10 MUSEUM        CENTRAL REGION
# ℹ 45 more rows
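# Preview only: upper-case PA to match the Town names from the GeoJSON (result is not assigned here)
sgResData24 %>% mutate(PA = toupper(PA))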
# A tibble: 60,424 × 8
   PA         SZ                     Age   Sex       Pop  Time Age_num Age_group
   <chr>      <chr>                  <chr> <chr>   <dbl> <dbl>   <dbl> <chr>    
 1 ANG MO KIO Ang Mo Kio Town Centre 0     Males      10  2024       0 0-9      
 2 ANG MO KIO Ang Mo Kio Town Centre 0     Females    10  2024       0 0-9      
 3 ANG MO KIO Ang Mo Kio Town Centre 1     Males      10  2024       1 0-9      
 4 ANG MO KIO Ang Mo Kio Town Centre 1     Females    10  2024       1 0-9      
 5 ANG MO KIO Ang Mo Kio Town Centre 2     Males      10  2024       2 0-9      
 6 ANG MO KIO Ang Mo Kio Town Centre 2     Females    10  2024       2 0-9      
 7 ANG MO KIO Ang Mo Kio Town Centre 3     Males      10  2024       3 0-9      
 8 ANG MO KIO Ang Mo Kio Town Centre 3     Females    10  2024       3 0-9      
 9 ANG MO KIO Ang Mo Kio Town Centre 4     Males      30  2024       4 0-9      
10 ANG MO KIO Ang Mo Kio Town Centre 4     Females    10  2024       4 0-9      
# ℹ 60,414 more rows
library(dplyr)

# Rename Town to PA in the region info dataframe
region_info <- info_df %>% rename(PA = Town)

# left join sgResData24 with region_info to get Region column
sgResData24 <- sgResData24 %>% mutate(PA = toupper(PA)) %>%
  left_join(region_info, by = "PA")

print(sgResData24)
# A tibble: 60,424 × 9
   PA         SZ                Age   Sex     Pop  Time Age_num Age_group Region
   <chr>      <chr>             <chr> <chr> <dbl> <dbl>   <dbl> <chr>     <chr> 
 1 ANG MO KIO Ang Mo Kio Town … 0     Males    10  2024       0 0-9       NORTH…
 2 ANG MO KIO Ang Mo Kio Town … 0     Fema…    10  2024       0 0-9       NORTH…
 3 ANG MO KIO Ang Mo Kio Town … 1     Males    10  2024       1 0-9       NORTH…
 4 ANG MO KIO Ang Mo Kio Town … 1     Fema…    10  2024       1 0-9       NORTH…
 5 ANG MO KIO Ang Mo Kio Town … 2     Males    10  2024       2 0-9       NORTH…
 6 ANG MO KIO Ang Mo Kio Town … 2     Fema…    10  2024       2 0-9       NORTH…
 7 ANG MO KIO Ang Mo Kio Town … 3     Males    10  2024       3 0-9       NORTH…
 8 ANG MO KIO Ang Mo Kio Town … 3     Fema…    10  2024       3 0-9       NORTH…
 9 ANG MO KIO Ang Mo Kio Town … 4     Males    30  2024       4 0-9       NORTH…
10 ANG MO KIO Ang Mo Kio Town … 4     Fema…    10  2024       4 0-9       NORTH…
# ℹ 60,414 more rows
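
After a left_join() like the one above, it can be worth confirming that every planning area actually found a region, since unmatched keys silently become NA. A minimal sketch of such a check on toy data follows; toy_pop and toy_regions are hypothetical stand-ins, and "UNKNOWN AREA" is a made-up key used only to trigger a mismatch.

```r
library(dplyr)
library(tibble)

# Toy stand-ins (illustrative values only)
toy_pop <- tibble(
  PA  = c("ANG MO KIO", "BISHAN", "UNKNOWN AREA"),
  Pop = c(10, 20, 30)
)
toy_regions <- tibble(
  PA     = c("ANG MO KIO", "BISHAN"),
  Region = c("NORTH-EAST REGION", "CENTRAL REGION")
)

# Rows in the population table that have no matching region
unmatched <- anti_join(toy_pop, toy_regions, by = "PA")

# The same check after the join: count rows where Region is NA
n_missing <- toy_pop %>%
  left_join(toy_regions, by = "PA") %>%
  summarise(n = sum(is.na(Region))) %>%
  pull(n)
```

If unmatched has zero rows, the join keys line up and no further cleaning of PA is needed.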

Ridgeline plot: used to show the distribution of a variable across groups.

stat_halfeye() draws the density slabs

stat_summary() shows the medians

annotate() adds static text annotations

scale_*() functions customize scales and colors; a manual color scale such as MetBrewer::met.brewer() is optional and is not applied in the code on this page

coord_flip() flips the axes to change the plot orientation

  • Legend Construction (p_legend)

    • A small explanatory plot is built from a subset of the data (rent_title_words filtered for the word “beautiful” in the example this description is based on; no p_legend is built in the code on this page)

    • geom_curve() draws arrows pointing to specific elements

  • Inserting the Legend into the Main Plot

    • The inset_element() function combines the main plot (p) and the legend (p_legend) by embedding the legend within the main plot’s space
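
The legend construction and inset steps described above can be sketched as follows. This is only an illustration on made-up data, not the code behind the final plot: toy, p_main, p_legend and p_combined are hypothetical names, and the annotation and arrow positions are arbitrary.

```r
library(ggplot2)
library(patchwork)

# Toy data (illustrative values only)
toy <- data.frame(
  Region = rep(c("A", "B"), each = 50),
  Age    = c(seq(0, 98, by = 2), seq(1, 99, by = 2))
)

# Main plot: a simple stand-in for the age-by-region chart
p_main <- ggplot(toy, aes(x = Age, y = Region)) +
  geom_boxplot() +
  theme_minimal()

# Legend plot: one annotated example group with a curved arrow
p_legend <- ggplot(toy[toy$Region == "A", ], aes(x = Age, y = Region)) +
  geom_boxplot() +
  annotate("text", x = 80, y = 1.45, label = "Median", size = 3) +
  geom_curve(
    data = data.frame(x = 72, y = 1.45, xend = 50, yend = 1.05),
    aes(x = x, y = y, xend = xend, yend = yend),
    arrow = arrow(length = unit(0.1, "inches")),
    curvature = 0.3, inherit.aes = FALSE
  ) +
  theme_void()

# Embed the legend in the lower-right corner of the main plot;
# left/bottom/right/top are fractions of the main plot area
p_combined <- p_main +
  inset_element(p_legend, left = 0.6, bottom = 0.02, right = 0.98, top = 0.4)
```

Printing p_combined draws the main plot with the small annotated legend inside it.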
glimpse(sgResData24)
Rows: 60,424
Columns: 9
$ PA        <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG…
$ SZ        <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo …
$ Age       <chr> "0", "0", "1", "1", "2", "2", "3", "3", "4", "4", "5", "5", …
$ Sex       <chr> "Males", "Females", "Males", "Females", "Males", "Females", …
$ Pop       <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 30, 10, 20, 10, 20, 30, 30, …
$ Time      <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, …
$ Age_num   <dbl> 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, …
$ Age_group <chr> "0-9", "0-9", "0-9", "0-9", "0-9", "0-9", "0-9", "0-9", "0-9…
$ Region    <chr> "NORTH-EAST REGION", "NORTH-EAST REGION", "NORTH-EAST REGION…
library(ggdist)
library(ggtext)
library(extrafont) 
# One-time setup: register system fonts with extrafont; prompt = FALSE skips the interactive y/n question
font_import(prompt = FALSE)
# Compute weighted mean for each region
mean_age <- sgResData24 %>%
  group_by(Region) %>%
  summarise(weighted_mean = weighted.mean(Age_num, Pop, na.rm = TRUE))

# Theme and background setup
bg_color <- "grey97"
font_family <- "Fira Sans"

plot_subtitle <- glue::glue("Age distribution by Singapore regions in 2024.\nRed diamonds show the population-weighted mean age.")

# Main plot
p <- ggplot(sgResData24, aes(x = Region, y = Age_num, weight = Pop)) +
  stat_halfeye(fill_type = "segments", alpha = 0.3) +
  stat_interval() +
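  # Note: stat_summary() ignores the weight aesthetic, so this point is the median over rows, not over residents
  stat_summary(geom = "point", fun = median, color = "black") +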
  geom_point(data = mean_age, aes(x = Region, y = weighted_mean), 
             color = "red", size = 3, shape = 18, inherit.aes = FALSE) +
  scale_x_discrete(labels = stringr::str_to_title) +
  scale_y_continuous(limits = c(0, 100), breaks = seq(0, 100, 10)) +
  coord_flip() +
  labs(
    title = toupper("AGE DISTRIBUTION BY REGION"),
    subtitle = plot_subtitle,
    caption = "Data: Singapore Department of Statistics, 2024",
    x = NULL,
    y = "Age"
  ) +
  theme_minimal(base_family = font_family) +
  theme(
    plot.background = element_rect(color = NA, fill = bg_color),
    panel.grid = element_blank(),
    panel.grid.major.x = element_line(linewidth = 0.1, color = "grey75"),
    plot.title = element_text(family = "serif"),
    plot.subtitle = ggtext::element_textbox_simple(margin = margin(t = 4, b = 16), size = 10),
    plot.caption = ggtext::element_textbox_simple(margin = margin(t = 12), size = 8),
    axis.text.y = element_text(hjust = 0, margin = margin(r = -10), family = "serif"),
    plot.margin = margin(4, 4, 4, 4)
  )
p
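
Because stat_summary(fun = median) does not use the weight aesthetic, the black median point in the plot is computed over data rows rather than over residents. One possible remedy is to compute a population-weighted median per region, in the same way mean_age is computed. The sketch below assumes a hypothetical helper weighted_median() (not part of dplyr or ggdist) and uses made-up toy values.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: the smallest value whose cumulative weight
# reaches half of the total weight
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) >= sum(w) / 2)[1]]
}

# Toy stand-in for sgResData24 (illustrative values only)
toy <- tibble(
  Region  = c("A", "A", "A", "B", "B", "B"),
  Age_num = c(10, 40, 70, 10, 40, 70),
  Pop     = c(10, 10, 80, 80, 10, 10)
)

median_age <- toy %>%
  group_by(Region) %>%
  summarise(
    unweighted_median   = median(Age_num),                 # ignores Pop
    pop_weighted_median = weighted_median(Age_num, Pop)    # respects Pop
  )
```

In this toy data both regions have the same unweighted median (40), while the weighted medians differ (70 for A, 10 for B). A summary table like median_age could then be passed to geom_point() in the same way as mean_age.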

  • Central Region has the highest average age.

  • West and North-East Regions have younger populations on average.

  • The spread (width of intervals) is relatively similar, showing a broad mix of age groups in all regions.