Advent of Code 2023: Day Three

Advent of Code is an annual series of holiday-themed programming challenges that can be completed in any language. There is one problem for each day in December before Christmas!

Day three asks us to parse an engine schematic. Any number that is adjacent to a symbol (horizontally, vertically, or diagonally) is a part number and needs to be kept. We sum up all of the part numbers at the end to work out which part is missing. You can see the original problem here.

In the text below, we can see symbols (any character that is not a period or a digit). The first star, on the second row, means that the number 467 is counted, but the number 114 is not, since none of its digits is directly adjacent to a symbol.

467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..

I decided to use R and the tidyverse to solve this problem. The real schematic is much larger than the example shown above. First, I load the text and parse the lines into a useful tibble format.

#### Setup ####

library(tidyr)
library(dplyr)
library(readr)
library(tibble)

lines <- readr::read_lines("~/Documents/data.txt")

# Digits are stored as characters so they compare directly with schematic cells
digits <- as.character(0:9)
non_symbols <- c(".", digits)

#### Parse Schematic ####

parsed_lines <-
  lines |>
  stringr::str_split("") |>
  purrr::imap(function(value, row) {
    
    tibble::tibble(
      row,
      col = seq_along(value),
      value
    )
    
  }) |>
  dplyr::bind_rows()
  
# A tibble: 19,600 × 3
#   row   col value
#   <int> <int> <chr>
# 1     1     1  .    
# 2     1     2  .    
# 3     1     3  .    
# 4     1     4  .    
# 5     1     5  5    
# 6     1     6  7    
# 7     1     7  3    
# 8     1     8  .    
# 9     1     9  6    
# 10    1    10  1    
# ℹ 19,590 more rows

Next, I split this data frame into two pieces, symbols and numbers. I want the row-column positions of each whole number as a single instance rather than as individual characters. I use the lag function to compare each cell with the one before it, which tells me where a run of digits starts and ends, and I store these instances for later use.

number_df <-
  parsed_lines |>
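  dplyr::mutate(
    is_num = value %in% digits,
    # Start a new instance whenever we cross a digit/non-digit boundary or a
    # new row begins, so a number ending one row never merges with the next row
    num_instance = cumsum(
      is_num != dplyr::lag(is_num, 1, FALSE) | row != dplyr::lag(row, 1, 0L)
    )
  ) |>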
  dplyr::filter(
    value %in% digits
  ) |>
  dplyr::group_by(num_instance) |>
  dplyr::mutate(
    number = glue::glue_collapse(value)
  ) |>
  dplyr::ungroup() |>
  dplyr::select(
    row,
    col,
    number,
    num_instance
  )
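The run-labelling trick is easier to see on a single row. Below is a minimal sketch in base R, assuming the first row of the example schematic; `previous` is a hand-rolled stand-in for `dplyr::lag(is_num, 1, FALSE)`, and the variable names are only illustrative.

```r
# One schematic row from the puzzle example, split into characters.
chars <- strsplit("467..114..", "")[[1]]
is_num <- chars %in% as.character(0:9)

# Base R stand-in for dplyr::lag(is_num, 1, FALSE): shift right, pad with FALSE.
previous <- c(FALSE, head(is_num, -1))

# The counter goes up each time we cross a digit/non-digit boundary,
# so all digits of one number share the same label.
run_id <- cumsum(is_num != previous)

# Collapse the digits of each run into the number they spell.
numbers <- tapply(chars[is_num], run_id[is_num], paste, collapse = "")
print(numbers)
```

The labels for the non-digit runs are simply thrown away by the filter, which is why the surviving instance numbers have gaps in them.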

I also parse the symbols. I first identify any character which is not a digit or a period. Next, I expand each symbol into a series of nudges, or adjustments to its row-column pair, covering every direction in which a number could be adjacent.

nudges <-
  tidyr::expand_grid(
    col_nudge = seq(-1, 1),
    row_nudge = seq(-1, 1)
  ) |>
  dplyr::filter(
    !(col_nudge == 0 & row_nudge == 0)
  )

symbol_df <-
  parsed_lines |>
  dplyr::filter(
    !value %in% non_symbols
  ) |>
  dplyr::mutate(
    sym_instance = row_number()
  ) |>
  tidyr::expand_grid(nudges) |>
  dplyr::mutate(
    row = row + row_nudge,
    col = col + col_nudge
  )
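To make the nudges concrete, here is a small base R sketch, assuming the first star in the example (row 2, column 4). It uses `expand.grid` in place of `tidyr::expand_grid`, and the names `offsets` and `neighbours` are illustrative only.

```r
# All combinations of -1, 0, 1 for row and column, minus the (0, 0) centre.
offsets <- expand.grid(row_nudge = -1:1, col_nudge = -1:1)
offsets <- offsets[!(offsets$row_nudge == 0 & offsets$col_nudge == 0), ]

# The first '*' in the example sits at row 2, column 4.
neighbours <- data.frame(
  row = 2 + offsets$row_nudge,
  col = 4 + offsets$col_nudge
)
print(neighbours)
```

The eight cells include row 1, column 3, which holds the final digit of 467, but none of columns 6 to 8 in row 1, where 114 sits. Cells that fall outside the grid do no harm, because the later inner join simply finds no match for them.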

Finally, I join these together and ensure that each number is counted only once, even when several of its digits are adjacent to the same symbol (so a number that sits both directly above and diagonally above a symbol is not double counted, for example). This lets me take a simple sum and find the answer, which is 560670! On my machine this implementation takes around 150 ms, which is fairly quick!

symbol_df |>
  dplyr::inner_join(
    y = number_df,
    by = c("row", "col")
  ) |>
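  # Keep one row per number, so it is summed once even if several of its
  # digits touch a symbol or it happens to touch more than one symbol
  dplyr::distinct(
    num_instance,
    number
  ) |>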
  dplyr::pull(number) |>
  as.numeric() |>
  sum()
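A quick way to sanity-check a solution like this is to run it on the small example grid, for which the puzzle statement gives 4361 as the expected sum. The sketch below is an independent cross-check in base R and is not the tidyverse approach used above: it scans each row with a regular expression and looks for a symbol in the box surrounding each number. The names `example_lines` and `sum_part_numbers` are hypothetical.

```r
example_lines <- c(
  "467..114..",
  "...*......",
  "..35..633.",
  "......#...",
  "617*......",
  ".....+.58.",
  "..592.....",
  "......755.",
  "...$.*....",
  ".664.598.."
)

# Hypothetical helper: sum every number that has a symbol in its bounding box.
sum_part_numbers <- function(lines) {
  total <- 0
  for (r in seq_along(lines)) {
    hits <- gregexpr("[0-9]+", lines[r])[[1]]
    if (hits[1] == -1) next  # no numbers on this row
    lens <- attr(hits, "match.length")
    for (i in seq_along(hits)) {
      first <- hits[i]
      last <- first + lens[i] - 1
      # Rows and columns of the box around the number, clipped to the grid.
      rows <- max(1, r - 1):min(length(lines), r + 1)
      box <- substr(lines[rows], max(1, first - 1), min(nchar(lines[r]), last + 1))
      # Anything that is neither a digit nor a period is a symbol.
      if (any(grepl("[^0-9.]", box))) {
        total <- total + as.numeric(substr(lines[r], first, last))
      }
    }
  }
  total
}

example_total <- sum_part_numbers(example_lines)
print(example_total)
```

Because each number is visited exactly once, this version cannot double count a number that touches several symbols, which makes it a useful comparison against the join-based pipeline.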