## Stay or Switch? A Frequentist Approach to Monte Hall in R

The Monte Hall is a famous brain teaser in the form of a probability puzzle. For the uninitiated, the problem is as follows: Consider three doors where two of the doors have nothing behind them and one of the doors has a prize. You are instructed to select the door with the prize.

The game show host opens a door that is different than the door you chose that does not contain the prize. You now know that the prize is either behind the door you originally chose or the door that the game show host did not open. Should you stay at your current door, or switch to the unopened door?

Initially, most people would say that it shouldn’t matter. After all, you chose the first door randomly. As it turns out, you have a 1-in-3 chance of being correct if you stay and a 2-in-3 chance of being correct if you switch! This seems impossible. The frequentist in me knows that we can simulate this using a scripting language!

Our simulation will randomly select two integers between 1 and 3; the first integer represents the contestant’s first choice whereas the second integer represents the correct answer (the door with the prize). We also denote the simulation ID in the first column.

Next we randomly determine which door will be revealed to the contestant. If the correct answer and the contestant’s choice are different, then the door that is revealed is whatever door is left (i.e., if the answer is door 1 and the choice is door 2 then door 3 is revealed).

If the correct answer and the contestant’s choice are the same, then the revealed door is chosen at random between the remaining doors (i.e., if the contestant chose door 1 and the answer is door 1, then either door 2 or 3 is revealed).

Finally, we need to determine which door is the switch door. That is, whatever door has not already been chosen and not revealed. If door 1 was originally chosen and door 2 was revealed, then door 3 is the switch door (regardless of the correct answer)

Now we can use R and the tidyverse to construct a function which will simulate Monte Hall’s problem many times. Because vectorized operations in R are fast, we will stay within that constraint for our function. Additionally, this solution leverages the row ID to prevent grouping the data frame into n groups providing further performance benefits!

``````simulate <- function(n = 1e6) {
# Simulation of the Monte Hall problem

values <- seq(3)
reveal <- values
switch <- values

tibble::tibble(
row_id = seq(n),
answer = sample(values, n, replace = TRUE),
choice = sample(values, n, replace = TRUE)
) |>
tidyr::expand_grid(reveal) |>
dplyr::filter(
reveal != choice
) |>
dplyr::distinct(
row_id,
choice,
.keep_all = TRUE
) |>
tidyr::expand_grid(switch) |>
dplyr::filter(
switch != choice,
switch != reveal
) |>
dplyr::summarize(
It then finds out which door to reveal by looking for the door not already chosen by the answer or choice. If the answer and choice are the same, then `distinct` ensures there is only one revealed door for each combination.