Random Number Generation

Random number generators are at the core of many applied computer science applications. For example, stochastic simulation models, statistical sampling, and learning techniques all rely on random numbers generated by computers. However, computers are famously not random.

Computers allow us to reproduce complex calculations and workflows repeatedly and achieve the same result. So how can a machine which is designed to follow careful instruction generate randomness?

In fact, the random numbers generated by computers are not random at all. They leverage some starting point (a seed) and generate a series of numbers which follow that appear to have no relation to each other. However, if the same seed is used again the computer can generate the same series of numbers!

This is powerful because it means we can both simulate randomness, but also simulate that same randomness again in the future. But how do we generate random numbers and what form do those numbers take?

Sometimes we need to generate random whole numbers. Other times we need to generate percentages as a value between 0 and 1. And sometimes we need to generate random numbers which follow some distribution (say an exponential).

It turns out that as long as we can generate random numbers between 0 and 1, we can transform them to any reasonable distribution via the inverse transform theorem (this is something I might discuss in a future post). However, for now lets focus on random numbers between 0 and 1.

The most common random number generator is called the linear congruential generator or LCG. This class of generators has the following form (where U is one random value between 0 and 1):

$X_{i}=aX_{i-1}+c \mod m$

$U_i=\frac{X_i}{m}$

The values of a, c, and, m are all carefully chosen. There are two critical tests a random number generator must pass to be useful; independence and uniformity. We test that using runs and chi-squared tests, respectively.

An independence test verifies that the next value cannot be predicted by the current value. We say that a random number generator produces independent numbers if the following statement is true. Here A is the number of runs within a random number set.

$\text{Independence:} \frac{A-\frac{2n-1}{3}}{\sqrt{\frac{16n-29}{90}}} < z_{0.975}$

A run is simply continuous increases or decreases in values. For example: 1, 2, 3, 4, 3 has two runs because it counts up to four (the first run) and back down to three (the second run).

Next we test for uniformity. This sorts all values into buckets and counts them. If we had 1000 random numbers between 0 and 1 and we split them into quartiles we would expect (shown as E here) 250 values in each bucket. The O here is the number of observed values in those buckets.

$\text{Uniformity:}\sum\frac{(O-E)^2}{E}<\chi_{0.95,k-1}^2$

We can write a little bit of R to help us verify the performance of one very famous random number generator called the desert island generator. This generator chooses the values of 16807, 0, and 2147483647 for a, c, and m, respectively.


random <- function(n, seed = 123) {
  
  values <- numeric(n)
  m <- 2147483647
  
  for (i in seq(n)) {
    seed <- (16807 * seed) %% m
    values[i] <- seed / m
  }
  
  values
}

is_independent <- function(x, alpha = 0.95) {
  
  a <- head(x, -1)
  b <- tail(x, -1)
  
  change <- a > b
  run_change <- change != dplyr::lag(change, 1, 0)
  runs <- max(cumsum(x) + 1)
  
  n <- length(x)
  e <- (2 * n - 1) / 3
  v <- (16 * n - 29) / 90
  
  stat <- round(abs((runs - e) / (v ^ 0.5)), 2)
  crit <- round(qnorm(1 - (alpha / 2)), 2)
  
  msg <- dplyr::if_else(
    stat < crit,
    "Not Independent (Reject",
    "Independent (Accept"
  )
  
  glue::glue("{msg} H0): Stat = {stat}, Critical = {crit}")
}

is_uniform <- function(x, n = 100, alpha = 0.95) {
  
  e <- ceiling(length(x) / n)
  
  stat <- round(sum(((table(ceiling(x * n)) - e) ^ 2) / e), 2)
  crit <- round(qchisq(1 - alpha, n - 1), 2)
  
  msg <- dplyr::if_else(
    stat < crit,
    "Not Independent (Reject",
    "Independent (Accept"
  )
  
  glue::glue("{msg} H0): Stat = {stat}, Critical = {crit}")
}

x <- random(1e6)

is_independent(x)
# > Independent (Accept H0): Stat = 394.49, Critical = 0.06

is_uniform(x)
# > Independent (Accept H0): Stat = 128.36, Critical = 77.05

This implementation tests and verifies that our generator is random for 1M random values!