Generating random numbers in R and Racket

2019 Apr 05 @travishinkelman.com

R makes it easy to generate random numbers from a wide variety of distributions with a consistent interface. For example, runif, rnorm, and rpois generate random numbers from uniform, normal, and Poisson distributions, respectively.

> x = runif(n = 1000, min = 4.6, max = 9.3)
> min(x)
[1] 4.60374
> max(x)
[1] 9.288063
> 
> y = rnorm(n = 1000, mean = 81, sd = 9)
> mean(y)
[1] 81.2548
> sd(y)
[1] 9.179427
> 
> z = rpois(n = 1000, lambda = 7.11)
> mean(z)
[1] 7.187
> var(z)
[1] 6.954986

The Racket math library provides much of the same functionality as the base R functions, but I first want to work through a couple of examples without using the math library because I think it is useful for better understanding Racket.

Let's create a Racket function that is similar to runif in R.

#lang racket/base

(define (runif n [min 0] [max 1])
  (for/list ([i (in-range n)])
    (+ (* (- max min) (random)) min)))

The random function draws a random number from a uniform distribution between 0 and 1. A list of n random samples is drawn by looping through a sequence with for/list. To draw a random number within a specified interval, we multiply the random number by the specified range (- max min) and shift by the min.

> (define unif (runif 1000 -7 33))
> (apply min unif)  ; check that min is about -7
-6.987705647350004
> (apply max unif)  ; check that max is about 33
32.97250528872971

The need to use apply to calculate the min of a list was a surprise to me. The min function takes any number of real numbers as arguments but not a list. apply uses the contents of the list as the arguments for min.

> (min 3 1 2)
1
> (min '(3 1 2))
min: contract violation
  expected: real?
  given: '(3 1 2)

Another wrinkle is introduced if you want to generate a vector (with for/vector) of random numbers rather than a list.

(define (runif-vec n [min 0] [max 1])
  (for/vector ([i (in-range n)])
    (+ (* (- max min) (random)) min)))

We can't use apply with a vector.

> (define unif-vec (runif-vec 1000 0.5 1.5))
> (apply min unif-vec)
apply: contract violation
  expected: list?
  given: '#(1.3465423772765357 1.0623864994795045 1.3437265664094888 1.1048441535810904 0.7861090566755002 0.6470981723620602 0.5546597487687198 0.552946824350613 0.5251621516034304 1.3568612624022975 1.1217164560959263 1.3334770245391927 0.890117942389224...
  argument position: 2nd
  other arguments...:

StackOverflow provides a solution based on for/fold.

> (for/fold ([m +inf.0]) ([x (in-vector unif-vec)]) (min m x))
0.5014616191163698
> (for/fold ([m -inf.0]) ([x (in-vector unif-vec)]) (max m x))
1.4994848021056595

In for/fold, m is an accumulator that is initialized as positive (+inf.0) or negative infinity (-inf.0) for min and max, respectively. When iterating through unif-vec, m is compared to x and the value of m is updated with the output of min or max.

Random values can also be drawn from a normal distribution using draws from the standard uniform distribution with random and a formula from Rosetta Code

> (define (rnorm n [mean 0] [sd 1])
   (for/list ([i (in-range n)])
    (+ mean (* (sqrt (* -2 (log (random)))) (cos (* 2 pi (random))) sd))))
> (define norm (rnorm 1000 2 0.5))
> (require math)      ; need math library for mean and standard deviation
> (mean norm)         ; check that mean is about 2
1.9797112583083123
> (stddev norm)       ; check that sd is about 0.5
0.5022756727111091

Now, let's simplify our life and use the functions provided by the math library.

(require math) 

(define (rnorm-2 n [mean 0] [sd 1])
  (if (and (real? mean) (real? sd))
      (flnormal-sample (real->double-flonum mean) (real->double-flonum sd) n)
      (error "mean and sd arguments must be real numbers")))

The function, flnormal-sample, requires that the mean and standard deviation are passed as floating-point numbers (and returns flvector). In rnorm-2, we check if the mean and sd arguments are real numbers and, if so, convert them to floating-point numbers (real->double-flonum) before calling flnormal-sample, which is otherwise similar to R's runif function.

> (define norm-2 (rnorm-2 1000 2 0.5))
> (mean norm-2)
2.024086073688603
> (stddev norm-2)
0.5039448247564015

The math library contains similar sample functions for numerous types of distributions, including uniform, Poisson, beta, binomial, gamma, and more.

Finally, let's take a look at the statistics object provided by the math library.

> (define s (update-statistics* empty-statistics norm-2))
> (statistics-mean s)
1.9947322187693184
> (statistics-stddev s)
0.4974799838849746

empty-statistics is an empty statistics object. update-statistics* populates the statistics object by iterating through norm-2 to compute summary statistics, which are extracted with accessor functions, e.g., statistics-min, statistics-mean, statistics-skewness.

The question is why would you use the more verbose statistics object. One reason is that statistics-min and statistics-max are actually less verbose than using for/fold with min and max. Also, if you are computing several summary statistics on a large sequence, then update-statistics* is faster than the other assorted functions.

> (define norm-3 (rnorm-2 10000000 1000 100))
> (time
   (mean norm-3)
   (stddev norm-3)
   (variance norm-3)
   (for/fold ([m +inf.0]) ([x (in-flvector norm-3)]) (min m x))
   (for/fold ([m -inf.0]) ([x (in-flvector norm-3)]) (max m x)))
cpu time: 72618 real time: 108130 gc time: 34828
> (time
   (define s2 (update-statistics* empty-statistics norm-3))
   (statistics-mean s2)
   (statistics-stddev s2)
   (statistics-variance s2)
   (statistics-min s2)
   (statistics-max s2))
cpu time: 12478 real time: 22668 gc time: 5512