Tutorial 4: Random number generators
Learning goals
In this tutorial you will learn to:
- use uniform random number generator
- use binomial random number generator
- calculate probabilities from binomial distribution
- plot probability distribution functions
Random number generators
Simulating randomness with a computer is not a simple task. Randomness is contrary to the nature of a computer, which is designed to perform operations exactly. However, there are algorithms that produce a string of numbers that are for all intents and purposes random: there is no obvious connection between one number and the next, and the values don’t form any pattern. Such algorithms are called random number generators, although to be more precise they produce pseudo-random numbers. The reason is that they actually produce a perfectly predictable string of numbers, which eventually repeats itself, but with a humongous period. One can even produce the same random number, or the same string of random numbers, by specifying the seed for the random number generator. This is very useful if one wants to reproduce the results of a code that uses random numbers.
uniform discrete random numbers
Of course, random variable are not all the same - they have different distributions. R has a number of functions for producing random numbers from different distributions. To produce random numbers from a set of values with a uniform probability distribution, use the function sample()
. The following command produces a random integer between 1 and 20. Repeating the same command produces a new random number, which (most likely) is not the same as the first. The first input argument (1:20) is the vector of values from which to draw the random number, and the second is the size of the sample.
If the random seed is set to some value (e.g. 100 below), then the random number generator will produce the same “random” number if this command is repeated:
To generate 10 randomly chosen integers between 1 and 20, see the following two commands, which differ in setting the value of the option replace. The first command doesn’t specify the value for replace, and by default it is set to FALSE, so the command draws numbers without replacing them (meaning that all the numbers in the sample are unique). In the second command replace is set to TRUE, so the numbers that were selected can be chosen again. In both cases, repeatedly running the command results in a different set of randomly chosen numbers, which you should investigate by copying the commands into R and running them yourself.
table()
is a very useful function for tallying how many values of each type are in a vector. It essentially provides a kind of histogram, with each distinct value in the data set a distinct bin, so it may be used together with barplot()
to plot a detailed histogram:
binomial random number generator
If you need to generate random numbers from the binomial distribution, R has you covered. The command is rbinom(s, n, p)
and it requires three input values: s is the number of observations (sample size), n is the number of binary trials in one observation, and p is the probability of success in one binary trial. The following two commands generate a single random number, the number of successes out of 20 trials with probability of success 0.2 and 0.6:
To generate a sample of ten random numbers, change the first input parameter to 10. As you’d expect, the samples of 10 observations are (most likely) noticeably different: when the probability p is 0.2, the number of successes tend to be less than 6, while for probability 0.6, the numbers are usually greater than 10.
Notice that the range of possible values of this random variable is between 0 and 20, but unlike the uniform random numbers produced with the sample()
function, the probabilities of obtaining different numbers are different and depend on the parameter p. Calculation and plotting of the binomial distribution function can be accomplishes with the command dbinom(x,n,p)
, where \(x\) is the value of the random variable (between 0 and n), \(n\) is the number of trials, and p is the probability of success. For instance, the following script calculate the probability of 4 successes out of 20 trials with probability \(p=0.2\):
The script below calculates the probabilities of all of the possible values of the random variable by substituting the vector of these values (e.g. 0 to 20) instead of the number 1, generating the probability distribution vector. This vector is plotted vs. the values of the random variable using the barplot()
function, producing an aesthetically pleasing plot of the binomial distribution. The script plots two binomial probability distributions, both with \(n=20\), the first with \(p=0.2\) and the second with \(p=0.6\). To add values to the x axis one needs the option names.arg
assigned the vector of values, the axis labels in barplot
use the same options we saw before (xlab
and ylab
), and the main option produces a title above each plot.
Exercises
The following exercises ask you to perform computational tasks using R. Type your code in the box below and try to do the task on your own before clicking on the Answer or Hint boxes to expand them.
- Use the
sample()
function to generate a sample of 10 random numbers out of 20 integers (from 1 to 20) without replacement, assign it to variablesample_vec
and print it out.
- Use the
sample()
function to generate a sample of 20 random numbers of out 20 integers (from 1 to 20) with replacement, assign it to variablesample_vec
and print it out.
- Use the
rbinom()
function to generate a sample of 20 random numbers of successes/wins out of 12 trials with probability of success in one trial of 0.4, assign it to variablebinom_vec
and print it out.
- Use the
rbinom()
function to generate a sample of 12 random numbers of successes/wins out of 20 trials, with probability of success in one trial of 0.4, assign it to variablebinom_vec
and print it out.