Randomness is a simple concept, and one that pervades many aspects of your life. It lets you use your phone and other electronics safely through the wonders of cryptography, and it lets your transport and event tickets be registered through (semi) random bar codes. The ability to create a list of random numbers is highly sought after.
Before we continue, let’s first lay down a concrete definition of randomness. When I was in secondary school, my mathematics teacher described randomness to me as “every member of a set having an equal chance of selection”. I found this definition very useful. For example, if you’re rolling a die, the members of the set are the numbers that can be rolled (usually 1-6). If the die is random, there should be an equal chance of each of the numbers being rolled. At a rudimentary level, certainly, this is a fairly good definition.
There are a couple of problems, however. Say, for example, you need to select 100 students from a list of 1,000. To fulfil our aforementioned criterion of randomly selecting these students, each student needs an equal chance of being selected. But if we rolled a 10-sided die, used the rolled number as our start point and then selected every 10th person after that (e.g. the 5th person, then the 15th, then the 25th, etc.), does that count as random? Whilst it fulfils our definition in that every student has the same chance of being selected (1 in 10) before the sample is taken, after the first student is chosen, each student does not have an equal chance of selection. Instead, once the first student has been chosen, the rest of the sample has also been determined, and so the remaining students do not have an equal chance of being selected. As a result, it’s clear that our described method does not produce a random sample, and so we need to change our definition.
So to better our definition, we’ll add another clause – “Random (selection) is every member of a set having an equal chance of being selected, at each level of selection”. This means now that after every selection from a set, the remaining members must still have an equal chance of being selected. Going back to our student example, when we select our first student, the chance of each student being selected must be 100 in 1,000. After the first student has been selected, the chance of any student being chosen must be 99 in 999. Then 98 in 998, and so on and so forth. Using this definition, we can better describe what randomness truly is.
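To see why the ten-sided-die method falls short, here is a minimal Python sketch of it. The student numbers 1 to 1,000 and the helper names `systematic_sample` and `ever_together` are my own illustrative assumptions, not part of any real system. It lists every sample the method can ever produce and checks which pairs of students can end up in a sample together.

```python
# Hypothetical student numbers 1..1000 (an assumption for illustration).
students = list(range(1, 1001))

def systematic_sample(start):
    """Take the student at position `start`, then every 10th student after."""
    return students[start - 1::10]

# A ten-sided die has only ten outcomes, so only ten samples are possible.
possible_samples = [systematic_sample(start) for start in range(1, 11)]

def ever_together(a, b):
    """True if students a and b appear together in at least one possible sample."""
    return any(a in s and b in s for s in possible_samples)

print(len(possible_samples))     # 10
print(possible_samples[4][:3])   # [5, 15, 25]
print(ever_together(5, 15))      # True
print(ever_together(5, 6))       # False
```

Every student appears in exactly one of the ten samples, so each has a 1 in 10 chance before the die is rolled. But students 5 and 6 can never be chosen together, which is exactly the dependence between selections that the revised definition rules out.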
But how do you select your sample? How do you decide which 100 of your students you’re going to select? We know that each student needs to have the same chance of being selected, but how do we ensure that this happens? The most common way is to use a random number generator. Each student is assigned a number, and then a random number between 1 and 1,000 is generated, with the student corresponding to that number being selected for the sample. This is repeated until 100 students have been chosen.
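As a rough sketch of that procedure in Python: the function name `draw_sample` and its parameters are hypothetical, and I am assuming that if a number comes up twice we simply draw again, which the description above doesn't spell out.

```python
import random

def draw_sample(population_size=1000, sample_size=100):
    """Draw student numbers one at a time until we have enough distinct ones."""
    chosen = set()
    while len(chosen) < sample_size:
        # randint includes both endpoints, so this is a number from 1 to 1000.
        number = random.randint(1, population_size)
        chosen.add(number)  # a repeated number is ignored, so we draw again
    return sorted(chosen)

sample = draw_sample()
print(len(sample))   # 100
print(sample[:5])    # the five lowest student numbers drawn (varies per run)
```

Python's standard library also offers `random.sample(range(1, 1001), k=100)`, which draws 100 distinct numbers in one call. Either way, the result is only as random as the generator behind it.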
But this method raises questions of its own. For example, how do we know that the random numbers are random? To demonstrate this, let’s move away from our student example and go back to rolling dice. Let’s say we want to have a game of Monopoly, but we can’t find any dice. We rummage through a number of board games until eventually we find a worn and battered die. We use this and continue our merry game. But, after the first five throws, 6 is the only number that has been thrown. This immediately makes us ask the question: is the die we’re using fair? Is it producing a random number between 1 and 6?
Ultimately, the answer is that we don’t know. For our die to be biased, at least one number must have a higher chance of selection than the others. But how do we know this is the case? Are five 6s in a row sufficient to say that the die is biased towards rolling a 6?
To examine this, let’s have a look at some probabilities.
At each level (assuming that the die is random), each number has a 1 in 6 chance of being selected. To find the probability of two independent events both occurring, we multiply the probability of one by the other. So, the probability of rolling one 6 is 1/6th, but the probability of rolling two 6s in a row is 1/6th x 1/6th = 1/36th. Therefore, the probability of rolling five 6s in a row is 1/6th x 1/6th x 1/6th x 1/6th x 1/6th = 1/7,776, or roughly 0.0001286! Without further thought, this may seem like convincing evidence that the die is biased.
But when we look at the probability of rolling any string of numbers, our previously convincing evidence appears to wither beneath the light. For example, let’s imagine that instead of five 6s, our rolling streak was: 3, 5, 6, 2, 3. This streak seems fairly innocuous, and I would hazard a guess that if such a roll occurred, very few people would jump to say that the die is not random. But when we examine the probability of this streak occurring we get… 1/6th x 1/6th x 1/6th x 1/6th x 1/6th = 1/7,776, or roughly 0.0001286, which is exactly the same as rolling five 6s in a row! And the truth is that any specific string of five rolls will have this same probability of occurring, because each one is just one of 7,776 equally likely sequences.
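If you’d like to check this arithmetic yourself, here is a small Python sketch using exact fractions. The function name `sequence_probability` is my own, and it assumes a fair six-sided die with independent rolls.

```python
from fractions import Fraction

def sequence_probability(rolls, sides=6):
    """Probability of rolling exactly this sequence, in this order, on a fair die."""
    if any(not 1 <= r <= sides for r in rolls):
        raise ValueError("each roll must be between 1 and the number of sides")
    # Each roll has probability 1/sides regardless of its value,
    # so only the length of the sequence matters.
    return Fraction(1, sides) ** len(rolls)

five_sixes = sequence_probability([6, 6, 6, 6, 6])
mixed = sequence_probability([3, 5, 6, 2, 3])

print(five_sixes)                   # 1/7776
print(mixed)                        # 1/7776
print(five_sixes == mixed)          # True
print(f"{float(five_sixes):.7f}")   # 0.0001286
```

Notice that the function never looks at which numbers were rolled, only at how many. That is the whole point: under a fair die, the streak of 6s is no less likely than any other specific sequence.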
The point this highlights is that randomness is very difficult (if not impossible) to measure. Whilst we may observe rolls that, to us, seem like they must be the consequence of intervention, the same streak can arise purely from the natural variability of a random process. But what makes us believe that something is random or not? That’s something we’ll discuss in the next post.