Basics of random variables for a non-statistician


When you are learning machine learning or data science, one term that comes up again and again is random variable. So, what is a random variable?

Let’s say you are tossing a coin (an unbiased coin). Let’s call this an experiment. An experiment of tossing a coin can have two outcomes: either a head (H) or a tail (T).

In other words, the set of possible outcomes for this experiment contains two values, ‘H’ and ‘T’.

[latex]
\text{Possible Outcomes} = \left\{H, T\right\}
[/latex]

So to answer the question, what is a random variable?

A random variable is a rule that assigns a numerical value to every possible outcome of a random experiment.

The outcomes ‘H’ and ‘T’ are not numbers yet, so we have to assign a numerical value to each of them. Let’s assign 1 to Head and 0 to Tail.

So the set of values our random variable can take is

[latex]
RandomVariable = \left\{1, 0\right\}
[/latex]
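The coin example above can be sketched in Python. This is only an illustration: the names `OUTCOME_TO_VALUE` and `toss_coin` are made up for this sketch, and the fixed seed is there just to make the run repeatable.

```python
import random

# Hypothetical names for this sketch; they do not come from any library.
OUTCOME_TO_VALUE = {"H": 1, "T": 0}  # Head -> 1, Tail -> 0


def toss_coin(rng):
    """Run the experiment once and return the outcome, 'H' or 'T'."""
    return rng.choice(["H", "T"])


rng = random.Random(42)  # fixed seed so the run is repeatable
outcomes = [toss_coin(rng) for _ in range(10)]

# The random variable turns every outcome into a number.
values = [OUTCOME_TO_VALUE[outcome] for outcome in outcomes]

print(outcomes)
print(values)
```

Whatever the sequence of tosses turns out to be, every entry of `values` is either 0 or 1.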

Take a look at this example. What are the possible outcomes of a football match between England and France?

[latex]
\text{Possible Outcomes} = \left\{\text{England Win}, \text{France Win}, \text{Draw}, \text{No Result}\right\}
[/latex]

Again we will assign numerical values to these outcomes.

 

Outcome        Assigned Value
England Win    0
France Win     1
Draw           2
No Result      3

With these values assigned, the random variable can take the following values

[latex]
RandomVariable = \left\{0, 1, 2, 3\right\}
[/latex]

Note that the values do not have to start from 0, be in increasing order, or be positive. You can assign -150 to France Win and 99 to England Win.

By convention, random variables are denoted by capital letters like X, Y, Z etc.
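To make this concrete, here is a small Python sketch with two different ways of numbering the same four outcomes. The first one is the table above. In the second one, only -150 for France Win and 99 for England Win come from the text; the values for Draw and No Result are picked arbitrarily for illustration.

```python
# Encoding from the table above.
encoding_a = {"England Win": 0, "France Win": 1, "Draw": 2, "No Result": 3}

# Alternative encoding. The values for "Draw" and "No Result" are
# arbitrary picks for this sketch, not something the post prescribes.
encoding_b = {"England Win": 99, "France Win": -150, "Draw": 2.5, "No Result": 10}

for outcome in encoding_a:
    print(f"{outcome:12} a={encoding_a[outcome]:<4} b={encoding_b[outcome]}")

# Both encodings describe the same four outcomes; only the labels differ.
same_outcomes = set(encoding_a) == set(encoding_b)
print(same_outcomes)
```

The numbers are just labels for the outcomes, so either encoding describes the same experiment.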

[latex]
X = \left\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10\right\}
[/latex]

Types of Random Variable

Let us again take a look at outcomes of a match between England and France.

[latex]
X = \left\{0, 1, 2, 3\right\}
[/latex]

As you can see there are only 4 outcomes that are possible with this match. In statistical terms, this match is an experiment. So there is no fifth outcome for this experiment. Also, outcomes like 0.1, 0.2 or 2.5, 2.6 or any in-between values are not possible.

As I said earlier, you can assign any values, such as 2.5, 3.5, 4 and 10, to the outcomes. Even then, values like 5, 6 or 7 are not possible.

In short, only 4 outcomes are possible. Whatever values you assign to the outcomes, the in-between values are not possible.

So, when a random variable can take only a finite (or countably infinite) number of distinct values, it is a discrete random variable.

Let’s take a look at another example. What will be the time required to complete the 100m dash? Usain Bolt recorded 9.58 seconds. So for all practical purposes let’s say the time taken to complete the dash will vary from 9.58 seconds to 10.08 seconds. These are the times taken by the professional athletes.

[latex]
X = \left\{\text{any value between 9.58 seconds and 10.08 seconds}\right\}
[/latex]

 


So a random variable that can take any of the infinitely many values in a given range is a continuous random variable.
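A rough Python sketch of the 100m dash example follows. It assumes, purely for illustration, that every time between 9.58 and 10.08 seconds is equally likely; real race times are not distributed like that. The names `FASTEST`, `SLOWEST` and `sample_dash_time` are made up for this sketch.

```python
import random

FASTEST = 9.58   # seconds, lower end of the range used in the text
SLOWEST = 10.08  # seconds, upper end of the range used in the text


def sample_dash_time(rng):
    """Draw one finishing time from the range.

    Assumption for this sketch only: all times in the range are
    equally likely (a uniform distribution).
    """
    return rng.uniform(FASTEST, SLOWEST)


rng = random.Random(0)  # fixed seed so the run is repeatable
times = [sample_dash_time(rng) for _ in range(5)]

for t in times:
    print(f"{t:.4f} s")
```

Unlike the football match, the drawn values are not limited to a short list: any number in the range can come up.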

Sample Space

The sample space is the set of all possible outcomes of an experiment. In our first example of a discrete random variable, we assigned 1 to Head and 0 to Tail. In this example the sample space ‘S’ is

[latex]
S = \left\{H, T\right\}
[/latex]

When we assigned numerical values to Head and Tail, we defined a random variable X on this sample space:
[latex]
X\left(H\right)=1
[/latex]

[latex]
X\left(T\right)=0
[/latex]
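The same idea can be written as a small Python function. This is a minimal sketch: `S` and `X` simply mirror the symbols used above.

```python
# Sample space of the coin toss experiment.
S = {"H", "T"}


def X(outcome):
    """Random variable: maps an outcome in S to a number."""
    if outcome == "H":
        return 1
    if outcome == "T":
        return 0
    raise ValueError(f"{outcome!r} is not in the sample space")


# The set of values X can take, one for each outcome in S.
values = {X(s) for s in S}
print(sorted(values))  # [0, 1]
```

Here the sample space holds the raw outcomes, while `X` produces the numbers we actually work with.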

Important Point Regarding Random Variable (RV)

It’s not really a variable. In the algebraic sense, a variable is a single unknown value that you solve for using mathematical operations like addition, subtraction, multiplication and so on. A random variable is different: it is a function that assigns a number to every outcome in the sample space, and the value it takes depends on the outcome of the random experiment.

Once you understand what an RV is, we can start learning further topics like probability distributions, the cumulative distribution function and the probability density function. These topics are really important when you want to learn data science or machine learning.

In the next blog post, we will look at the basics of probability.

Note:

I am no longer writing about Python or programming on this blog, as I have shifted my focus from programming to WordPress and web development. If you are interested in WordPress, you can continue reading other articles on this blog.
Thanks and Cheers!
