What is Bernoulli distribution? Bernoulli Distribution Explained

Bernoulli distribution

Contributed by: Shailendra Singh
LinkedIn Profile: https://www.linkedin.com/in/shailendra-singh-a817802/

An important skill for people working in Data Science is a good understanding of the fundamental concepts of descriptive statistics and probability theory. This includes the key concepts of probability distributions, statistical significance, hypothesis testing, and regression. In practice, a simple analysis using R or scikit-learn in Python, carried out without a solid understanding of the underlying probability distributions, often ends in errors and wrong results.

There are many probability distributions, but in this article, we will be talking about the simplest one, the Bernoulli distribution. It is considered to be a building block for other more complicated discrete distributions. Before explaining the Bernoulli distribution, we first need to understand some of the basic concepts used in probability distributions. Let’s get started.

Random Variables

In statistics and probability, a random variable (also called a random quantity or stochastic variable) is a variable whose value depends on the outcome of an experiment (i.e., a random process). Random variables are of two types, discrete and continuous. In this text, we will cover a distribution of a discrete random variable.

To understand random variables with a simple example, assume that we run a random experiment of rolling a die. The possible outcome of this experiment is any integer from 1 to 6. If X denotes the random variable that represents the outcome of such a random process, the sample space of this experiment consists of the outcomes {1, 2, 3, 4, 5, 6}.

So X=1 if the outcome of the die roll is 1, X=2 if the outcome of the die roll is 2, and so on up to X=6 if the outcome of the die roll is 6.
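To make the idea concrete, here is a minimal sketch that simulates the die roll experiment. The helper name roll_die and the seed value are assumptions made for illustration only; each returned value is one realization of the random variable X.

```python
import random

def roll_die(n, seed=0):
    """Simulate n rolls of a fair six-sided die (hypothetical helper)."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    # randint includes both endpoints, so every value lies in 1..6
    return [rng.randint(1, 6) for _ in range(n)]

rolls = roll_die(10)
print(rolls)  # ten realizations of the random variable X
```

Every call with the same seed returns the same sequence, which is convenient when comparing results.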

Taking a mathematical approach to simplify and generalize the problem, we can represent a single random event of rolling a die as shown in a single box in the figure below. Extending the random event to n trials, shown as separate boxes in the figure below, would represent the outcomes of n such random events.

Probability distribution

With the understanding of random variables, we can define a probability distribution to be a list of all the possible outcomes of a random variable, along with their corresponding probability values.

Considering our earlier example of a die roll, we can represent the probability distribution of a six-sided die as given below.

Outcome | 1 | 2 | 3 | 4 | 5 | 6
Probability | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

Table 1: Probability Distribution

We can represent the dice roll example graphically as shown below:

We can state the following about the probability distribution table shown above-

  1. In an experiment that rolls a six-sided die, the outcome variable always takes a discrete value from the set {1,2,3,4,5,6}.
  2. This is a univariate distribution since there is just one random variable, i.e., the outcome of the die roll.

Therefore the distribution shown in the table above can be termed as a discrete univariate probability distribution. 
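As a small sketch of the table above, the distribution of a fair six-sided die can be stored as a mapping from each outcome to its probability. The variable names are illustrative assumptions, and exact fractions are used so that the total comes out to exactly 1.

```python
from fractions import Fraction

# Each outcome of a fair six-sided die has probability 1/6
die_distribution = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

for outcome, prob in die_distribution.items():
    print(outcome, prob)

# The probabilities of all possible outcomes must add up to one
total = sum(die_distribution.values())
print("Total probability:", total)
```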


Probability Functions

If we represent the probability in machine learning graphically, it will look like this-

The figure above represents a single-trial experiment (x1), where n = 1. We could repeat the experiment n times to get n outcomes, X = {x1, x2, ..., xn}.

Discrete Probability Distribution: (Probability Mass Function)

When we use a probability function (which is described above) to describe a discrete distribution we call this function a probability mass function (pmf).

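By a discrete distribution, we mean that the random variable of the underlying distribution can take on only a countable set of distinct values. In the examples in this article the outcome space is finite, although some discrete distributions, such as the geometric distribution, have countably infinitely many possible values.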

To define a discrete distribution, we can simply enumerate the probability of the random variable taking on each of the possible values. This enumeration is known as the probability mass function, as it divides up a unit mass (the total probability) and returns the probability of different values a random variable can take.

Generally, we can represent a probability mass function as below.

f(x) = P(X = x)

For example, taking the die roll as the random variable, we can write the probability of the die landing on the number 2 as f(2) = P(X = 2) = 1/6.

The probability mass function must follow the rules of probability, therefore-

  1. 0<=f(x)<=1
  2. ∑f(xi) = f(x1) + f(x2) + … = 1
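The two rules above can be checked mechanically. The following is a minimal sketch, assuming the PMF is stored as a dictionary from outcome to probability; the function name is_valid_pmf and the tolerance are illustrative choices, not part of any library.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Check the two PMF rules for a dict mapping outcome -> probability."""
    # Rule 1: every probability lies between 0 and 1
    in_range = all(0 <= prob <= 1 for prob in pmf.values())
    # Rule 2: the probabilities sum to 1 (up to floating-point rounding)
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

fair_die = {x: 1 / 6 for x in range(1, 7)}
not_a_pmf = {0: 0.7, 1: 0.4}  # sums to 1.1, so it breaks rule 2

print(is_valid_pmf(fair_die))   # True
print(is_valid_pmf(not_a_pmf))  # False
```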

Examples of discrete events include rolling a die, tossing a coin, and counting occurrences of an event. Because there are no in-between values, the corresponding distributions are called discrete distributions. For example, we can only get heads or tails in a coin toss and a whole number from 1 to 6 in a die roll. Similarly, in a count of the number of books issued by a library per hour, you can count something like 10 or 11 books, but nothing in between.

In the die roll example, the outcome of the roll is the random variable. The probability of the die landing on the number 2 can be written as P(X=2) = 1/6, where the capital letter X denotes the random variable and 2 is the outcome value.

Bernoulli Distribution

Before defining Bernoulli distribution let us understand some basic terms:

Bernoulli event: An event for which the probability of occurrence is p and the probability of the event not occurring is 1-p, i.e., the event has only two possible outcomes (these can be viewed as Success or Failure, Yes or No, or Heads or Tails), which occur with probabilities p and 1-p respectively.

Bernoulli trial: A Bernoulli trial is an instantiation of a Bernoulli event. It is one of the simplest experiments that can be conducted in probability and statistics. It’s an experiment where there are two possible outcomes (Success and Failure).

Examples of Bernoulli trials:

  • Coin tosses: Each toss of a coin results in either heads or tails. We can consider getting heads a success and getting tails a failure, and then record how many tosses resulted in each.
  • Football: Each shot at the goal either results in a goal or misses. We can call a goal scored a “success” and a missed shot a failure.
  • Rolling dice: A roll of two dice either results in a double six or it does not. A double six could be considered a success and everything else a failure.

Bernoulli process: A sequence of independent Bernoulli trials, each with the same success probability p, is called a Bernoulli process. Among other conclusions that could be reached, for n trials, the probability that all n are successes is pⁿ.
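The pⁿ result can be checked by listing every possible sequence of outcomes. This is a minimal sketch under the assumption that the trials are independent; the values p = 0.6 and n = 3 and the helper name sequence_probability are chosen only for illustration.

```python
from itertools import product

p = 0.6  # assumed probability of success in each trial
n = 3    # assumed number of trials

def sequence_probability(sequence, p):
    """Probability of one specific sequence of independent 0/1 outcomes."""
    prob = 1.0
    for outcome in sequence:
        prob *= p if outcome == 1 else (1 - p)
    return prob

# All 2**n possible sequences of n Bernoulli outcomes
all_sequences = list(product([0, 1], repeat=n))

prob_all_success = sequence_probability((1,) * n, p)
total = sum(sequence_probability(s, p) for s in all_sequences)

print(prob_all_success)  # agrees with p ** n up to rounding
print(total)             # probabilities of all sequences add up to 1
```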

What is Bernoulli Distribution?

The Bernoulli distribution is one of the easiest distributions to understand because of its simplicity. It is often used as a starting point to derive more complex distributions.

A Bernoulli distribution is a discrete distribution with only two possible values for the random variable. It describes a single trial, called a Bernoulli trial, with only two possible outcomes. The two outcomes are labeled x=0 and x=1, where x=1 (success) occurs with probability p and x=0 (failure) occurs with probability 1-p. Since p is a probability value, 0<=p<=1.

The probability mass function (PMF) of a Bernoulli distribution is defined as follows.

If an experiment has only two possible outcomes, “success” and “failure,” and if p is the probability of success, then-

P(X = 1) = p and P(X = 0) = 1 - p

Another common way to write this is-

f(x) = p^x (1-p)^(1-x), for x in {0, 1}

Note: Success here refers to an outcome that we want to keep track of. For example, in the dice rolling example, a double six in both dice would be a success, anything else rolled would be failure.
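The Bernoulli PMF, which assigns probability p to the outcome 1 and 1-p to the outcome 0, can be sketched as a short function. The name bernoulli_pmf is an illustrative assumption; it uses the compact form p^x (1-p)^(1-x) and returns 0 for any value other than 0 or 1.

```python
def bernoulli_pmf(x, p):
    """PMF of a Bernoulli random variable with success probability p."""
    if x not in (0, 1):
        return 0.0  # only the outcomes 0 and 1 carry probability
    # Compact form: gives p when x = 1 and 1 - p when x = 0
    return p ** x * (1 - p) ** (1 - x)

p = 0.6  # assumed probability of success
for x in (0, 1):
    print(x, bernoulli_pmf(x, p))
```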


A simple example is a single toss of a coin, which may be fair or biased. In the case of flipping an unbiased or fair coin, the value of p would be 0.5, giving a 50% probability of each outcome. However, the probabilities of success and failure need not be equal. For a biased coin, for instance, the probability of heads (success) might be 0.6 while the probability of tails (failure) is 0.4. The Python code and the plot for this example are given below.

In the plot of this Bernoulli distribution, the probability of success (1) on the right is 0.6, and the probability of failure (0) on the left is 0.4.

Python code for plotting the Bernoulli distribution of a loaded coin-

import numpy as np
import matplotlib.pyplot as plt

# P(X = 0) = 0.4 (tails, failure) and P(X = 1) = 0.6 (heads, success)
probs = np.array([0.4, 0.6])
face = [0, 1]

# One bar per outcome, with its probability as the bar height
plt.bar(face, probs)
plt.title('Biased coin Bernoulli Distribution', fontsize=12)
plt.ylabel('Probability', fontsize=12)
plt.xlabel('Biased coin Outcome', fontsize=12)
plt.xticks(face)

# Probabilities lie between 0 and 1, so fix the y-axis to that range
axes = plt.gca()
axes.set_ylim([0, 1])
plt.show()

Properties of a Bernoulli distribution:

  • There are only two possible outcomes in each trial, 1 or 0, i.e., success or failure.
  • The probability values of mutually exclusive events that encompass all the possible outcomes need to sum up to one.
  • If the probability of success is p, then the probability of failure is 1-p. The probability values must remain the same across successive trials, and the trials must be independent, i.e., the probabilities are not affected by the outcomes of other trials.
  • The expected value (mean) of a Bernoulli random variable X is

E[X] = 1*(p) + 0*(1-p) = p. For example, if p = 0.6, then E[X] = 0.6.

  • The variance of a Bernoulli random variable X is

V[X] = E[X²] - (E[X])² = 1²*p + 0²*(1-p) - p² = p - p² = p(1-p). For example, if p = 0.6, then V[X] = 0.24.
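The mean and variance formulas can be verified by computing them directly from the PMF. This is a minimal sketch; the function name bernoulli_moments is an assumption for illustration, and p = 0.6 matches the biased coin used earlier.

```python
def bernoulli_moments(p):
    """Mean and variance of a Bernoulli variable, computed from its PMF."""
    pmf = {0: 1 - p, 1: p}
    # E[X] = sum of x * P(X = x)
    mean = sum(x * prob for x, prob in pmf.items())
    # E[X^2] = sum of x^2 * P(X = x)
    second_moment = sum(x ** 2 * prob for x, prob in pmf.items())
    # V[X] = E[X^2] - (E[X])^2
    variance = second_moment - mean ** 2
    return mean, variance

mean, variance = bernoulli_moments(0.6)
print(mean, variance)  # approximately 0.6 and 0.24, i.e. p and p(1-p)
```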

Bernoulli distribution is the building block for other more complicated discrete distributions. The distributions of several variate types can be defined based on sequences of independent Bernoulli trials. Such distributions are listed in the table below.

Discrete Distribution | Definition
Binomial distribution | Models the total number of successes in n independent, repeated Bernoulli trials
Geometric distribution | Models the number of failures before the first success in a sequence of independent, repeated Bernoulli trials
Negative binomial distribution | Models the number of failures before the xth success in a sequence of independent, repeated Bernoulli trials
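To illustrate how these distributions are built from Bernoulli trials, the sketch below draws one sample of each by repeating a single-trial helper. All function names here are hypothetical, and the counting conventions follow the definitions in the table (failures before the first or the r-th success).

```python
import random

def bernoulli_trial(p, rng):
    """One Bernoulli trial: 1 (success) with probability p, else 0."""
    return 1 if rng.random() < p else 0

def binomial_sample(n, p, rng):
    """Number of successes in n independent Bernoulli trials."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))

def geometric_sample(p, rng):
    """Number of failures before the first success."""
    failures = 0
    while bernoulli_trial(p, rng) == 0:
        failures += 1
    return failures

def negative_binomial_sample(r, p, rng):
    """Number of failures before the r-th success."""
    failures = 0
    successes = 0
    while successes < r:
        if bernoulli_trial(p, rng) == 1:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(42)  # seeded so the run is repeatable
p = 0.6                  # assumed probability of success
print(binomial_sample(10, p, rng))
print(geometric_sample(p, rng))
print(negative_binomial_sample(3, p, rng))
```

Averaging many such samples should approach the known means of these distributions, for example n*p for the binomial case.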

Applications of Bernoulli Outcomes

Many real-life situations involve noting whether a specific event occurs or not. Such events are recorded as a success or a failure. Examples of binary outcome scenarios include calculating the probability of-

  • Success of a medical treatment
  • An interviewed person being female
  • A student's result (pass/fail) in an exam
  • Transmission of a disease (transmitted/not transmitted)

The Bernoulli distribution finds application in the above cases as well as in a number of other similar situations.

Bernoulli distribution using Python

We can generate a Bernoulli distributed discrete random variable using the bernoulli.rvs() method from the scipy.stats module in Python. The function takes the probability of success (p) as a shape parameter. The size parameter decides the number of times the trial is repeated. For reproducibility, we can include a random_state argument assigned to a number.

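Python code for generating samples from the Bernoulli distribution of a loaded coin-

from scipy.stats import bernoulli
import seaborn as sns

# Draw 10,000 Bernoulli trials with success probability p = 0.6;
# random_state fixes the seed so that the sample is reproducible
data = bernoulli.rvs(size=10000, p=0.6, random_state=42)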

By visualizing the distribution, we can observe that we have only two possible outcomes:

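Python code for plotting the Bernoulli distribution of a biased coin-

# sns.distplot is deprecated in recent seaborn releases, so histplot is used here.
# The KDE curve is left out because it is not meaningful for two discrete outcomes.
# discrete=True centers one bar on each of the outcomes 0 and 1
ax = sns.histplot(data, discrete=True, color="b", alpha=1)
ax.set(xlabel='Bernoulli Distribution', ylabel='Frequency')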

We can see from the plot above that out of a total of 10,000 trials with success probability 0.6, we get about 6,000 successes.
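The same observation can be checked numerically without a plot. The sketch below regenerates a sample with bernoulli.rvs() and counts the outcomes; the seed value 42 is an arbitrary assumption, and the exact counts will differ from one seed to another.

```python
from scipy.stats import bernoulli

# 10,000 Bernoulli trials with success probability 0.6 (seed chosen arbitrarily)
data = bernoulli.rvs(p=0.6, size=10000, random_state=42)

successes = int(data.sum())        # each success is recorded as 1
failures = len(data) - successes   # each failure is recorded as 0

print("Successes:", successes)
print("Failures:", failures)
print("Observed success rate:", data.mean())  # should be close to 0.6
```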
