How to simulate a normal distribution in Python

What is a normal distribution?

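A normal distribution is probably the most widely used model in statistics. When we group similar values in a dataset and count how many times each one appears (a histogram), many datasets produce a bell-shaped curve: most values cluster around the average and become rarer the further we move from it. The normal distribution is the mathematical model of that bell shape.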

For example, take the ages of a group of people: the highest frequency might be 150 people who are 65 years old.
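For reference, the bell shape that such a histogram approaches is described by the normal probability density function below, where μ is the mean and σ the standard deviation (standard textbook notation, not symbols used elsewhere in this post):

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```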

A normal distribution shows at a glance how the values in a dataset are distributed. This is useful when we want to see which values sit close to the average and which lie far from it (outliers we might want to remove), and whether the data is predictable (low variance) or uncertain (high variance).
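As a minimal sketch of that spread check, the snippet below computes the mean and standard deviation of a sample and flags values lying more than 3 standard deviations from the mean. The function name describe_spread, the 3-sd threshold (a common rule of thumb, not a universal rule) and the simulated sample are all illustrative assumptions:

```python
import numpy as np

def describe_spread(data, z_threshold=3.0):
    """Hypothetical helper: return the mean, the standard deviation and the
    values lying more than z_threshold standard deviations from the mean."""
    data = np.asarray(data, dtype=float)
    mean = data.mean()
    std = data.std()
    z_scores = (data - mean) / std          # distance from the mean in sd units
    outliers = data[np.abs(z_scores) > z_threshold]
    return mean, std, outliers

# Hypothetical sample: 1,000 values around 10 plus one extreme value.
rng = np.random.default_rng(42)
sample = np.append(rng.normal(loc=10, scale=1, size=1000), 25.0)

mean, std, outliers = describe_spread(sample)
print(f"mean={mean:.2f}, std={std:.2f}, outliers={outliers}")
```

A larger std means less predictable data; the flagged values are candidates for a closer look, not automatic deletions.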

Where is it used?

Many real-world quantities can be approximated by a normal distribution: stock returns, sports scores, psychometric test results… This makes it very popular, and because so much data roughly follows this mathematical function, we can apply standard, ready-made calculations to it.
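One example of such a ready-made calculation is the 68-95-99.7 rule. Assuming the data really is normal, scipy.stats.norm gives the share of values expected within 1, 2 and 3 standard deviations of the mean:

```python
from scipy.stats import norm

# P(mean - k*sd < X < mean + k*sd) for a normal X does not depend on the
# mean or sd, so the standard normal (mean 0, sd 1) is enough.
within = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"within {k} sd of the mean: {prob:.4f}")
# within 1 sd of the mean: 0.6827
# within 2 sd of the mean: 0.9545
# within 3 sd of the mean: 0.9973
```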

For example, in stock trading or portfolio diversification, risk is measured with the standard deviation of returns: a stock whose returns form a wide normal distribution is considered very risky (because it can fluctuate strongly), while a stock with a narrow normal distribution is considered safer.
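To illustrate with made-up numbers, this sketch simulates a year of daily returns (250 trading days, an assumption) for two hypothetical stocks with the same average return but different spreads, and uses the sample standard deviation as the risk measure:

```python
import numpy as np

# Hypothetical daily returns, simulated from normal distributions that share
# a mean but differ in spread.
rng = np.random.default_rng(0)
calm_returns = rng.normal(loc=0.0005, scale=0.005, size=250)     # narrow curve
volatile_returns = rng.normal(loc=0.0005, scale=0.03, size=250)  # wide curve

# Risk = sample standard deviation of the returns (ddof=1).
for name, returns in [("calm", calm_returns), ("volatile", volatile_returns)]:
    print(f"{name}: std = {returns.std(ddof=1):.4f}")
```

The stock with the wider curve has the larger standard deviation, which is why it would be labelled riskier.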

What do we need to replicate it?

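Fitting a normal distribution to real data is simple, because there are only three numbers to adjust: the mean, the standard deviation, and alpha (alfa in the code below). Strictly speaking, a normal distribution has only the first two; alpha is the shape parameter of the skew-normal distribution, a generalization of the normal that reduces to it when alpha is 0. When alpha is not 0, the mean and standard deviation act as a location and a scale, so the simulated data's actual mean and spread differ slightly from them.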

  • The mean shifts the distribution left (lower values) or right (higher values)
  • The standard deviation controls the spread (the higher it is, the wider the curve)
  • Alpha skews the distribution: negative values give it a long tail to the left, positive values a long tail to the right
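These three parameters map onto SciPy's skewnorm, whose shape argument (called a in the SciPy documentation) is what this post calls alfa. As a quick sanity check, assuming a reasonably recent SciPy, this sketch confirms that alfa = 0 gives back the normal density, that the density matches the standard skew-normal formula, and shows that with alfa = 5 the location and scale are not exactly the resulting mean and standard deviation:

```python
import numpy as np
from scipy.stats import norm, skewnorm

x = np.linspace(-4, 4, 81)

# With alfa = 0 the skew-normal reduces to the ordinary normal distribution.
same_as_normal = np.allclose(skewnorm.pdf(x, 0), norm.pdf(x))

# For any alfa, the standard skew-normal density is 2 * phi(x) * Phi(alfa * x),
# where phi and Phi are the standard normal PDF and CDF.
alfa = 5
matches_formula = np.allclose(skewnorm.pdf(x, alfa),
                              2 * norm.pdf(x) * norm.cdf(alfa * x))

# loc ("mean" in create_pdf) shifts the curve and scale ("sd") stretches it,
# but with alfa != 0 they are not the exact mean and sd of the result.
m, v, s = skewnorm.stats(alfa, loc=1, scale=0.1, moments="mvs")
print(same_as_normal, matches_formula)
print(f"mean={float(m):.4f}, sd={float(np.sqrt(v)):.4f}, skewness={float(s):.3f}")
```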

Coding the distribution

Coding the distribution means generating a dataset of thousands or millions of samples that, when plotted, look like a normal distribution. Here is how to do it probabilistically with our own parameters, wrapped in a small function so you can easily reuse it:

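import pandas as pd
from scipy.stats import skewnorm

def create_pdf(sd, mean, alfa):
    # Draw 1,000,000 samples from a standard skew-normal distribution
    # (location 0, scale 1). alfa sets the skew: 0 gives a plain normal,
    # positive values a long right tail, negative values a long left tail.
    # Invert the sign of alfa to flip the direction of the skew.
    x = skewnorm.rvs(alfa, size=1000000)
    # Stretch by sd and shift by mean to go from the standard
    # distribution to one with our own parameters.
    x = x * sd + mean
    return x

# x holds random samples (create_pdf returns samples, not a density function)
x = create_pdf(sd=0.1, mean=1, alfa=5)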
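Every scipy.stats distribution also accepts loc and scale arguments, so the manual scale-and-shift step can be folded into the rvs call. This sketch (the seeds and the smaller sample size are additions for reproducibility and speed) checks that both approaches produce the same samples:

```python
import numpy as np
from scipy.stats import skewnorm

sd, mean, alfa = 0.1, 1, 5
n = 100_000

# Manual approach (as in create_pdf): standard samples, then scale and shift.
manual = skewnorm.rvs(alfa, size=n, random_state=np.random.default_rng(1)) * sd + mean

# Same distribution via the loc/scale keywords, with the same seed.
direct = skewnorm.rvs(alfa, loc=mean, scale=sd, size=n,
                      random_state=np.random.default_rng(1))

print(np.allclose(manual, direct))
# The sample mean should sit close to the theoretical mean.
print(direct.mean(), skewnorm.mean(alfa, loc=mean, scale=sd))
```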

Graphing the normal distribution

Once we have created a dataset of 1,000,000 points randomly drawn from the distribution, we can use the pandas plotting API to show a histogram of it (if you run this as a script rather than in a notebook, call matplotlib.pyplot.show() afterwards to open the plot window):

pd.DataFrame(x).hist(bins=200)
Normal distribution with minimum skewness
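pandas' hist plots raw counts, so its bars cannot be compared with a density curve directly. As a minimal sketch, reusing the post's parameters with an added seed, we can normalize the histogram with NumPy's density=True and measure how far the bars are from the theoretical skewnorm PDF:

```python
import numpy as np
from scipy.stats import skewnorm

sd, mean, alfa = 0.1, 1, 5
samples = skewnorm.rvs(alfa, loc=mean, scale=sd, size=1_000_000,
                       random_state=np.random.default_rng(2))

# density=True rescales the bar heights so the total area of the histogram
# is 1, which makes it comparable with a probability density function.
heights, edges = np.histogram(samples, bins=200, density=True)
centers = (edges[:-1] + edges[1:]) / 2
expected = skewnorm.pdf(centers, alfa, loc=mean, scale=sd)

max_gap = np.max(np.abs(heights - expected))
print(f"peak of the histogram: {heights.max():.2f}")
print(f"largest gap between histogram and PDF: {max_gap:.3f}")
```

With a million samples the gap should be small compared with the peak, which is why the histogram is a faithful picture of the distribution.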

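To make the distribution skewed, we change the alfa parameter, for example to 5: a positive value gives it a long tail to the right (positive skew), while a negative value such as -5 gives it a long tail to the left (negative skew).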

Normal distribution with high skewness
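To check the direction of the skew numerically, this sketch (seed and sample size are assumptions) compares the sample skewness for alfa = 5 and alfa = -5, and verifies that the two densities are mirror images around the location:

```python
import numpy as np
from scipy.stats import skew, skewnorm

rng = np.random.default_rng(3)
right = skewnorm.rvs(5, loc=1, scale=0.1, size=100_000, random_state=rng)
left = skewnorm.rvs(-5, loc=1, scale=0.1, size=100_000, random_state=rng)

# Positive alfa: long right tail, so positive sample skewness.
# Negative alfa: long left tail, so negative sample skewness.
print(f"alfa=+5 skewness: {skew(right):.2f}")
print(f"alfa=-5 skewness: {skew(left):.2f}")

# Flipping the sign of alfa mirrors the density around loc (here 1):
# pdf(x; alfa) == pdf(2*loc - x; -alfa).
x = np.linspace(0.5, 1.5, 11)
mirrored = np.allclose(skewnorm.pdf(x, 5, loc=1, scale=0.1),
                       skewnorm.pdf(2 - x, -5, loc=1, scale=0.1))
print(mirrored)
```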
