Advances in Machine Learning (ML) and Deep Learning (DL) empower machines to learn from past data and make predictions even on unseen data. One such advance is generative models, which capture the underlying distribution of the data and generate new data comparable to the original training data. But how do they do it?
The answer is probability distributions, with whose help generative models can manage the uncertainty and variation in the data. Read this chapter to understand what a probability distribution is, its types, its uses in generative modeling, and its applications.
What is Probability Distribution?
Probability Distribution is a mathematical function that represents the probability of different possible values of a random variable within a given range. We can use either graphs or probability tables to depict a probability distribution.
For example, imagine flipping a coin: a probability distribution tells us the chances of getting heads or tails. The following probability table describes it −
| Outcomes | Probability |
| --- | --- |
| Heads | 0.5 |
| Tails | 0.5 |
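As a quick illustration, the probability table above can be written directly in Python and sampled from. The snippet below is a minimal sketch using only the standard library; the variable names (such as coin_distribution) are just illustrative choices.

```python
import random

# Probability table for a fair coin: each outcome maps to its probability.
coin_distribution = {"Heads": 0.5, "Tails": 0.5}

# Sanity check: the probabilities of all outcomes must sum to 1.
assert abs(sum(coin_distribution.values()) - 1.0) < 1e-9

# Draw 10 flips according to this distribution.
flips = random.choices(
    population=list(coin_distribution.keys()),
    weights=list(coin_distribution.values()),
    k=10,
)
print(flips)
```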
A probability distribution is the theoretical counterpart of a frequency distribution (FD). In statistics, an FD describes the number of occurrences of a variable in a dataset. A probability distribution, on the other hand, goes beyond counting occurrences and assigns probabilities to them.
As we know, probability, which says how likely something is to occur, is a number between 0 (impossible) and 1 (certain). That is why a value with a higher probability appears more frequently in a sample.
Types of Probability Distributions
There are two types of probability distributions −
- Discrete Probability Distributions
- Continuous Probability Distributions
Let's take a closer look at these two types of probability distributions.
Discrete Probability Distributions
Discrete probability distributions are mathematical functions that describe the probabilities of different outcomes of a discrete or categorical random variable.
A discrete probability distribution includes only values that can actually occur. In simple words, it does not include any value with zero probability. For example, 5.5 is not a possible outcome of a dice roll, hence it is not included in the probability distribution of dice rolls.
The total of the probabilities of all possible values in a discrete probability distribution is always one.
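For instance, a fair six-sided die assigns a probability of 1/6 to each face, and those probabilities add up to exactly one. The short sketch below checks this property in plain Python; the dictionary name is illustrative.

```python
# Probability mass function (PMF) of a fair six-sided die.
die_pmf = {face: 1 / 6 for face in range(1, 7)}

# The probabilities of all possible values sum to one.
print(sum(die_pmf.values()))  # 0.9999999999999999 (1.0 up to floating point)

# A value like 5.5 is not in the support, so it carries no probability.
print(die_pmf.get(5.5, 0))    # 0
```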
Let’s see some common discrete probability distributions −
| Discrete Probability Distribution | Explanation | Example |
| --- | --- | --- |
| Bernoulli Distribution | It describes the probability of success (1) or failure (0) in a single experiment. | The outcome of a single coin flip. |
| Binomial Distribution | It models the number of successes in a fixed number of trials n, each with success probability p. | The number of heads when you toss a coin 10 times. |
| Poisson Distribution | It predicts the number of events, k, occurring in a fixed interval of time or space. | The number of email messages received per day. |
| Geometric Distribution | It represents the number of trials needed to achieve the first success in a sequence of trials. | The number of times a coin is flipped until it lands on heads. |
| Hypergeometric Distribution | It calculates the probability of drawing a specific number of successes from a finite population without replacement. | The number of red balls drawn from a bag of mixed-colored balls. |
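The examples in the table can be computed numerically. The sketch below assumes scipy is installed and uses scipy.stats to evaluate a few of the distributions above; the specific parameter values are purely illustrative.

```python
from scipy import stats

# Binomial: probability of exactly 5 heads in 10 fair coin tosses.
print(stats.binom.pmf(k=5, n=10, p=0.5))   # ~0.246

# Poisson: probability of receiving exactly 3 emails in a day,
# assuming an average rate of 5 emails per day.
print(stats.poisson.pmf(k=3, mu=5))        # ~0.140

# Geometric: probability that the first heads appears on the 3rd flip.
print(stats.geom.pmf(k=3, p=0.5))          # 0.125

# Bernoulli: probability of success (1) in a single fair coin flip.
print(stats.bernoulli.pmf(k=1, p=0.5))     # 0.5
```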
Continuous Probability Distributions
As the name implies, continuous probability distributions are mathematical functions that describe the probabilities of different occurrences within a continuous range of values.
A continuous probability distribution includes an infinite number of possible values. For example, the interval [4, 5] contains infinitely many values between 4 and 5.
Let’s see some common continuous probability distributions −
| Continuous Probability Distribution | Explanation | Example |
| --- | --- | --- |
| Continuous Uniform Distribution | It assigns equal probability density to all values within a given interval. | The height of a person between 5 and 6 feet. |
| Normal (Gaussian) Distribution | It forms a bell-shaped curve and describes data clustered around the mean with symmetric tails. | IQ scores |
| Exponential Distribution | It models the time between events in a Poisson process, where events occur at a constant average rate. | The time until the next customer arrives. |
| Log-normal Distribution | It models right-skewed data whose logarithm is normally distributed. | Stock prices, income distributions, etc. |
| Beta Distribution | It describes random variables constrained to a finite interval, typically [0, 1]. It is often used in Bayesian statistics. | The probability of success in a binomial trial. |
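Continuous distributions are described by a probability density function (PDF), and the probability of any single exact value is zero; probabilities come from intervals via the cumulative distribution function (CDF). The sketch below, assuming scipy is available and using illustrative parameters, shows this for a normal distribution of IQ scores (mean 100, standard deviation 15) and for an exponential waiting time.

```python
from scipy import stats

# Normal distribution of IQ scores with mean 100 and standard deviation 15.
iq = stats.norm(loc=100, scale=15)

# The value of the PDF at a single point is a density, not a probability...
print(iq.pdf(100))               # ~0.0266

# ...probabilities are assigned to intervals, via the CDF.
print(iq.cdf(115) - iq.cdf(85))  # ~0.683, i.e. P(85 <= IQ <= 115)

# Exponential: time until the next customer, with an average of 2 arrivals per hour.
wait = stats.expon(scale=1 / 2)  # scale = 1 / rate
print(wait.cdf(1.0))             # ~0.865, i.e. P(wait <= 1 hour)
```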
Use of Probability Distributions in Generative Modeling
Probability distributions play a crucial role in generative modeling. Let’s check out some of the important ways in which probability distributions are used in generative modeling −
- Data Distribution − Generative Models aim to capture the underlying probability distribution of data from which the samples are taken.
- Generating New Samples − Once the data distribution has been learned, generative models can generate new data comparable to the original dataset (see the sketch after this list).
- Evaluation and Training − Probability distributions are also used to train and evaluate generative models. Metrics such as likelihood, perplexity, and Wasserstein distance assess the quality of generated samples compared to the original dataset.
- Variability and Uncertainty − Probability distributions capture the variability and uncertainty present in the data. Generative models can use this information to generate diverse yet realistic samples.
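To make these points concrete, the sketch below fits a one-dimensional Gaussian to some "training" data, samples new points from the fitted distribution, and scores held-out data with the average log-likelihood. It is a toy stand-in for a real generative model, assuming numpy and scipy are installed; all names and parameter values are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# "Training" data drawn from some unknown distribution (here, a Gaussian).
train_data = rng.normal(loc=5.0, scale=2.0, size=1000)

# 1. Data distribution: estimate the underlying distribution (maximum-likelihood fit).
mu, sigma = stats.norm.fit(train_data)
model = stats.norm(loc=mu, scale=sigma)

# 2. Generating new samples: draw fresh data comparable to the training set.
new_samples = model.rvs(size=5, random_state=rng)
print("Generated samples:", np.round(new_samples, 2))

# 3. Evaluation: average log-likelihood of held-out data under the fitted model.
test_data = rng.normal(loc=5.0, scale=2.0, size=200)
print("Avg log-likelihood:", model.logpdf(test_data).mean())
```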
Applications of Probability Distribution
There is a wide range of generative modeling tasks across various domains that use probability distributions, some of which are listed below −
- Image Generation − Generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) use probability distributions to generate realistic images from scratch. This has applications in computer graphics, creative design, and content generation.
- Text Synthesis − Language models, such as OpenAI's ChatGPT, use probability distributions to generate relevant text output based on a given prompt or input. This has applications in chatbots, virtual assistants, and automated content generation systems.
- Anomaly Detection − Generative models, by learning the underlying probability distribution of normal data, can be used for anomaly detection and outlier identification in datasets (a minimal sketch follows below). This has applications in fraud detection, network security, and medical diagnostics.
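As a simplified illustration of the anomaly-detection idea, the sketch below fits a Gaussian to "normal" observations and flags points whose log-likelihood under that distribution falls below a threshold. The data, the threshold, and the variable names are all illustrative assumptions; numpy and scipy are assumed to be available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Normal (non-anomalous) observations, e.g. typical transaction amounts.
normal_data = rng.normal(loc=50.0, scale=5.0, size=500)

# Learn the underlying probability distribution of the normal data.
mu, sigma = stats.norm.fit(normal_data)
model = stats.norm(loc=mu, scale=sigma)

# New observations to screen: most are typical, one is an outlier.
new_points = np.array([48.0, 52.5, 55.0, 95.0])

# Flag points whose log-likelihood is unusually low under the fitted model.
threshold = np.percentile(model.logpdf(normal_data), 1)  # bottom 1% of training scores
is_anomaly = model.logpdf(new_points) < threshold
for x, flag in zip(new_points, is_anomaly):
    print(f"{x:6.1f} -> {'anomaly' if flag else 'normal'}")
```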
Conclusion
In this chapter, we explained the critical role of probability distributions in generative modeling. We first covered what probability distributions are, along with their two types, discrete and continuous probability distributions.
Discrete probability distributions describe the probabilities of different outcomes of discrete or categorical random variables, whereas continuous probability distributions describe the probabilities of different outcomes within a continuous range of values. We also highlighted some of the common distributions that fall under the discrete and continuous categories.
We then demonstrated how capturing the data distribution, generating new samples, and evaluation and training are some of the important ways in which probability distributions are used in generative modeling. Finally, we highlighted the diverse applications of probability distributions in generative modeling tasks such as image generation, text synthesis, and anomaly detection.