A histogram is like a visual summary that shows how often different values appear in a set of data. Imagine you have a collection of numbers, like ages of people. A histogram divides these numbers into groups, called “bins,” and then uses bars to represent how many numbers fall into each bin. The taller the bar, the more numbers are in that group.
Histogram in Matplotlib
We can create a histogram in Matplotlib using the hist() function, which lets us customize aspects such as the number of bins, the bar color, and the transparency. A histogram represents the distribution of numerical data, which helps you identify patterns such as where values cluster and how widely they spread.
The hist() Function
The hist() function in Matplotlib takes a dataset as input and divides it into intervals (bins). It then displays the frequency (count) of data points falling within each bin as a bar graph.
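The binning described above can be inspected directly, because hist() also returns the numbers it draws. The following is a minimal sketch, not part of the original tutorial: the "ages" list is made-up sample data, and the "Agg" backend is chosen only so that the code runs without a display. It compares the result with np.histogram(), which performs the same binning without drawing anything:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data (ages of ten people)
ages = [22, 25, 31, 35, 38, 41, 47, 52, 58, 63]

# hist() returns the per-bin counts, the bin edges, and the bar artists
counts, edges, bars = plt.hist(ages, bins=5)

# np.histogram() bins the same data without plotting
np_counts, np_edges = np.histogram(ages, bins=5)

print(counts)  # one count per bin
print(edges)   # always one more edge than there are bins
```

The counts add up to the number of data points, and each bar in "bars" corresponds to one entry in "counts".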
Following is the syntax of the hist() function in Matplotlib −
Syntax
plt.hist(x, bins=None, range=None, density=False, cumulative=False, color=None, edgecolor=None, ...)
Where,
- x is the input data for which the histogram is computed.
- bins (optional) is the number of bins or a sequence of bin edges. If omitted, Matplotlib uses the value of rcParams["hist.bins"], which is 10 by default.
- range (optional) is the lower and upper range of the bins. Default is the minimum and maximum of x.
- If density (optional) is True, the histogram represents a probability density function. Default is False.
- If cumulative (optional) is True, a cumulative histogram is computed. Default is False.
These are just a few parameters; there are more optional parameters available for customization.
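To make the difference between the two forms of "bins" concrete, here is a small sketch under stated assumptions: the "scores" list is hypothetical data, and the "Agg" backend is used only so the code runs without a display. It builds the same four bins twice, once from explicit bin edges and once from a bin count combined with "range":

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

# Hypothetical sample data
scores = [12, 35, 47, 51, 58, 64, 69, 73, 88, 95]

# Explicit bin edges: 0-25, 25-50, 50-75, 75-100
counts_edges, edges, _ = plt.hist(scores, bins=[0, 25, 50, 75, 100])

plt.figure()  # start a new figure for the second histogram

# A bin count plus a range: four equal-width bins spanning 0 to 100
counts_range, edges_range, _ = plt.hist(scores, bins=4, range=(0, 100))

print(counts_edges)  # counts per bin: 1, 2, 5, 2
print(counts_range)  # same counts, because the bins are identical
```

Explicit edges are useful when the bins should have unequal widths or meaningful boundaries, while a bin count with "range" is convenient for evenly spaced bins over a fixed interval.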
Creating a Vertical Histogram
In Matplotlib, creating a vertical histogram involves plotting a graphical representation of the frequency distribution of a dataset, with the bars oriented vertically along the y-axis. Each bar represents the frequency or count of data points falling within a particular interval or bin along the x-axis.
Example
In the following example, we are creating a vertical histogram by setting the “orientation” parameter to “vertical” within the hist() function. This is also the default value, so omitting the parameter produces the same plot −
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

x = [1, 2, 3, 1, 2, 3, 4, 1, 3, 4, 5]
plt.hist(x, orientation="vertical")
plt.show()
Output
We get the output as shown below −
Customized Histogram with Density
When we create a histogram with density, the bar heights are scaled so that the total area under the histogram equals one. Each bar then shows a probability density rather than a raw count, so the area of a bar (its height multiplied by its bin width) is the fraction of data points that fall in that bin.
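The normalization can be checked numerically. The sketch below is an illustration rather than part of the original tutorial: the seeded random generator and the "Agg" backend are assumptions made so that the code is reproducible and runs without a display. It sums the bar areas and also rebuilds the densities from the raw counts:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Assumption: a fixed seed, used only to make the sketch reproducible
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# With density=True the returned heights are densities, not counts
heights, edges, _ = plt.hist(data, bins=30, density=True)

widths = np.diff(edges)            # width of each bin
area = np.sum(heights * widths)    # total area under the bars
print(area)                        # approximately 1.0

# The same densities computed by hand from the raw counts
counts, _ = np.histogram(data, bins=edges)
manual = counts / (counts.sum() * widths)
```

Note that individual bar heights can exceed one when the bins are narrow; only the total area is constrained.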
Example
In the following example, we are visualizing random data as a histogram with 30 bins, displaying it in green with a black edge. We are using the density=True parameter to represent the probability density −
import matplotlib.pyplot as plt
import numpy as np

# Generate random data
data = np.random.randn(1000)

# Create a histogram with density and custom color
plt.hist(data, bins=30, density=True, color='green', edgecolor='black', alpha=0.7)
plt.xlabel('Values')
plt.ylabel('Probability Density')
plt.title('Customized Histogram with Density')
plt.show()
Output
After executing the above code, we get the following output −
Cumulative Histogram
When we create a cumulative histogram, we graphically represent the total number of occurrences of values up to a certain point. It shows how many data points fall below or equal to a certain value.
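In other words, each bar of a cumulative histogram is the running total of the ordinary bin counts. The following sketch illustrates this under stated assumptions: the scores are generated with a seeded random generator, and the "Agg" backend is used only so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Assumption: seeded random scores from 0 to 100 inclusive
rng = np.random.default_rng(0)
scores = rng.integers(0, 101, size=150)

# Cumulative histogram: each bar includes all the bins to its left
cum_counts, edges, _ = plt.hist(scores, bins=20, cumulative=True)

# Ordinary per-bin counts for the same data and number of bins
counts, _ = np.histogram(scores, bins=20)

print(np.cumsum(counts))  # running total of the ordinary counts
print(cum_counts)         # same values, as returned by hist()
```

The last bar always equals the total number of data points, which is 150 here.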
Example
Here, each bar corresponds to a range of exam scores, and the height of the bar tells us how many students, in total, scored within or below that range. By setting the cumulative=True parameter in the hist() function, we make sure that the histogram shows the cumulative progression of scores, so the last bar equals the total number of students −
import matplotlib.pyplot as plt
import numpy as np

# Generate random exam scores from 0 to 100 inclusive
# (the upper bound of randint is exclusive, so we pass 101)
exam_scores = np.random.randint(0, 101, 150)

# Create a cumulative histogram
plt.hist(exam_scores, bins=20, cumulative=True, color='orange', edgecolor='black', alpha=0.7)
plt.xlabel('Exam Scores')
plt.ylabel('Cumulative Number of Students')
plt.title('Cumulative Histogram of Exam Scores')
plt.show()
Output
Following is the output of the above code −
Histogram with Different Color and Edge Color
When creating a histogram, we can customize the fill color of the bars with the “color” parameter and the color of their outlines with the “edgecolor” parameter. A contrasting edge color makes the boundaries between adjacent bins easier to see.
Example
Now, we are generating a histogram for random data with 25 bins, and we are presenting it in purple color with blue edges −
import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(1000)

# Creating a histogram with different color and edge color
plt.hist(data, bins=25, color='purple', edgecolor='blue')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram with Different Color and Edge Color')
plt.show()
Output
On executing the above code we will get the following output −
Example
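To color the bars according to their height, we can normalize the bar heights to the range 0 to 1, look up the matching color in a colormap (here “RdYlBu”), and apply it to each bar with the setp() method −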
import numpy as np
from matplotlib import pyplot as plt

plt.rcParams["figure.figsize"] = [7.00, 3.50]
plt.rcParams["figure.autolayout"] = True

data = np.random.random(1000)
n, bins, patches = plt.hist(data, bins=25, density=True, color='red', rwidth=0.75)

# Normalize the bar heights to the range [0, 1]
col = (n - n.min()) / (n.max() - n.min())

# Look up the colormap by name
cm = plt.get_cmap('RdYlBu')

# Set the face color of each bar from its normalized height
for c, p in zip(col, patches):
    plt.setp(p, 'facecolor', cm(c))

plt.show()
Output
On executing the above code we will get the following output −
Example
Here, we are specifying different colors for different bars in a Matplotlib histogram by iterating over the bars and setting a random facecolor (a random six-digit hex code) for each one −
import numpy as np
import matplotlib.pyplot as plt
import random
import string

# Set the figure size
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Figure and set of subplots
fig, ax = plt.subplots()

# Random data
data = np.random.rand(100)

# Plot a histogram with random data
N, bins, patches = ax.hist(data, edgecolor='black', linewidth=1)

# Random facecolor for each bar
for i in range(len(N)):
    patches[i].set_facecolor("#" + ''.join(random.choices("ABCDEF" + string.digits, k=6)))

# Display the plot
plt.show()
Output
On executing the above code we will get the following output −
Stacked Histogram with Multiple Datasets
A stacked histogram with multiple datasets is a visual representation that combines the distributions of two or more sets of data. The bars are stacked on top of each other, allowing for a comparison of how different datasets contribute to the overall distribution.
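The stacking can be verified from the values that hist() returns. The sketch below uses two small arrays of sample values; the "Agg" backend is an assumption made only so the code runs without a display. As far as we can tell from Matplotlib's current behaviour, with stacked=True the returned arrays hold the running tops of the stack rather than the separate counts, so the last array is the combined total per bin:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Sample data for two datasets
data1 = np.array([2, 4, 5, 7, 9, 10, 11, 13, 14, 15])
data2 = np.array([6, 7, 8, 10, 11, 12, 13, 14, 15, 16])

# Both datasets share the same bin edges, computed from the combined range
tops, edges, _ = plt.hist([data1, data2], bins=10, stacked=True)

# Separate per-dataset counts on those shared edges
c1, _ = np.histogram(data1, bins=edges)
c2, _ = np.histogram(data2, bins=edges)

print(tops[0])   # top of the first layer: counts of data1
print(tops[-1])  # top of the whole stack: counts of data1 plus data2
```

The height of the second layer alone is therefore the difference between the two returned arrays.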
Example
In the example below, we represent two different datasets “data1” and “data2” with specific values, showing their distributions in different colors (skyblue and salmon) −
import matplotlib.pyplot as plt
import numpy as np

# Sample data for two datasets
data1 = np.array([2, 4, 5, 7, 9, 10, 11, 13, 14, 15])
data2 = np.array([6, 7, 8, 10, 11, 12, 13, 14, 15, 16])

# Creating a stacked histogram with different colors
plt.hist([data1, data2], bins=10, stacked=True, color=['skyblue', 'salmon'], edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Stacked Histogram with Multiple Datasets')
plt.legend(['Dataset 1', 'Dataset 2'])
plt.show()
Output
On executing the above code we will get the following output −