I have the following code to draw some histograms about subjects in a database:

import matplotlib.pyplot as plt

attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}
bin_info = {key: None for key in attr_info}
bin_info['Age'] = 10

for name, a_info in attr_info.items():
    plt.figure(num=name)
    counts, bins, _ = plt.hist(a_info, bins=bin_info[name], color='blue', edgecolor='black')

    plt.margins(0)
    plt.title(name)
    plt.xlabel(name)
    plt.ylabel("# Subjects")
    plt.yticks(range(0, 11, 2))
    plt.grid(axis='y')
    plt.tight_layout(pad=0)

    plt.show()

This code works but it draws each attribute's distribution in a separate histogram. What I'd like to achieve is something like this:

[Image: stacked histogram]

I'm aware plt.hist has a stacked parameter, but that seems to be intended for a slightly different use: stacking the same attribute for different groups of subjects. You could, for example, draw a histogram where each whole bar represents an age range and the bar itself is a stack of smokers in one colour and non-smokers in another.
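For reference, a minimal sketch of that built-in use of stacked, assuming the Age and Smoker lists from the code above. The bin edges are illustrative assumptions, not values taken from the database:

```python
import matplotlib.pyplot as plt

ages = [9, 43, 234, 23, 2, 95, 32, 63, 58, 42]
smoker = ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']

# Split one attribute (Age) by the categories of another (Smoker).
ages_smokers = [a for a, s in zip(ages, smoker) if s == 'y']
ages_non_smokers = [a for a, s in zip(ages, smoker) if s == 'n']

fig, ax = plt.subplots()
# A list of datasets plus stacked=True stacks the datasets within each bin.
counts, bin_edges, _ = ax.hist(
    [ages_smokers, ages_non_smokers],
    bins=[0, 10, 40, 300],  # illustrative bin edges (assumption)
    stacked=True,
    label=['smoker', 'non-smoker'],
    edgecolor='black',
)
ax.set_xlabel('Age')
ax.set_ylabel('# Subjects')
ax.legend()
```

With stacked=True the returned counts are cumulative, so the last row holds the total bar heights. Every bar here is still an age range, which is why this does not give one bar per attribute. Call plt.show() to display the figure.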

I haven't been able to figure out how to use it to stack (and properly label as in the image) different attributes on top of each other in each bar.

2 Answers

You need to play around with your data a bit, but this can be done without pandas. Also, what you want are stacked bar plots, not histograms:

import matplotlib.pyplot as plt

attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}

# Filter your data for each bar section that you want
ages_0_10 = [x for x in attr_info['Age'] if x < 10]
ages_10_40 = [x for x in attr_info['Age'] if 10 <= x < 40]
ages_40p = [x for x in attr_info['Age'] if x >= 40]  # >= so that an age of exactly 40 is not dropped

gender_m = [x for x in attr_info['Gender'] if 'm' in x]
gender_f = [x for x in attr_info['Gender'] if 'f' in x]

smoker_y = [x for x in attr_info['Smoker'] if 'y' in x]
smoker_n = [x for x in attr_info['Smoker'] if 'n' in x]

# Locations for each bin (you can move them around)
locs = [0, 1, 2]

# I'm going to plot the Age bar separately from the Smoker and Gender ones,
# since Age has 3 stacked segments and the others have just 2 each
plt.bar(locs[0], len(ages_0_10), width=0.5)  # This is the bottom bar

# Second stacked bar, note the bottom variable assigned to the previous bar
plt.bar(locs[0], len(ages_10_40), bottom=len(ages_0_10), width=0.5) 

# Same as before but now bottom is the 2 previous bars    
plt.bar(locs[0], len(ages_40p), bottom=len(ages_0_10) + len(ages_10_40), width=0.5)

# Add labels, play around with the locations
#plt.text(x, y, text)
plt.text(locs[0], len(ages_0_10) / 2, r'$<10$')
plt.text(locs[0], len(ages_0_10) + 1, r'$[10, 40)$')
plt.text(locs[0], len(ages_0_10) + 5, r'$\geq 40$')


# Define the top and bottom segments for the Gender and Smoker stacks.
# In both cases there are just 2 stacked segments,
# so we can use lists instead of plotting each one separately as for Age
tops = [len(gender_m), len(smoker_y)]
bottoms = [len(gender_f), len(smoker_n)]

plt.bar(locs[1:], bottoms, width=0.5)
plt.bar(locs[1:], tops, bottom=bottoms, width=0.5)

# Labels again; the bottom segments are 'f' and 'n', the top ones 'm' and 'y'
# Gender
plt.text(locs[1], len(gender_f) / 2, 'f')
plt.text(locs[1], len(gender_f) + len(gender_m) / 2, 'm')

# Smokers
plt.text(locs[2], len(smoker_n) / 2, 'n')
plt.text(locs[2], len(smoker_n) + len(smoker_y) / 2, 'y')

# Set tick labels
plt.xticks(locs, ('Age', 'Gender', 'Smoker'))
plt.show()

Check the documentation for pyplot.bar and this example.
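The same idea can be written as a loop so that the counts, the bottom offsets, and the label positions all come from one place. This is only a sketch: the age_group helper and its cut-offs at 10 and 40 are assumptions made for illustration, and each label is centred in its segment by placing it at bottom + height / 2.

```python
from collections import Counter

import matplotlib.pyplot as plt

attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y'],
}


def age_group(age):
    """Map an age to a group label (hypothetical cut-offs at 10 and 40)."""
    if age < 10:
        return '<10'
    if age < 40:
        return '10-39'
    return '40+'


# Reduce every attribute to {segment label: count}; only Age needs binning.
age_counts = Counter(age_group(a) for a in attr_info['Age'])
segments = {
    # Fix the stacking order of the age groups explicitly.
    'Age': {label: age_counts[label] for label in ('<10', '10-39', '40+')},
    'Gender': dict(Counter(attr_info['Gender'])),
    'Smoker': dict(Counter(attr_info['Smoker'])),
}

fig, ax = plt.subplots()
for x, (name, counts) in enumerate(segments.items()):
    bottom = 0
    for label, height in counts.items():
        ax.bar(x, height, bottom=bottom, width=0.5, edgecolor='black')
        # Centre the label inside the segment it describes.
        ax.text(x, bottom + height / 2, label, ha='center', va='center')
        bottom += height  # the next segment starts where this one ends

ax.set_xticks(range(len(segments)))
ax.set_xticklabels(list(segments))
ax.set_ylabel('# Subjects')
```

Because every subject falls into exactly one segment per attribute, each bar ends at the same height (10 here). Call plt.show() to display the figure.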


How about trying out pandas:

import matplotlib.pyplot as plt
import pandas as pd

attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}

df = pd.DataFrame(attr_info)

bins = [0, 32, 45, 300]  # bins can be adjusted to your liking

# drop "Age" and count the values in all remaining columns
counts = df.drop(columns="Age").apply(pd.Series.value_counts)
# bin age data and count
age_data = df.groupby(pd.cut(df['Age'], bins=bins), observed=False)["Age"].count()

fig, ax = plt.subplots()
pd.concat([counts, age_data]).rename(columns={0: "Age"}).T.plot(kind="bar", stacked=True, ax=ax)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()

The advantage of this approach is its generality: it works the same way no matter how many columns you want to plot.
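To inspect the table behind such a plot and confirm that every bar adds up to the same number of subjects, a sketch along these lines may help. The bin edges, the group labels, and the names categorical and table are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y'],
})

# Bin Age into labelled groups so it can be counted like any other category.
# right=False makes the intervals [0, 10), [10, 40), [40, 300).
age_groups = pd.cut(df['Age'], bins=[0, 10, 40, 300], right=False,
                    labels=['<10', '10-39', '40+'])
categorical = df.drop(columns='Age').assign(Age=age_groups.astype(str))

# One value_counts per column: rows are segment labels, columns are attributes.
table = pd.DataFrame({col: categorical[col].value_counts() for col in categorical})
table = table.fillna(0).astype(int)

# Each attribute partitions the same subjects, so every column sums to len(df).
print(table)
print(table.sum())

# One stacked bar per attribute (pandas uses matplotlib for plotting).
ax = table.T.plot(kind='bar', stacked=True)
ax.set_ylabel('# Subjects')
```

Cells filled with 0 mark labels that do not belong to an attribute, so they add nothing to that attribute's bar.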

2 Comments

Still not quite what I'm looking for. This is still splitting each bar based on the Age attribute. I want each bar to be of the same height (10) and be split (and appropriately labeled) based on a different attribute.
@MatedeVita Sorry for the misunderstanding, I updated the code.
