Plotting histograms from grouped data in a pandas DataFrame

Question

How do I plot a block of histograms from a group of data in a dataframe? For example, given:

from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter': x, 'N': y})

I tried:

df.groupby('Letter').hist()

...which failed with the error message:

TypeError: cannot concatenate 'str' and 'float' objects

dreme · Accepted Answer · 2023-07-31 04:04:30Z

266

I'm on a roll, just found an even simpler way to do it using the by keyword in the hist method:

df.hist('N', by='Letter')

That's a very handy little shortcut for quickly scanning your grouped data!

For future visitors, this call produces one histogram subplot per value of Letter (A, B and C).
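As a rough check of what the by shortcut returns, here is a minimal sketch built on the question's data. The seeded generator, the Agg backend line and the titles helper variable are my own additions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this in a notebook
import numpy as np
import pandas as pd

# Same shape of data as in the question, seeded so the sketch is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Letter": ["A"] * 300 + ["B"] * 400 + ["C"] * 300,
    "N": rng.standard_normal(1000),
})

# One subplot per distinct value of Letter
axes = df.hist(column="N", by="Letter")

# Each used subplot is titled with its group key; unused grid cells have no title
titles = sorted(ax.get_title() for ax in np.ravel(axes) if ax.get_title())
print(titles)
```

The returned object is an array of matplotlib Axes, so any further tailoring can be done by looping over it as in the longer example in this answer.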

In answer to questions below, here's an example of specific tailoring of the histogram plots:

# import libraries
import pandas as pd
import numpy as np

# Create test dataframe
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
z = np.random.randn(1000)
df = pd.DataFrame({'Letter':x, 'N1':y, 'N2':z})

# Plot histograms
axes = df.hist(['N1','N2'], by='Letter',bins=10, layout=(2,2),
               legend=True, yrot=90,sharex=True,sharey=True, 
               log=True, figsize=(6,6))
for ax in axes.flatten():
    ax.set_xlabel('N')
    ax.set_ylabel('Count')
    ax.set_ylim(bottom=1,top=100)

edited Jul 31, 2023 at 4:04

answered Oct 26, 2013 at 6:59


8 Comments

Phani Over a year ago

Is there a way to get these in the same plot?

Jonathan Jin Over a year ago

@Phani: stackoverflow.com/questions/6871201/…
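For reference, one way to get all groups onto the same axes is to loop over the groups and call matplotlib's hist once per group. This is only a sketch under my own assumptions (shared bin edges, 50% transparency, a seeded generator), not necessarily what the linked question recommends.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Letter": ["A"] * 300 + ["B"] * 400 + ["C"] * 300,
    "N": rng.standard_normal(1000),
})

fig, ax = plt.subplots()

# Shared bin edges so the bars of different groups line up
bins = np.linspace(df["N"].min(), df["N"].max(), 21)

# Draw every group on the same axes, semi-transparent so overlaps stay visible
for name, group in df.groupby("Letter"):
    ax.hist(group["N"], bins=bins, alpha=0.5, label=name)

ax.set_xlabel("N")
ax.set_ylabel("Count")
ax.legend(title="Letter")

labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```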

Nosey Over a year ago

For a larger plot: df['N'].hist(by=df['Letter'], figsize=(16, 18))

prof_FL Over a year ago

I've found code that plots everything in the same plot: df.groupby('age').survived.value_counts().unstack().plot.bar(width=1, stacked=True)

bernando_vialli Over a year ago

@dreme What if I am doing this on an entire dataframe? I am trying random_sample_join.hist(bins=5, figsize=(20, 20), rwidth=5, by=random_sample_join['x']), but it doesn't work: it gives me a really weird-looking graph that makes no sense.


Paul · Accepted Answer · 2015-12-10 20:50:05Z

14

One solution is to use the matplotlib histogram function directly on each grouped data frame. You can loop through the groups returned by groupby; each group contains a dataframe, and you can create a histogram for each one.

from pandas import DataFrame
import numpy as np
import matplotlib.pyplot as plt

x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')

# Iterating over a groupby yields (group name, group DataFrame) tuples
for group in grouped:
  plt.figure()
  plt.hist(group[1].N)
  plt.show()

edited Dec 10, 2015 at 20:50

answered Oct 25, 2013 at 12:17


4 Comments

dreme Over a year ago

Thanks too Paul. I'm a little mystified about the '[1]' in 'group[1].N'. Each 'group' seems to be a DF with just two columns (Letter and N) when I added a 'print group' statement in the for loop. In that case, shouldn't 'group.N' suffice?

dreme Over a year ago

Ah, actually belay that comment, just figured it out. Each 'group' is actually a two element tuple of the group name and the group DF. Doh!

Gigo Over a year ago

I recommend splitting the tuple in the for loop: for index, group in grouped, then you can omit the [1].
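A sketch of that unpacking, assuming you also want all groups in a single figure with one subplot each. The subplot layout, the seeded data and the n= suffix in the titles are illustrative choices of mine.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Letter": ["A"] * 300 + ["B"] * 400 + ["C"] * 300,
    "N": rng.standard_normal(1000),
})

grouped = df.groupby("Letter")

# One figure, one column of the subplot grid per group
fig, axes = plt.subplots(1, df["Letter"].nunique(), sharey=True, figsize=(9, 3))

# Unpack each (name, frame) tuple directly in the loop header
for ax, (name, group) in zip(axes, grouped):
    ax.hist(group["N"], bins=20)
    ax.set_title(f"{name} (n={len(group)})")

titles = [ax.get_title() for ax in axes]
print(titles)
```

Calling plt.show() once after the loop then displays the whole figure.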

Sam Over a year ago

matplotlib.pyplot.figure() and matplotlib.pyplot.show() should come off the loop.

dirkjot · Accepted Answer · 2019-06-18 06:52:41Z

10

With a recent version of pandas, you can do df.N.hist(by=df.Letter)

Just like with the solutions above, the axes will be different for each subplot. I have not solved that one yet.

answered Jun 18, 2019 at 6:52


1 Comment

dreme Over a year ago

You can use the sharex and sharey keywords to get common axes for your plots, i.e.: df.N.hist(by=df.Letter, sharey=True, sharex=True)
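To see the effect of those two keywords, the following sketch compares the axis limits of the resulting subplots. The data setup and the used, xlims and ylims helper names are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Letter": ["A"] * 300 + ["B"] * 400 + ["C"] * 300,
    "N": rng.standard_normal(1000),
})

axes = df["N"].hist(by=df["Letter"], sharex=True, sharey=True)

# Keep only the subplots that hold a group (they carry the group key as title)
used = [ax for ax in np.ravel(axes) if ax.get_title()]

# With shared axes, every subplot reports the same x and y limits
xlims = {ax.get_xlim() for ax in used}
ylims = {ax.get_ylim() for ax in used}
print(len(used), len(xlims), len(ylims))
```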

cwharland · Accepted Answer · 2013-10-25 14:33:29Z

9

Your call is failing because the grouped data frame you end up with has a hierarchical index and two columns (Letter and N), so .hist() tries to make a histogram of both columns; hence the str error.

This is the default behavior of pandas plotting functions (one plot per column), so if you reshape your data frame so that each letter is a column, you will get exactly what you want.

df.reset_index().pivot(index='index', columns='Letter', values='N').hist()

The reset_index() call just moves the current index into a column called index. pivot then takes your data frame, collects all of the N values for each Letter and makes them a column. The resulting data frame has 1000 rows (one per original row, with missing values filled with NaN) and three columns (A, B, C). hist() will then produce one histogram per column, and you can format the plots as needed.
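The reshaping step can be inspected on its own before plotting. Below is a small sketch on the question's data; the wide variable name and the seeded generator are my additions, and the keyword form of pivot is used because newer pandas versions expect keyword arguments there.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Letter": ["A"] * 300 + ["B"] * 400 + ["C"] * 300,
    "N": rng.standard_normal(1000),
})

# Move the row index into a column, then spread N into one column per letter
wide = df.reset_index().pivot(index="index", columns="Letter", values="N")

print(wide.shape)    # rows x columns of the reshaped frame
print(wide.count())  # non-NaN values per letter

# One histogram per column, i.e. one per letter
axes = wide.hist()
```

Each row of wide holds a value in exactly one column and NaN in the others, and hist() ignores the NaN entries when it bins each column.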

answered Oct 25, 2013 at 14:33


2 Comments

Douglas Fils Over a year ago

When I follow this I don't get my plots but an array of them. Is this due to some error in my approach? I get an array of matplotlib.axes.AxesSubplot object at 0x246c5fe10 items. Is there some way to get these to display, say 3 or 4 per row?

dreme Over a year ago

If you're using an ipython notebook, then run either the %pylab or %matplotlib magic functions to automatically display the plots

Union find · Accepted Answer · 2021-08-25 22:18:01Z

2

I find this even easier and faster.

data_df.groupby('Letter').count()['N'].hist(bins=100)
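It may help to look at the intermediate result of that one-liner. As far as I can tell, count() collapses each group to a single number, so the histogram is drawn over the group sizes rather than over the N values inside each group. The sketch below uses the question's df in place of data_df, which is an assumption on my part.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Letter": ["A"] * 300 + ["B"] * 400 + ["C"] * 300,
    "N": rng.standard_normal(1000),
})

# Number of non-null N values per letter: a Series with one entry per group
counts = df.groupby("Letter").count()["N"]
print(counts)

# A single histogram over those three counts
ax = counts.hist(bins=100)
```

If one histogram per group is what you are after, the by keyword shown in the other answers is the closer fit.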

answered Aug 25, 2021 at 22:18


Gabriele · Accepted Answer · 2020-10-24 17:26:41Z

I'm writing this answer because I was looking for a way to plot the histograms of different groups together. What follows is not very smart, but it works fine for me. I use NumPy to compute the histogram and Bokeh for plotting. I think it is self-explanatory, but feel free to ask for clarification and I'll be happy to add details (and write it better).

import numpy as np
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show

# df_trips is a DataFrame with 'locality', 'means' and 'speed' columns
figures = {
    'Transit': figure(title='Transit', x_axis_label='speed [km/h]', y_axis_label='frequency'),
    'Driving': figure(title='Driving', x_axis_label='speed [km/h]', y_axis_label='frequency')
}

cols = {'Vienna': 'red', 'Turin': 'blue', 'Rome': 'orange'}
# Each iteration yields the (locality, means) key and the matching sub-frame
for (locality, means), group in df_trips.groupby(['locality', 'means']):
    fig = figures[means]
    h, b = np.histogram(group.speed.values)
    # Center each bar on its bin instead of on the bin's right edge
    fig.vbar(x=(b[:-1] + b[1:]) / 2, top=h, width=(b[1]-b[0]), legend_label=locality, fill_color=cols[locality], alpha=0.5)

show(gridplot([
    [figures['Transit']],
    [figures['Driving']],
]))
