
I am going through Think Stats and I would like to compare multiple data sets visually. I can see from the book examples that it is possible to generate an interleaved bar graph, with a different color for each data set, by using a module provided by the book's author. How can I obtain the same result in pyplot?

Call the bar function multiple times, once for each series. You can control the horizontal position of the bars with the first positional parameter (named left in older matplotlib releases and x in current ones), and you can use it to prevent overlap.

Entirely untested code:

import numpy
from matplotlib import pyplot
pyplot.bar(numpy.arange(10) * 2, data1, color='red')
pyplot.bar(numpy.arange(10) * 2 + 1, data2, color='blue')

data2 will be drawn shifted one unit to the right of where data1 is drawn.
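A minimal runnable sketch of the same idea for current matplotlib, assuming a 3.x release where the position argument is called x and bars are centered on it by default (align='center' since matplotlib 2.0). The two series and the 0.4 bar width are made-up illustration values; each series is shifted by half a bar width so the bars of each bin sit side by side without overlapping.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up example data: two series over the same five bins
data1 = [3, 5, 2, 6, 4]
data2 = [4, 2, 5, 3, 6]

x = np.arange(len(data1))  # one slot per bin: 0, 1, 2, ...
width = 0.4                # each bar takes 0.4 of a slot

fig, ax = plt.subplots()
# With the default align='center', shifting each series by width/2
# places the two bars of a bin next to each other around the slot.
bars1 = ax.bar(x - width / 2, data1, width, color='red', label='data1')
bars2 = ax.bar(x + width / 2, data2, width, color='blue', label='data2')
ax.set_xticks(x)
ax.legend()
plt.show()
```

Keeping 2 * width below 1 leaves a gap between neighbouring groups; for three or more series the same pattern extends to offsets of (i - (n - 1) / 2) * width.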

1 Comment

It seems that the behaviour is different now: I am getting overlapping bars with your method, and I don't see the "left" parameter in matplotlib 3.9.
Matplotlib's example code for interleaved bar charts works nicely for arbitrary real-valued x coordinates (as mentioned by @db42).

However, if your x coordinates are categorical values (like in the case of dictionaries in the linked question), the conversion from categorical x coordinates to real x coordinates is cumbersome and unnecessary.

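You can plot two dictionaries side by side directly using matplotlib's API. The trick for plotting two bar charts offset from each other is to set align='edge' and pass a positive width (+width) for one bar chart and a negative width (-width) for the other. With align='edge' each bar starts at its x coordinate, and a negative width makes it extend to the left instead of the right, so the two bars of each category end up on either side of the tick.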

Matplotlib's example code, modified to plot two dictionaries, then looks like this:

"""
========
Barchart
========

A bar plot with errorbars and height labels on individual bars
"""
import matplotlib.pyplot as plt

# Uncomment the following line if you use ipython notebook
# %matplotlib inline

width = 0.35       # the width of the bars

men_means = {'G1': 20, 'G2': 35, 'G3': 30, 'G4': 35, 'G5': 27}
men_std = {'G1': 2, 'G2': 3, 'G3': 4, 'G4': 1, 'G5': 2}

# list() turns the dict views into plain sequences, the safest input for bar()
rects1 = plt.bar(list(men_means.keys()), list(men_means.values()), -width,
                 align='edge', yerr=list(men_std.values()), color='r',
                 label='Men')

women_means = {'G1': 25, 'G2': 32, 'G3': 34, 'G4': 20, 'G5': 25}
women_std = {'G1': 3, 'G2': 5, 'G3': 2, 'G4': 3, 'G5': 3}

rects2 = plt.bar(list(women_means.keys()), list(women_means.values()), +width,
                 align='edge', yerr=list(women_std.values()), color='y',
                 label='Women')

# add some text for labels, title and axes ticks
plt.xlabel('Groups')
plt.ylabel('Scores')
plt.title('Scores by group and gender')
plt.legend()

def autolabel(rects):
    """
    Attach a text label above each bar displaying its height
    """
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                '%d' % int(height),
                ha='center', va='bottom')

autolabel(rects1)
autolabel(rects2)

plt.show()

The result:

(image: barchart_demo.png)
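A stripped-down sketch of the same ±width trick, without error bars or labels, assuming matplotlib 3.x; the dictionaries a and b are made-up data. Matplotlib's categorical axis places the string keys at x = 0, 1, 2, … in order of first appearance, so with align='edge' the negative-width bars end at each tick and the positive-width bars start there.

```python
import matplotlib.pyplot as plt

# Made-up data: two dicts sharing the same category keys, in the same order
a = {'G1': 20, 'G2': 35, 'G3': 30}
b = {'G1': 25, 'G2': 32, 'G3': 34}
width = 0.35

fig, ax = plt.subplots()
# Categorical x: 'G1', 'G2', 'G3' are mapped to 0, 1, 2.
# align='edge' anchors one side of each bar at x; the negative width
# makes the first series extend to the left of the tick.
left = ax.bar(list(a.keys()), list(a.values()), -width, align='edge', label='a')
right = ax.bar(list(b.keys()), list(b.values()), width, align='edge', label='b')
ax.legend()
plt.show()
```

The approach relies on both dictionaries listing their keys in the same order; if they might differ, build both value lists from one shared key list.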

I came across this problem a while ago and created a wrapper function that takes a 2D array and automatically creates a multi-barchart from it:

(image: Multi-category bar chart)

The code:

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import operator as o

import numpy as np

dpoints = np.array([['rosetta', '1mfq', 9.97],
           ['rosetta', '1gid', 27.31],
           ['rosetta', '1y26', 5.77],
           ['rnacomposer', '1mfq', 5.55],
           ['rnacomposer', '1gid', 37.74],
           ['rnacomposer', '1y26', 5.77],
           ['random', '1mfq', 10.32],
           ['random', '1gid', 31.46],
           ['random', '1y26', 18.16]])

fig = plt.figure()
ax = fig.add_subplot(111)

def barplot(ax, dpoints):
    '''
    Create a barchart for data across different categories with
    multiple conditions for each category.

    @param ax: The plotting axes from matplotlib.
    @param dpoints: The data set as an (n, 3) numpy array
    '''

    # Aggregate the conditions and the categories according to their
    # mean values
    conditions = [(c, np.mean(dpoints[dpoints[:,0] == c][:,2].astype(float))) 
                  for c in np.unique(dpoints[:,0])]
    categories = [(c, np.mean(dpoints[dpoints[:,1] == c][:,2].astype(float))) 
                  for c in np.unique(dpoints[:,1])]

    # sort the conditions, categories and data so that the bars in
    # the plot will be ordered by category and condition
    conditions = [c[0] for c in sorted(conditions, key=o.itemgetter(1))]
    categories = [c[0] for c in sorted(categories, key=o.itemgetter(1))]

    dpoints = np.array(sorted(dpoints, key=lambda x: categories.index(x[1])))

    # the space between each set of bars
    space = 0.3
    n = len(conditions)
    width = (1 - space) / (len(conditions))

    # Create a set of bars at each position
    indices = range(1, len(categories) + 1)
    for i, cond in enumerate(conditions):
        # np.float was removed in NumPy 1.24; the builtin float works everywhere
        vals = dpoints[dpoints[:,0] == cond][:,2].astype(float)
        pos = [j - (1 - space) / 2. + i * width for j in indices]
        # pos holds left edges; align='edge' is needed because the
        # default became align='center' in matplotlib 2.0
        ax.bar(pos, vals, width=width, label=cond, align='edge',
               color=cm.Accent(float(i) / n))

    # Set the x-axis tick labels to be equal to the categories
    ax.set_xticks(indices)
    ax.set_xticklabels(categories)
    plt.setp(ax.get_xticklabels(), rotation=90)

    # Add the axis labels
    ax.set_ylabel("RMSD")
    ax.set_xlabel("Structure")

    # Add a legend
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(handles[::-1], labels[::-1], loc='upper left')

barplot(ax, dpoints)
plt.show()

If you're interested in what this function does and the logic behind it, here's a (shamelessly self-promoting) link to the blog post describing it.
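The position arithmetic in barplot() can be checked in isolation. This sketch uses a hypothetical helper, group_left_edges (not part of the answer's code), that reproduces its pos formula: each of the n conditions gets a bar of width (1 - space) / n, and the bars for category j together span j ± (1 - space) / 2. The values are left edges, which is why they need align='edge' when passed to ax.bar.

```python
def group_left_edges(n_conditions, n_categories, space=0.3):
    """Bar width and left-edge x positions for grouped bars.

    Hypothetical helper mirroring the arithmetic in barplot():
    category j sits at tick j (1-based), and its n_conditions bars
    share (1 - space) units of width centered on that tick.
    Returns (width, edges) where edges[i][k] is the left edge of
    condition i in category k + 1.
    """
    width = (1 - space) / n_conditions
    ticks = range(1, n_categories + 1)
    edges = [[j - (1 - space) / 2 + i * width for j in ticks]
             for i in range(n_conditions)]
    return width, edges


width, edges = group_left_edges(3, 3)
# The last bar of category 1 ends (1 - space) / 2 to the right of tick 1
print(round(edges[-1][0] + width, 6))  # 1.35
```

Because the group is symmetric around each integer tick, ax.set_xticks(indices) lines the category labels up with the middle of each group.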

1 Comment

Hi, how do I also add multiple xlabels, one for each of the 3 series you present here?
