I'd like to plot a bar chart in Python, similar to one made in Excel. However, I am struggling to get two levels of x-axis labels. For example, for each size (like 8M), I want to plot the results of all 5 strategies. For each strategy, there are 3 metrics (Fit, Boot, and Exp).

[image]

You can download the original Excel file here.

This is my code so far:

    
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
r1= df['Fit']
r2= df['Boot']
r3= df['Exp']

x= df['strategy']

n_groups = 5

# create plot
fig, ax = plt.subplots()
index = np.arange(n_groups)
names = ["8M","16M","32M","64M","128M"]

bar_width = 0.1
opacity = 0.8

Fit8= [r1[0],r1[1],r1[2],r1[3],r1[4]]
Boot8= [r2[0],r2[1],r2[2],r2[3],r2[4]]
Exp8= [r3[0],r3[1],r3[2],r3[3],r3[4]]

Fit16= [r1[5],r1[6],r1[7],r1[8],r1[9]]
Boot16= [r2[5],r2[6],r2[7],r2[8],r2[9]]
Exp16= [r3[5],r3[6],r3[7],r3[8],r3[9]]

rects1 = plt.bar(
    index, Fit8, bar_width,
    alpha=opacity,
    color='g',
    label='Fit'
)

rects2 = plt.bar(
    index + 0.1, Boot8, bar_width,
    alpha=opacity,
    color='b',
    label='Boot'
)

rects3 = plt.bar(
    index + 0.2, Exp8, bar_width,
    alpha=opacity,
    color='y',
    label='EXP'
)

rects4 = plt.bar(
    index + 0.5, Fit16, bar_width,
    alpha=opacity,
    color='g'
)

rects5 = plt.bar(
    index + 0.6, Boot16, bar_width,
    alpha=opacity,
    color='b'
)

rects6 = plt.bar(
    index + 0.7, Exp16, bar_width,
    alpha=opacity,
    color='y'
)


plt.xticks(index + 0.2, (names))

plt.legend()
plt.tight_layout()
plt.show()
  • Could you provide an example snippet of your data in a copyable form? We do not have access to your data.xlsx sheet. Commented Jan 10, 2021 at 13:52
  • Does this answer your question? Commented Jan 10, 2021 at 14:45
  • Not really. In mine, every size group (like 8M) has a group of strategies (S1, ..., S5), and each strategy has a number of metrics (e.g., Fit etc.). It is a bit complex. Commented Jan 10, 2021 at 19:40

1 Answer

Something like this?

[image: resulting plot]

Here is the code:

import pandas as pd
import matplotlib.pyplot as plt

# read dataframe, take advantage of Multiindex
df = pd.read_excel(
    "data.xlsx",
    sheet_name="Sheet1", engine='openpyxl',
    index_col=[0, 1],
)
# plot the content of the dataframe
ax = df.plot.bar()

# Show minor ticks
ax.minorticks_on()

# Get location of the center of each bar
bar_locations = list(map(lambda x: x.get_x() + x.get_width() / 2., ax.patches))

# Set minor and major tick positions
# Minor are used for S1, ..., S5
# Major for sizes 8M, ..., 128M
# tick locations are sorted according to the 3 metrics, so first all the 25 bars for the fit, then the 25
# for the boot and at the end the 25 for the exp. We set the major tick at the position of the bar at the center
# of the size group, that is the third boot bar of each size.
ax.set_xticks(bar_locations[27:50:5], minor=False)  # use the third Boot bar of each size group
ax.set_xticks(bar_locations[len(df):2 * len(df)], minor=True)  # use the bar in the middle of each group of 3 bars

# Labels for groups of 3 bars and for each group of size
ax.set_xticklabels(df.index.get_level_values(0)[::5], minor=False, rotation=0)
ax.set_xticklabels(df.index.get_level_values(1), minor=True, rotation=0)

# Set tick parameters
ax.tick_params(axis='x', which='major', pad=15, bottom=False)
ax.tick_params(axis='x', which='both', top=False)

# You can use a different color for each group
# You can comment out these lines if you don't like it
size_colors = 'rgbym'
# major ticks
for l, c in zip(ax.get_xticklabels(minor=False), size_colors):
    l.set_color(c)
    l.set_fontweight('bold')
# minor ticks
for i, l in enumerate(ax.get_xticklabels(minor=True)):
    l.set_color(size_colors[i // len(size_colors)])

# remove x axis label
ax.set_xlabel('')

plt.tight_layout()
plt.show()

The main idea here is to use a pandas MultiIndex, with some minor tweaks to the ticks.
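
To try the MultiIndex idea without the Excel file, here is a minimal sketch. It assumes the sheet holds one row per (size, strategy) pair and three metric columns; the index names `size` and `strategy` and the random values are placeholders, not the real data. It also checks the bar order that the tick code relies on: pandas draws all bars of the first column, then all bars of the second, and so on.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

sizes = ["8M", "16M", "32M", "64M", "128M"]
strategies = ["S1", "S2", "S3", "S4", "S5"]

# Placeholder values only; the real numbers live in data.xlsx
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product([sizes, strategies], names=["size", "strategy"])
demo = pd.DataFrame(
    rng.random((len(index), 3)), index=index, columns=["Fit", "Boot", "Exp"]
)

ax = demo.plot.bar()

# One patch per (row, column); the patches are grouped column by column
heights = [p.get_height() for p in ax.patches]
n_rows = len(demo)
fit_heights = heights[:n_rows]
boot_heights = heights[n_rows:2 * n_rows]
```

With this layout, `bar_locations[len(df):2 * len(df)]` selects exactly the Boot bars, which sit in the middle of each group of three.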

EDIT: If you want space between the groups, you can add a dummy category (i.e. an extra strategy) to the DataFrame to create an artificial gap, obtaining:

[image: resulting plot with gaps between the size groups]

Here is the code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# read dataframe, take advantage of Multiindex
df = pd.read_excel(
    "data.xlsx",
    sheet_name="Sheet1", engine='openpyxl',
    index_col=[0, 1],
)
# collect the unique sizes and strategies, and count the metrics
sizes = list(df.index.get_level_values(0).drop_duplicates())
strategies = list(df.index.get_level_values(1).drop_duplicates())
n_sizes = len(sizes)
n_strategies = len(strategies)
n_metrics = len(df.columns)

empty_rows = pd.DataFrame(
    data=[[np.nan] * n_metrics] * n_sizes, index=pd.MultiIndex.from_tuples([(s, 'SN') for s in sizes], names=df.index.names),
    columns=df.columns,
)

old_columns = list(df.columns)
df = df.merge(empty_rows, how='outer', left_index=True, right_index=True, sort=False).drop(
    columns=[f'{c}_y' for c in df.columns]
).sort_index(
    ascending=True, level=0, key=lambda x: sorted(x, key=lambda y: int(y[:-1]))
)
df.columns = old_columns

# Update number of strategies
n_strategies += 1

# Plot with Pandas
ax = df.plot.bar()

# Show minor ticks
ax.minorticks_on()

# Get location of the center of each bar
bar_locations = list(map(lambda x: x.get_x() + x.get_width() / 2., ax.patches))

# Set minor and major tick positions
# Major for sizes 8M, ..., 128M
# Minor are used for S1, ..., S5, SN
# Tick locations are sorted according to the 3 metrics, so first 30 (5 sizes * 6 strategies) bars for the fit,
# then 30 (5 sizes * 6 strategies) for the boot and at the end 30 (5 sizes * 6 strategies) for the exp.
# We set the major tick at the position of the bar at the center of the size group (+7),
# that is the third boot bar of each size.
n_bars_per_metric = n_sizes * n_strategies
strategy_ticks = bar_locations[len(df):2 * len(df)]
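# keep only the positions of the 5 real strategies of each size (drop the dummy SN slot)
strategy_ticks = np.concatenate([strategy_ticks[b * n_strategies:b * n_strategies + n_strategies - 1] for b in range(n_sizes)])
# one major tick per size, at its third strategy; the stride is the number of real strategies.
# The small offset keeps the major tick from coinciding with (and hiding) a minor tick.
size_ticks = strategy_ticks[2::n_strategies - 1] + 0.01

ax.set_xticks(size_ticks, minor=False)  # one major tick per size group
ax.set_xticks(strategy_ticks, minor=True)  # one minor tick per strategy, at its middle (Boot) bar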

# Labels for groups of 3 bars and for each group of size
ax.set_xticklabels(sizes, minor=False, rotation=0)
ax.set_xticklabels(strategies * n_sizes, minor=True, rotation=0)

# Set tick parameters
ax.tick_params(axis='x', which='major', pad=15, bottom=False)
ax.tick_params(axis='x', which='both', top=False)

# You can use a different color for each group
# You can comment out these lines if you don't like it
size_colors = 'rgbym'
# major ticks
for l, c in zip(ax.get_xticklabels(minor=False), size_colors):
    l.set_color(c)
    l.set_fontweight('bold')
# minor ticks
for i, l in enumerate(ax.get_xticklabels(minor=True)):
    l.set_color(size_colors[i // len(size_colors)])

# remove x axis label
ax.set_xlabel('')

plt.tight_layout()
plt.show()

As you can see, you have to manipulate the DataFrame with some extra code. Maybe there is a simpler solution, but this was the first one I could think of.
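
One possibly simpler route is to skip the dummy rows and compute the bar positions by hand in plain matplotlib. The following is only a sketch under assumptions: the values are random placeholders with shape (size, strategy, metric), and `bar_width` and `group_gap` are illustrative settings, not values taken from the original chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

sizes = ["8M", "16M", "32M", "64M", "128M"]
strategies = ["S1", "S2", "S3", "S4", "S5"]
metrics = ["Fit", "Boot", "Exp"]

# Placeholder values with shape (size, strategy, metric)
rng = np.random.default_rng(0)
values = rng.random((len(sizes), len(strategies), len(metrics)))

bar_width = 0.25
group_gap = 1.0  # empty slots between two size groups (assumed value)

# Center of every (size, strategy) cluster; each size block is shifted by the gap
size_offsets = np.arange(len(sizes)) * (len(strategies) + group_gap)
centers = (size_offsets[:, None] + np.arange(len(strategies))[None, :]).ravel()

fig, ax = plt.subplots(figsize=(12, 4))
for k, metric in enumerate(metrics):
    # Spread the metric bars symmetrically around the cluster center
    offset = (k - (len(metrics) - 1) / 2) * bar_width
    ax.bar(centers + offset, values[:, :, k].ravel(), bar_width, label=metric)

# Minor ticks label the strategies, major ticks label the sizes
ax.set_xticks(centers, minor=True)
ax.set_xticklabels(strategies * len(sizes), minor=True)
# Small offset so a major tick never coincides with (and hides) a minor tick
size_centers = centers.reshape(len(sizes), -1).mean(axis=1) + 0.01
ax.set_xticks(size_centers)
ax.set_xticklabels(sizes)
ax.tick_params(axis="x", which="major", pad=15, length=0)
ax.legend()
fig.tight_layout()
```

Increasing `group_gap` widens the partition between sizes, and lowering `bar_width` adds space between neighbouring strategies.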

4 Comments

Thanks a lot, this is so good. But can we split the groups based on size, for example with a partition between 8M and 16M, to make it easier to read? Also, could we add space between the bars of the three metrics, for readability and a better look?
Thanks, that is clever! However, I am encountering an error: "TypeError: sort_index() got an unexpected keyword argument 'key'". Why is this happening?
Try to update pandas. Which version are you using?
I updated pandas and it works fine. Thanks.
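
For context, the `key` argument of `sort_index` was added in pandas 1.1, which is why older installations raise this `TypeError`. If upgrading is not an option, one possible workaround is to order the rows by the numeric part of the size label with a stable `argsort`. This is a sketch on placeholder data; the index names and values are assumptions.

```python
import numpy as np
import pandas as pd

# Placeholder frame with the sizes deliberately out of numeric order
index = pd.MultiIndex.from_product(
    [["128M", "16M", "8M"], ["S1", "S2"]], names=["size", "strategy"]
)
demo = pd.DataFrame({"Fit": np.arange(6.0)}, index=index)

# Numeric part of each size label, e.g. "128M" -> 128
size_num = demo.index.get_level_values("size").str[:-1].astype(int).to_numpy()

# A stable sort keeps the existing strategy order inside every size group
order = np.argsort(size_num, kind="stable")
demo_sorted = demo.iloc[order]
```

The stable sort matters here: it leaves the rows within each size in their original order, so a dummy strategy appended last stays last.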
