Create multiple boxplots from dataframe

Question

I want to create multiple (two in this case) boxplots based on the data in a dataframe.

I have the following dataframe:

    Country  Fund                                   R^2       Style
0   Austria  BG EMCore Convertibles Global CHF R T  0.739131  Allocation
1   Austria  BG EMCore Convertibles Global R T      0.740917  Allocation
2   Austria  BG Trend A T                           0.738376  Fixed Income
3   Austria  Banken Euro Bond-Mix A                 0.71161   Fixed Income
4   Austria  Banken KMU-Fonds T                     0.778276  Allocation
5   Brazil   Banken Nachhaltigkeitsfonds T          0.912808  Allocation
6   Brazil   Banken Portfolio-Mix A                 0.857019  Allocation
7   Brazil   Banken Portfolio-Mix T                 0.868856  Fixed Income
8   Brazil   Banken Sachwerte-Fonds T               0.730626  Fixed Income
9   Brazil   Banken Strategie Wachstum T            0.918684  Fixed Income

I want to create one boxplot chart per country, grouped by Style and showing the distribution of R^2. I was thinking of a groupby operation, but I can't manage to produce a separate chart for each country.

Thanks in advance

How should the data be grouped? Only by Country, or by Country and Style?
– Zaraki Kenpachi, Commented Aug 13, 2019 at 10:36
I guess by country and style: for each country, one boxplot chart with two boxes, one per style, because we have Allocation and Fixed Income. Hope this answers your question.
– Martin Yordanov Georgiev, Commented Aug 13, 2019 at 10:40

Zaraki Kenpachi · Accepted Answer · 2019-08-14 05:53:31Z

2

Here you go. The explanation is in the code comments.

=^..^=

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from io import StringIO

data = StringIO("""
Country R^2 Style
Austria 0.739131 Allocation
Austria 0.740917 Allocation
Austria 0.738376 Fixed_Income
Austria 0.71161 Fixed_Income
Austria 0.778276 Allocation
Brazil 0.912808 Allocation
Brazil 0.857019 Allocation
Brazil 0.868856 New_Style
Brazil 0.730626 Fixed_Income
Brazil 0.918684 Fixed_Income
Brazil 0.618684 New_Style
""")

# load data into data frame
df = pd.read_csv(data, sep=' ')

# plot one box chart per Country
for country, country_df in df.groupby('Country'):
    # styles that actually occur for this country
    column_names = sorted(country_df['Style'].unique())
    # pivot rows into columns: one column per Style
    wide = country_df.assign(g=country_df.groupby('Style').cumcount())
    wide = wide.pivot(index='g', columns='Style', values='R^2')
    # plot box on a fresh figure
    fig, ax = plt.subplots()
    wide.boxplot(column=column_names, ax=ax)
    ax.set_title(country)
    plt.show()

Output (images not preserved): one boxplot figure per country, with one box per Style.

edited Aug 14, 2019 at 5:53

answered Aug 13, 2019 at 12:20

Zaraki Kenpachi


3 Comments

Martin Yordanov Georgiev Over a year ago

Thanks Zaraki. I think it will work. I didn't specify this, but there are many countries and multiple styles, for example equity, fixed income, allocation, etc. Therefore in df.boxplot(column=['Allocation', 'Fixed_Income', 'Equity', etc.]) I list all the styles. Yet for some countries not all styles apply, so when the code reaches a country with fewer styles than the list specifies, it gives an error. Do you know how I can tackle this? Maybe there is some way to specify in df.boxplot(column=['Allocation', 'Fixed_Income', etc.]) what should happen if some of the styles are not found.

Zaraki Kenpachi Over a year ago

@MartinYordanovGeorgiev I updated my code with line: column_names. Now it should handle different styles.

Martin Yordanov Georgiev Over a year ago

Thanks Zaraki, works just fine. Much appreciated. Just corrected one typo in df.boxplot(column=colums_names) "n" is omitted from colums_names. I posted an alternative answer myself. You can check below if interested.
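
For readers following the exchange above, the reshaping step (cumcount followed by pivot) can be looked at in isolation. This is only a minimal sketch on a small made-up sample, not the asker's data. It shows that a style with fewer rows is padded with NaN after the pivot, and as far as I know pandas' boxplot drops missing values before drawing, so uneven groups should not be a problem.

```python
import pandas as pd

# Small made-up sample (assumption: same column names as in the question)
df = pd.DataFrame({
    "Style": ["Allocation", "Allocation", "Fixed_Income", "Allocation"],
    "R^2": [0.74, 0.75, 0.71, 0.78],
})

# Number the rows within each Style: 0, 1, 2, ...
g = df.groupby("Style").cumcount()

# One column per Style; styles with fewer rows are padded with NaN
wide = df.assign(g=g).pivot(index="g", columns="Style", values="R^2")

print(wide)
```

Only the styles that occur in the sample become columns, which is why building the column list from the data avoids the missing-style error discussed in the comments.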

Martin Yordanov Georgiev · Accepted Answer · 2019-08-14 11:39:09Z

1

I came up with a solution myself.

import pandas as pd
import matplotlib.pyplot as plt

# df is the table from the original question

uniquenames = df.Country.unique()

# create dictionary of per-country data frames, with countries set as keys
diction = {elem: df[df.Country == elem] for elem in uniquenames}

# plot the data
for i in diction.keys():
    diction[i].boxplot(column="R^2", by="Style",
                       figsize=(15, 6), patch_artist=True, fontsize=12)
    plt.xticks(rotation=90)
    plt.title(i, fontsize=12)

plt.show()

answered Aug 14, 2019 at 11:39

Martin Yordanov Georgiev
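
A possible variation on the approach in this answer is to draw all countries side by side in a single figure with a shared y-axis, by passing an explicit `ax` to `DataFrame.boxplot`. The sketch below is an illustration only: the sample rows are abbreviated from the thread, and the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Abbreviated sample in the shape of the question's data
df = pd.DataFrame({
    "Country": ["Austria", "Austria", "Austria", "Brazil", "Brazil", "Brazil"],
    "Style": ["Allocation", "Fixed_Income", "Allocation",
              "Allocation", "New_Style", "Fixed_Income"],
    "R^2": [0.739131, 0.738376, 0.778276, 0.912808, 0.868856, 0.730626],
})

countries = sorted(df["Country"].unique())

# One subplot per country, all sharing the same y-axis scale
fig, axes = plt.subplots(1, len(countries), figsize=(10, 4),
                         sharey=True, squeeze=False)

for ax, country in zip(axes[0], countries):
    # Boxes are grouped by whichever styles exist for this country
    df[df["Country"] == country].boxplot(column="R^2", by="Style", ax=ax)
    ax.set_title(country)

# Clear the automatic "Boxplot grouped by ..." super-title
fig.suptitle("")
fig.savefig("boxplots_by_country.png")
```

The shared y-axis makes the R^2 distributions directly comparable across countries; with many countries, a grid layout or separate figures may be easier to read.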


KRKirov · Accepted Answer · 2019-08-13 15:12:00Z

0

Seaborn is well suited to this kind of task. Here are a couple of options:

Use seaborn's boxplot

import seaborn as sns
sns.set()

# Note - the data is stored in a data frame df
sns.boxplot(x='Country', y='R^2', hue='Style', data=df)

Alternatively, you can use seaborn's FacetGrid.

g = sns.FacetGrid(df, col="Country",  row="Style")
g = g.map(sns.boxplot, 'R^2', orient='v')

edited Aug 13, 2019 at 15:12

answered Aug 13, 2019 at 15:06

KRKirov
