0

I want to create multiple (two in this case) boxplots based on data in a dataframe.

I have the following dataframe:

    Country   Fund                                   R^2            Style
0   Austria  BG EMCore Convertibles Global CHF R T   0.739131   Allocation
1   Austria  BG EMCore Convertibles Global R T       0.740917   Allocation
2   Austria  BG Trend A T                            0.738376   Fixed Income
3   Austria  Banken Euro Bond-Mix A                  0.71161    Fixed Income
4   Austria  Banken KMU-Fonds T                      0.778276   Allocation
5   Brazil   Banken Nachhaltigkeitsfonds T           0.912808   Allocation
6   Brazil   Banken Portfolio-Mix A                  0.857019   Allocation
7   Brazil   Banken Portfolio-Mix T                  0.868856   Fixed Income
8   Brazil   Banken Sachwerte-Fonds T                0.730626   Fixed Income
9   Brazil   Banken Strategie Wachstum T             0.918684   Fixed Income

I want to create one boxplot chart per country, grouped by Style and showing the distribution of R^2. I was thinking of a groupby operation, but I can't manage to produce a separate chart for each country.

Thanks in advance

2
  • How should the data be grouped? Only by Country, or by Country and Style? Commented Aug 13, 2019 at 10:36
  • I guess by country and style: for each country, one boxplot chart with two boxes, one per style, because we have Allocation and Fixed Income. Hope this answers your question. Commented Aug 13, 2019 at 10:40

3 Answers

2

Here you go. The description is in the code comments.

=^..^=

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from io import StringIO

data = StringIO("""
Country R^2 Style
Austria 0.739131 Allocation
Austria 0.740917 Allocation
Austria 0.738376 Fixed_Income
Austria 0.71161 Fixed_Income
Austria 0.778276 Allocation
Brazil 0.912808 Allocation
Brazil 0.857019 Allocation
Brazil 0.868856 New_Style
Brazil 0.730626 Fixed_Income
Brazil 0.918684 Fixed_Income
Brazil 0.618684 New_Style
""")

# load data into data frame
df = pd.read_csv(data, sep=' ')

# group data by Country
grouped_data = df.groupby('Country')

# plot one box chart for each Country
for country, country_df in grouped_data:
    country_df = country_df[['Style', 'R^2']]
    # styles that actually occur for this country (sorted for a stable order)
    columns_names = sorted(country_df['Style'].unique())
    # pivot rows into columns: one column per Style, padded with NaN
    country_df = country_df.assign(g=country_df.groupby('Style').cumcount())
    country_df = country_df.pivot(index='g', columns='Style', values='R^2')
    # plot box
    country_df.boxplot(column=columns_names)
    plt.title(country)
    plt.show()

Output:

[Images: one boxplot chart per country]
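For comparison, here is a minimal sketch that draws the per-country charts side by side in one figure instead of opening a separate window per country. It assumes the question's data (rebuilt inline here without the Fund column) and uses the `by=` and `ax=` arguments of `DataFrame.boxplot`; the figure size and the `Agg` backend are illustrative choices, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question, without the Fund column
df = pd.DataFrame({
    "Country": ["Austria"] * 5 + ["Brazil"] * 5,
    "R^2": [0.739131, 0.740917, 0.738376, 0.71161, 0.778276,
            0.912808, 0.857019, 0.868856, 0.730626, 0.918684],
    "Style": ["Allocation", "Allocation", "Fixed Income", "Fixed Income",
              "Allocation", "Allocation", "Allocation", "Fixed Income",
              "Fixed Income", "Fixed Income"],
})

countries = sorted(df["Country"].unique())

# One panel per country; squeeze=False keeps axes 2-D even for a single country
fig, axes = plt.subplots(1, len(countries), figsize=(10, 4),
                         sharey=True, squeeze=False)

for ax, country in zip(axes[0], countries):
    subset = df[df["Country"] == country]
    # by="Style" draws one box per style that occurs in this subset
    subset.boxplot(column="R^2", by="Style", ax=ax)
    ax.set_title(country)
    ax.set_xlabel("Style")

# pandas sets its own figure title for grouped boxplots; replace it
fig.suptitle("R^2 by Style, one panel per country")
```

Call `plt.show()` (with an interactive backend) or `fig.savefig(...)` afterwards to view the result. Because `by="Style"` only sees the styles present in each subset, countries with fewer styles need no special handling.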

3 Comments

Thanks Zaraki. I think it will work. I didn't specify this, but there are many countries and multiple styles, for example equity, fixed income, allocation, etc. Therefore in df.boxplot(column=['Allocation', 'Fixed_Income', 'Equity', etc.]) I pass all the styles. Yet for some countries not all styles apply, so when the code reaches a country with fewer styles than specified in the list, it raises an error. Do you know how I can tackle this? Maybe there is a way to tell df.boxplot(column=['Allocation', 'Fixed_Income', etc.]) what to do, e.g. raise and catch an exception, if some of the styles are not found.
@MartinYordanovGeorgiev I updated my code with the columns_names line. Now it should handle different styles.
Thanks Zaraki, works just fine. Much appreciated. I just corrected one typo: in df.boxplot(column=colums_names), the "n" is omitted from colums_names. I posted an alternative answer myself; you can check it below if interested.
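The issue raised in these comments (a fixed style list failing for countries that lack some styles) can be sketched as follows. This is only an illustration: `all_styles` and the helper `present_styles` are hypothetical names, and the data is the sample from the answer above. The idea is to filter the master list down to the styles that actually occur in each country before passing it to `boxplot(column=...)`.

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    "Country": ["Austria"] * 5 + ["Brazil"] * 6,
    "R^2": [0.739131, 0.740917, 0.738376, 0.71161, 0.778276,
            0.912808, 0.857019, 0.868856, 0.730626, 0.918684, 0.618684],
    "Style": ["Allocation", "Allocation", "Fixed_Income", "Fixed_Income",
              "Allocation", "Allocation", "Allocation", "New_Style",
              "Fixed_Income", "Fixed_Income", "New_Style"],
})

# Hypothetical master list of every style that may occur in any country
all_styles = ["Allocation", "Fixed_Income", "New_Style"]


def present_styles(country_df, candidates):
    """Return the candidate styles that occur in country_df, in candidate order."""
    available = set(country_df["Style"])
    return [style for style in candidates if style in available]


# Per-country column lists that are safe to pass to boxplot(column=...)
styles_by_country = {
    country: present_styles(subset, all_styles)
    for country, subset in df.groupby("Country")
}
print(styles_by_country)
```

Keeping the master list's order means the boxes appear in the same sequence in every chart, even when a country is missing some styles.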
1

I came up with a solution myself.

import matplotlib.pyplot as plt

# df is the table from the original question

uniquenames = df.Country.unique()

# create dictionary of per-country data frames, with countries as keys
diction = {elem: df[df.Country == elem] for elem in uniquenames}

# plot the data: one chart per country, one box per Style
for i in diction.keys():
    diction[i].boxplot(column="R^2", by="Style",
                       figsize=(15, 6), patch_artist=True, fontsize=12)
    plt.xticks(rotation=90)
    plt.title(i, fontsize=12)
    plt.show()
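A shorter variant of the same idea, as a sketch: iterating over `df.groupby("Country")` yields each country name together with its sub-frame, so the intermediate dictionary is not strictly needed. The data below is the question's table without the Fund column; the `figures` dict, the figure size, and the `Agg` backend are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question, without the Fund column
df = pd.DataFrame({
    "Country": ["Austria"] * 5 + ["Brazil"] * 5,
    "R^2": [0.739131, 0.740917, 0.738376, 0.71161, 0.778276,
            0.912808, 0.857019, 0.868856, 0.730626, 0.918684],
    "Style": ["Allocation", "Allocation", "Fixed Income", "Fixed Income",
              "Allocation", "Allocation", "Allocation", "Fixed Income",
              "Fixed Income", "Fixed Income"],
})

figures = {}  # country name -> Figure, kept so each chart can be saved later
for country, subset in df.groupby("Country"):
    fig, ax = plt.subplots(figsize=(8, 4))
    # one box per Style that occurs in this country's rows
    subset.boxplot(column="R^2", by="Style", ax=ax, patch_artist=True)
    ax.set_title(country)
    fig.suptitle("")  # drop the automatic "Boxplot grouped by Style" title
    figures[country] = fig
```

Each figure can then be shown with `plt.show()` or written out with, for example, `figures["Austria"].savefig(...)`.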

0

Use seaborn for this kind of task. Here are a couple of options:

Use seaborn's boxplot

import seaborn as sns
sns.set()

# Note - the data is stored in a data frame df
sns.boxplot(x='Country', y='R^2', hue='Style', data=df)

[Image: boxplots of R^2 by Country, with Style as hue]

Alternatively, you can use seaborn's FacetGrid.

g = sns.FacetGrid(df, col="Country", row="Style")
# map_dataframe passes each facet's subset as data=, so y can be given by name
g = g.map_dataframe(sns.boxplot, y='R^2')

[Image: grid of boxplots, one panel per Country and Style]
