I am struggling to make a Bokeh plot that shows the result of a grouped dataframe.

I have some data from a dataframe:

data = pd.read_csv('CompanyStructure.csv', index_col = 0)

which looks like the following and contains thousands more rows:

enter image description here

I would like to visualize this dataframe after grouping by three variables. It could just as well be a grouping by two variables or one. Below is an example where I group by all of the first three columns:

grouped = data.groupby(by=['hour', 'Code', 'Type']).sum()

The resulting frame looks like this:

enter image description here

Now I would like to visualize this. The following is my approach:

source = ColumnDataSource(data=grouped)
p = figure(x_range = source.data['hour_Code_Type'].tolist())
p.vbar(x='hour_Code_Type', top='Value', source=source)
show(p)

Then I get the following error:

ValueError: Unrecognized range input: '[(0, 'DK1', 'A'), (0, 'DK1', 'P'), (0, 'DK1', 'T'), (0, 'DK2', 'A'), (0, 'DK2', 'P'), (0, 'DK2', 'T'), (1, 'DK1', 'A'), (1, 'DK1', 'P'), (1, 'DK1', 'T'), (1, 'DK2', 'A'), (1, 'DK2', 'P'), (1, 'DK2', 'T'), (2, 'DK1', 'A'), (2, 'DK1', 'P'), (2, 'DK1', 'T'), (2, 'DK2', 'A'), (2, 'DK2', 'P'), (2, 'DK2', 'T'), (3, 'DK1', 'A'), (3, 'DK1', 'P'), (3, 'DK1', 'T'), (3, 'DK2', 'A'), (3, 'DK2', 'P'), (3, 'DK2', 'T'), (4, 'DK1', 'A'), (4, 'DK1', 'P'), (4, 'DK1', 'T') ...

I understand the error, but I cannot figure out how to solve it. How can I make the x_range show each group from the frame above once? My ideal tool is an interactive one (which is why I am using Bokeh) that draws a bar chart depending on which variables are chosen for grouping.

I hope that someone can help me out.

  • Bokeh categorical factors are always strings, so as a first step all those ints in the tuples definitely need to be converted to strings. Commented Aug 19, 2021 at 6:37
  • @bigreddot thanks a lot for your contribution. Unfortunately, this does not solve the issue. Commented Aug 19, 2021 at 9:24
  • I didn't say it would, merely that that was one thing that was definitely wrong (possibly among others). Without a Minimal Reproducible Example it's hard to say more. If you had provided an MRE I would have run it and fixed it directly. Commented Aug 19, 2021 at 18:51

1 Answer

EDIT: As bigreddot pointed out, FactorRange could be used to avoid having string tuples as categoricals.

from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

df['hour'] = df['hour'].astype(str) # To get Tuple(String, String, String) when grouping

grouped = df.groupby(by=['hour', 'Code', 'Type']).sum()
source = ColumnDataSource(data=grouped)

p = figure(x_range=FactorRange(*source.data['hour_Code_Type'].tolist()))
p.vbar(x='hour_Code_Type', top='Value', source=source)
show(p)

This renders

enter image description here
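
Since the question asks for grouping by one, two, or three variables, the same idea can be sketched as a small helper. This is only a sketch under stated assumptions: `make_factors` and `plot_grouped` are hypothetical names, not Bokeh or pandas API, and the column names are taken from the question. It relies on Bokeh factors being either plain strings or tuples of two or three strings.

```python
def make_factors(keys):
    """Turn group keys into Bokeh-style categorical factors.

    Scalar keys (grouping by one column) become plain strings;
    tuple keys (grouping by two or three columns) become tuples of strings.
    """
    factors = []
    for key in keys:
        if isinstance(key, tuple):
            if not 2 <= len(key) <= 3:
                raise ValueError("nested factors need two or three levels")
            factors.append(tuple(str(part) for part in key))
        else:
            factors.append(str(key))
    return factors


def plot_grouped(df, by, value="Value"):
    """Hypothetical helper: bar chart of `value` summed per group in `by`."""
    # Imported here so make_factors can be used without Bokeh installed.
    from bokeh.models import ColumnDataSource, FactorRange
    from bokeh.plotting import figure, show

    grouped = df.groupby(by=by)[value].sum()
    factors = make_factors(grouped.index.tolist())
    source = ColumnDataSource(data={"x": factors, "top": grouped.tolist()})

    p = figure(x_range=FactorRange(*factors))
    p.vbar(x="x", top="top", width=0.9, source=source)
    show(p)


# Keys as they would come out of a three-column and a one-column grouping.
three_level = make_factors([(0, "DK1", "A"), (0, "DK1", "P")])
one_level = make_factors([0, 1, 2])
```

With the question's dataframe, a call such as `plot_grouped(data, ['hour', 'Code'])` or `plot_grouped(data, ['hour'])` would then use the same code path for any of the supported groupings.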

Old answer:

When you group on 'hour', 'Code' and 'Type' you are creating a MultiIndex. As you want categoricals in the x-range to be a list of strings, one approach could be to create a new column that converts the MultiIndex to a string.

grouped = df.groupby(by=['hour', 'code', 'type']).sum()
grouped['group'] = [str(x) for x in grouped.index]  # one string label per index tuple

This gives the following dataframe (using mock data):

                value            group
hour code type
0    DK2  A       0.4  (0, 'DK2', 'A')
1    DK1  B       1.0  (1, 'DK1', 'B')
2    DK1  A       1.5  (2, 'DK1', 'A')

Then you could visualize value based on the 'group' column:

source = ColumnDataSource(data=grouped)

p = figure(x_range=source.data['group'].tolist())
p.vbar(x='group', top='value', source=source)
show(p)

To get:

enter image description here
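
If the flat string labels are kept, joining the index parts with a separator gives tidier tick labels than the tuple representation. The following is a minimal sketch; `join_label` and the `" | "` separator are illustrative choices, not part of the original answer.

```python
def join_label(key, sep=" | "):
    """Build one readable string label from a group key (scalar or tuple)."""
    parts = key if isinstance(key, tuple) else (key,)
    return sep.join(str(part) for part in parts)


# Index tuples from the mock data shown above.
labels = [join_label(key) for key in [(0, "DK2", "A"), (1, "DK1", "B"), (2, "DK1", "A")]]
```

Applied to the grouped frame, this would read `grouped['group'] = [join_label(x) for x in grouped.index]`, and the rest of the plotting code stays the same.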


3 Comments

Thanks @taul. So the trick is to create a new column identical to the MultiIndex rather than using the MultiIndex itself?
Note that Bokeh can handle hierarchical (multi-part) factors and will render nested axes for them: docs.bokeh.org/en/latest/docs/user_guide/… It is not necessary to settle for the ugly tuple axis tick labels.
Thanks bigreddot, I edited my answer. Blindly forgot about FactorRange.
