I have tabular data in the following format:

id  value    valid
1   0.43323  true
2   0.83122  false
3   0.33132  true
4   0.58351  false
5   0.74143  true
6   0.44334  true
7   0.86436  false
8   0.73555  true
9   0.56534  false
10  0.66234  true
...

I am trying to plot a histogram like the one in this image (not reproduced here).

Is there a way to do this with a pandas DataFrame: group the numeric values into bins from 0.0 to 0.1, 0.1 to 0.2, and so on, and plot them as in the image, with each bar color-coded to show the true and false counts separately?

My current idea is to build separate slices in a dictionary, count the true/false values in each slice, and then create the histogram from those counts. Is there a better way to plot such a histogram without doing all these calculations by hand?

What I have so far, using bins and pd.cut:

new_df = df[['value','valid']]
bins = [0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1]
s = new_df.groupby(pd.cut(new_df['value'], bins=bins)).size()
s.plot(kind='bar', stacked=True)

With this I can plot the total count per bin, but I cannot color each bar by the separate true/false counts from the 'valid' column.

  • If you disagree with the closure of your question (Panda dataframe : plot histogram with grouping), there is a process to reopen a question, and it is decidedly not deleting and reposting the same question. Commented Aug 13, 2021 at 18:25
  • Let me follow that; it asked me to repost the question. Those posts are a couple of years old, and I would like to know if there are better ways to combine both questions using newer Python libraries. Commented Aug 13, 2021 at 18:27
  • You might consider including the duplicates that were linked and explain why they do not apply or what you are looking for that differs. The duplicates in question are Binning a column with Python Pandas and Pandas - Plotting a stacked Bar Chart for those without the ability to see deleted questions. Commented Aug 13, 2021 at 18:29
  • Thanks, Henry, for pointing out the questions. As mentioned earlier, those two questions cover binning and groupby/count separately. The idea of this question is to combine both solutions, which I am having a hard time with: I can generate the bins and plot the histogram, but I cannot color-code it with separate true/false counts. Commented Aug 13, 2021 at 18:39
  • Let me add the code I have so far. Commented Aug 13, 2021 at 18:49

1 Answer

Try:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

df = pd.DataFrame(
    {
        "value": np.random.random(1000),
        "valid": np.random.choice([True, False], p=[0.7, 0.3], size=1000),
    }
)

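# Assign each value to a 0.1-wide bin: (0.0, 0.1], (0.1, 0.2], ..., (0.9, 1.0]
df["label"] = pd.cut(df["value"], bins=np.arange(0, 1.01, 0.1))

# Count rows per (bin, valid) pair, then pivot "valid" into columns so that
# each bin becomes one bar stacked by its False/True counts.
# observed=False keeps empty bins and makes the categorical behavior explicit.
ax = (
    df.groupby(["label", "valid"], observed=False)
    .count()
    .unstack()["value"]
    .plot.bar(stacked=True, rot=0, figsize=(10, 7))
)
ax.legend(loc="upper center")
# Hide the top and right borders for a cleaner look
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
_ = ax.set_ylim(0, 150)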

Output: (image not reproduced here)
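
As a possible alternative to the groupby/unstack chain, pd.crosstab can build the same bin-by-valid count table in one call. This is only a minimal sketch using the ten sample rows from the question; the names edges, bin_label, status, and counts are chosen here for illustration and do not come from the original posts.

```python
import numpy as np
import pandas as pd

# The ten sample rows shown in the question.
df = pd.DataFrame(
    {
        "value": [0.43323, 0.83122, 0.33132, 0.58351, 0.74143,
                  0.44334, 0.86436, 0.73555, 0.56534, 0.66234],
        "valid": [True, False, True, False, True,
                  True, False, True, False, True],
    }
)

# Eleven edges from 0 to 1 give ten 0.1-wide bins.
edges = np.linspace(0, 1, 11)
bin_label = pd.cut(df["value"], bins=edges)

# Readable column labels instead of raw True/False (illustrative choice).
status = df["valid"].map({True: "valid", False: "invalid"})

# Cross-tabulate bin against status: one row per bin, one column per status.
# dropna=False asks crosstab to keep bins that contain no rows.
counts = pd.crosstab(bin_label, status, dropna=False)
print(counts)
```

Calling counts.plot.bar(stacked=True, rot=0) on the resulting table should then draw one stacked bar per bin, assuming matplotlib is installed.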
