
I would like to plot 8 subplots, one for each rescued dog breed.


import numpy as np

rescued_pop = 8
rescued_dog_race = ["rottweiler", "poodle", "pitbull", "chihuahua", "shihtzu", "whippet", "terrier", "greyhound"]
# rescued percentage of each breed, in the same order as rescued_dog_race
rescued = df.set_index("breed").loc[rescued_dog_race, "rescued"].to_numpy(dtype=float)

null_outcomes = {}  # breed -> list of 1000 simulated rescue counts
for j, breed in enumerate(rescued_dog_race):  # loops over all 8 breeds
    null_outcomes[breed] = []
    for i in range(1000):
        simulated_dog_race = np.random.choice(["yes", "no"], size=100, p=[rescued[j]/100, 1 - rescued[j]/100])
        num_rescued = np.sum(simulated_dog_race == "yes")
        null_outcomes[breed].append(num_rescued)
    print(breed, rescued[j], null_outcomes[breed][:10])

The dataframe looks like this:

[Screenshot of the dataframe]

Currently I can generate a bar plot for a single, hard-coded rescued_dog_race:

import numpy as np

np.random.seed(1)

rescued_pop = 8
rescued_dog_race = "whippet"
# rescued percentage of the selected breed (a single scalar)
rescued = float(df.loc[df.breed == rescued_dog_race, "rescued"].iloc[0])

null_outcomes = []
null_outcomes_pop = []

# null distribution for the selected breed
for i in range(1000):
    simulated_dog_race = np.random.choice(["yes", "no"], size=100, p=[rescued/100, 1 - rescued/100])
    num_rescued = np.sum(simulated_dog_race == "yes")
    null_outcomes.append(num_rescued)

# null distribution for the whole population
for i in range(1000):
    simulated_pop = np.random.choice(["yes", "no"], size=100, p=[rescued_pop/100, 1 - rescued_pop/100])
    num_rescued_pop = np.sum(simulated_pop == "yes")
    null_outcomes_pop.append(num_rescued_pop)

[Screenshot of the resulting bar plot]

  • Hi, and welcome to SO. What's rescued? And what exactly is your question? Are you asking how to get the number of yes answers? Are you asking how to plot a barplot of a list containing ints? Could you provide more code of what you tried? Commented Oct 14, 2022 at 19:03
  • Does this answer your question? populating matplotlib subplots through a loop and a function Commented Oct 15, 2022 at 3:16
  • OP says "I would like to plot a histogram for each j (so 8 subplots)" and their code is "for j in range(7):" — oh well, Python does not work like that… Commented Oct 15, 2022 at 15:11
  • I can make a bar plot with rescued_dog_race = "whippet", for example. But I am wondering whether I could avoid changing the input value 8 times and get all 8 bar plots at once. Commented Oct 15, 2022 at 17:45

1 Answer


Some ideas to simplify your solution:

Remove the for-loop and define the random size directly: instead of looping m=1000 times over random arrays of size n=100, request all draws at once via size=(m, n).

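import numpy as np

prop = 6  # rescued percentage of one breed, e.g. whippet
simulated = np.random.choice(["yes", "no"], size=(1000, 100), p=[prop/100, 1 - prop/100])
null_outcomes = list(np.sum(simulated == "yes", axis=1))  # one count per row of 100 dogs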
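Because each simulated value is just the number of "yes" answers among 100 independent draws with probability prop/100, it follows a binomial distribution. A minimal sketch that swaps in NumPy's binomial sampler instead of np.random.choice (the seed and prop = 6 are only illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
prop = 6  # illustrative rescue percentage, e.g. whippet in the example df

# Each entry is the number of rescued dogs out of 100, each rescued with
# probability prop/100, i.e. one draw from Binomial(n=100, p=prop/100).
null_outcomes = rng.binomial(n=100, p=prop / 100, size=1000).tolist()

print(len(null_outcomes), min(null_outcomes), max(null_outcomes))
```

This avoids building a 1000 x 100 array of strings, which matters little here but scales better for larger simulations.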

Move the calculation into a separate function and apply it to every breed via df.apply, e.g.:

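import numpy as np

def get_null_outcomes(prop):
    # 1000 samples of 100 dogs, each dog rescued with probability prop/100
    simulated = np.random.choice(["yes", "no"], size=(1000, 100), p=[prop/100, 1 - prop/100])
    # number of "yes" answers in each of the 1000 samples
    return list(np.sum(simulated == "yes", axis=1))

df.apply(lambda x: get_null_outcomes(x['rescued']), axis=1)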
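Calling the function once per row matters: it keeps prop a scalar, so p stays a 1-D list of two probabilities. If the whole rescued column is passed at once, p becomes a 2-D array and np.random.choice raises "'p' must be 1-dimensional". A small sketch reproducing both cases (the two-row df is a made-up example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"breed": ["whippet", "greyhound"], "rescued": [6, 3]})

def get_null_outcomes(prop):
    simulated = np.random.choice(["yes", "no"], size=(1000, 100), p=[prop / 100, 1 - prop / 100])
    return list(np.sum(simulated == "yes", axis=1))

# Passing the whole column makes p a 2 x 2 array, which np.random.choice rejects.
try:
    get_null_outcomes(df["rescued"].to_numpy())
except ValueError as err:
    print("ValueError:", err)

# Applying the function row by row keeps prop a scalar, so p stays 1-D.
outcomes = df.apply(lambda row: get_null_outcomes(row["rescued"]), axis=1)
print(len(outcomes), len(outcomes[0]))
```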

You can then store the resulting data in a separate column of your df and plot it as a histogram.

Complete Example (based on your code)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1)

df = pd.DataFrame({
    'breed': ['rottweiler', 'poodle', 'pitbull', 'chihuahua', 'shihtzu', 'whippet', 'terrier', 'greyhound'],
    'not_rescued': [85, 86, 87, 90, 93, 94, 96, 97],
    'rescued': [16, 14, 13, 10, 7, 6, 4, 3]
})

def get_null_outcomes(prop):
    simulated = np.random.choice(["yes", "no"], size = (1000,100), p=[prop/100,1-(prop/100)])
    return list(np.sum(simulated=='yes',axis=1))

#pop data
rescued_pop = 8
null_outcomes_pop = get_null_outcomes(rescued_pop)

#plot pop data
plt.hist(null_outcomes_pop, color='orange', stacked=True, alpha=0.2)
plt.axvline(rescued_pop, color='g', label='expected dog rescued')
plt.axvline( np.percentile(null_outcomes_pop, 2.5), color='purple', linestyle='dashed', label='2.5% percentile' )
plt.axvline( np.percentile(null_outcomes_pop, 97.5), color='purple', linestyle='dashed', label='97.5% percentile' )

#dog race data
df['null_outcomes'] = df.apply(lambda x: get_null_outcomes(x['rescued']), axis=1)

#plot dog race data (e.g. for whippet)
plt.hist(df['null_outcomes'][5], histtype='step', fill=False, label=df['breed'][5])
plt.axvline(df['rescued'][5], linestyle='dashed')

plt.legend()
plt.show()
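The example above selects the whippet row by its position (5). To select rows by breed name instead, pandas supports label-based lookups; a minimal sketch using the example data (by_breed and the whippet_* names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "breed": ["rottweiler", "poodle", "pitbull", "chihuahua", "shihtzu", "whippet", "terrier", "greyhound"],
    "rescued": [16, 14, 13, 10, 7, 6, 4, 3],
})

# Label-based lookup: make "breed" the index once, then select rows by name.
by_breed = df.set_index("breed")
whippet_rescued = by_breed.loc["whippet", "rescued"]

# Equivalent boolean-mask lookup that keeps the default integer index.
whippet_rescued_mask = df.loc[df["breed"] == "whippet", "rescued"].iloc[0]

print(whippet_rescued, whippet_rescued_mask)
```

The same pattern works for the null_outcomes column, e.g. by_breed.loc["whippet", "null_outcomes"] once that column exists.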

Some more hints

Plot all breeds at once instead of plotting a histogram for only one breed:

plt.hist(df['null_outcomes'], histtype='step', fill=False, label=df['breed'])

Plot the histogram and the vertical line of each breed in the same color via:

cm = plt.cm.RdBu_r  # colormap used to pick one color per breed
for i in range(len(df['breed'])):
    c = cm(i / len(df['breed']))
    plt.hist(df['null_outcomes'][i], color=c, histtype='step', fill=False, label=df['breed'][i])
    plt.axvline(df['rescued'][i], color=c, linestyle='dashed')
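The original question asks for 8 separate subplots rather than overlaid histograms. A minimal sketch, reusing the example data and get_null_outcomes from the complete example, that draws one subplot per breed with plt.subplots (the 2 x 4 grid, figure size and labels are illustrative choices):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1)

df = pd.DataFrame({
    "breed": ["rottweiler", "poodle", "pitbull", "chihuahua", "shihtzu", "whippet", "terrier", "greyhound"],
    "not_rescued": [85, 86, 87, 90, 93, 94, 96, 97],
    "rescued": [16, 14, 13, 10, 7, 6, 4, 3],
})

def get_null_outcomes(prop):
    simulated = np.random.choice(["yes", "no"], size=(1000, 100), p=[prop / 100, 1 - prop / 100])
    return list(np.sum(simulated == "yes", axis=1))

df["null_outcomes"] = df.apply(lambda x: get_null_outcomes(x["rescued"]), axis=1)

# One subplot per breed: a 2 x 4 grid for the 8 breeds, sharing the x axis.
fig, axes = plt.subplots(2, 4, figsize=(16, 6), sharex=True)
for ax, (_, row) in zip(axes.flat, df.iterrows()):
    ax.hist(row["null_outcomes"], histtype="step")
    ax.axvline(row["rescued"], color="g", linestyle="dashed")
    ax.set_title(row["breed"])
    ax.set_xlabel("rescued dogs out of 100")

fig.tight_layout()
plt.show()
```

Iterating over axes.flat together with the rows keeps each breed's histogram and expected value in its own panel, so no input value has to be changed by hand.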

7 Comments

Thanks for the answer! I see, but now I realize I have trouble following the data structure I should use. I got the following error: 'p' must be 1-dimensional.
How can p be a one-dimensional array, and why?
I added a complete example for clarification. In addition, you may want to look into np.random.normal instead of calculating a random normal distribution yourself.
I got it! Thanks a lot.
I have a general question: in Python, it looks like it is better to work with the index of a row than with the value contained in, for example, the first column. Is that right? In my case, I would prefer to use something like df[breed = whippet].
