I have a dataframe of values which I am using to plot a scatter/line graph with confidence intervals:

The dataframe (sqlDf2) looks like this:

Statu   Total   count   Success   Pred   Upper95    Lower95      Upper99    Lower99
Org                             
A        391    391       38    0.35064  0.398903   0.302377    0.423034    0.278245
B        360    360       30    0.343464 0.393519   0.293408    0.418546    0.268381
C        271    271       29    0.319606 0.37626    0.262951    0.404587    0.234624
D        247    247       22    0.312089 0.371053   0.253125    0.400535    0.223643
...

The code I use to plot the graph is:

import matplotlib.pyplot as plt

y = sqlDf2['Success'].values
x = sqlDf2['Total'].values

up95 = (sqlDf2['Upper95'].values)*100
low95 = (sqlDf2['Lower95'].values)*100
up99 = (sqlDf2['Upper99'].values)*100
low99 = (sqlDf2['Lower99'].values)*100
middleLine = (sqlDf2['Pred'].values)*100

plt.figure(figsize=(15,8))
plt.ylim(0, 100)
plt.margins(x=0)

plt.scatter(x,y,marker='o',c='white',edgecolors = 'black', alpha=.5)
plt.plot(x,up95, 'red', linestyle=':', dashes=(1, 5), linewidth=1)
plt.plot(x,low95, 'red', linestyle=':', dashes=(1, 5), linewidth=1)
plt.plot(x,up99, 'red', linestyle=':', dashes=(1, 5), linewidth=1)
plt.plot(x,low99, 'red', linestyle=':', dashes=(1, 5), linewidth=1)
plt.plot(x,middleLine, 'red', linestyle='-', dashes=(1, 2), linewidth=1)

plt.show() 

The graph looks like this:

[image: the resulting plot]

What I want to do is annotate the points that fall ABOVE or BELOW the 99% confidence interval with the value of 'Org'. Is there an easy way to work out which values fall outside those two lines in Python?

Thank you

1 Answer

Each row of your DataFrame holds both the y-value of the data point and the y-values of the lines, so you can compare them column by column. You could use np.where (NumPy, imported as np) for this purpose.

C = np.where(condition, A, B)

Each element of C is taken from A where the condition is True and from B where it is False. To check against the Upper99 and Lower99 lines, you could do the following:

sqlDf2['Outside'] = np.where((sqlDf2['Success'] > sqlDf2['Upper99']*100) | (sqlDf2['Success'] < sqlDf2['Lower99']*100), True, False)

This will result in a new column containing True if the data-point lies outside of the given boundaries and False if it is inside of the boundaries.
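
To get from that flag to the annotations asked about in the question, here is a minimal sketch. It assumes 'Org' is the index of the DataFrame (as the printout suggests), uses only the four rows shown in the question, and keeps the same scaling by 100 as the plotting code. The names above, below and outliers are only illustrative, and the Agg backend line is there so the sketch runs without a display; drop it for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive plotting
import matplotlib.pyplot as plt
import pandas as pd

# The four rows shown in the question, with 'Org' assumed to be the index.
sqlDf2 = pd.DataFrame(
    {
        "Total": [391, 360, 271, 247],
        "Success": [38, 30, 29, 22],
        "Upper99": [0.423034, 0.418546, 0.404587, 0.400535],
        "Lower99": [0.278245, 0.268381, 0.234624, 0.223643],
    },
    index=pd.Index(["A", "B", "C", "D"], name="Org"),
)

# Same comparison as above; the bounds are scaled by 100 as in the plot.
above = sqlDf2["Success"] > sqlDf2["Upper99"] * 100
below = sqlDf2["Success"] < sqlDf2["Lower99"] * 100
sqlDf2["Outside"] = above | below

# Keep only the rows that fall outside the 99% band.
outliers = sqlDf2[sqlDf2["Outside"]]

fig, ax = plt.subplots(figsize=(15, 8))
ax.scatter(sqlDf2["Total"], sqlDf2["Success"],
           marker="o", c="white", edgecolors="black")

# Label each flagged point with its Org value, offset slightly from the marker.
for org, row in outliers.iterrows():
    ax.annotate(org, xy=(row["Total"], row["Success"]),
                xytext=(5, 5), textcoords="offset points")
```

The boolean expression already yields True/False, so wrapping it in np.where is optional here. If you need to tell the two cases apart, the above and below masks can be used separately.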

4 Comments

Hey, thank you. I am getting "unsupported operand type(s) for |: 'float' and 'int'", but I think that will be fixed by converting one of the columns to an int or float. Thank you so much for your help! :)
Hmmm, getting "TypeError: unsupported operand type(s) for |: 'float' and 'bool'" after fixing the float issue. Any idea?
Fixed it with some brackets! sqlDf2['Outside'] = np.where((sqlDf2['Success'] > sqlDf2['Upper99']*100) | (sqlDf2['Success']<sqlDf2['Lower99']*100), True, False)
I just added the brackets in my answer to make it work properly
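
For anyone wondering why the brackets matter: in Python, | binds more tightly than the comparison operators, so an unbracketed condition applies | to the wrong operands before any comparison happens. A small sketch with made-up values (the Series s and the bounds low and high are hypothetical, not from the question):

```python
import pandas as pd

s = pd.Series([10.0, 50.0, 90.0])
low, high = 20.0, 80.0

# Without brackets, Python parses `s > high | s < low` as
# `s > (high | s) < low`, so `|` is applied to a float and the Series first.
try:
    bad = s > high | s < low
    raised = False
except (TypeError, ValueError):
    raised = True

# Bracketing each comparison gives the intended element-wise OR.
outside = (s > high) | (s < low)
```

The exact error message depends on the operand types and the pandas version, but bracketing each comparison avoids it in every case.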
