Python Pandas/Matplot - Annotating values above and below a line

Question

I have a dataframe of values which I am using to plot a scatter/line graph with confidence intervals:

The dataframe (sqlDf2) looks like this:

Statu  Total  count  Success  Pred      Upper95   Lower95   Upper99   Lower99
Org
A      391    391    38       0.35064   0.398903  0.302377  0.423034  0.278245
B      360    360    30       0.343464  0.393519  0.293408  0.418546  0.268381
C      271    271    29       0.319606  0.37626   0.262951  0.404587  0.234624
D      247    247    22       0.312089  0.371053  0.253125  0.400535  0.223643
...

The code I use to plot the graph is:

import matplotlib.pyplot as plt

y = sqlDf2['Success'].values
x = sqlDf2['Total'].values

up95 = (sqlDf2['Upper95'].values)*100
low95 = (sqlDf2['Lower95'].values)*100
up99 = (sqlDf2['Upper99'].values)*100
low99 = (sqlDf2['Lower99'].values)*100
middleLine = (sqlDf2['Pred'].values)*100

plt.figure(figsize=(15,8))
plt.ylim(0, 100)
plt.margins(x=0)

plt.scatter(x,y,marker='o',c='white',edgecolors = 'black', alpha=.5)
plt.plot(x,up95, 'red', linestyle=':', dashes=(1, 5), linewidth=1)
plt.plot(x,low95, 'red', linestyle=':', dashes=(1, 5), linewidth=1)
plt.plot(x,up99, 'red', linestyle=':', dashes=(1, 5), linewidth=1)
plt.plot(x,low99, 'red', linestyle=':', dashes=(1, 5), linewidth=1)
plt.plot(x,middleLine, 'red', linestyle='-', dashes=(1, 2), linewidth=1)

plt.show()

The graph looks like this:

What I want to do is annotate the points that fall ABOVE or BELOW the 99% confidence interval with the value of 'Org'. Is there an easy way to work out which values fall above or below the two lines in Python?

Thank you

pyman · Accepted Answer · 2018-01-30 11:49:49Z


In your DataFrame, each row holds both the y-value of the data point and the y-values of the lines. Therefore, you could use np.where for this purpose.

C = np.where(condition, A, B)

np.where takes the value from A where the condition is True and from B where it is False. If you want to check against the Upper99 and Lower99 lines, you could achieve this as follows:

sqlDf2['Outside'] = np.where((sqlDf2['Success'] > sqlDf2['Upper99']*100) | (sqlDf2['Success'] < sqlDf2['Lower99']*100), True, False)

This will result in a new column containing True if the data-point lies outside of the given boundaries and False if it is inside of the boundaries.
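To get from the 'Outside' column to the labels asked for in the question, one possible approach is to loop over the flagged rows and call annotate for each. The following is only a sketch: it rebuilds a small DataFrame from the four rows shown in the question (assuming 'Org' is the index), draws just the 99% lines, and the 5-point label offset is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The four rows shown in the question; 'Org' is assumed to be the index.
sqlDf2 = pd.DataFrame(
    {
        "Total": [391, 360, 271, 247],
        "Success": [38, 30, 29, 22],
        "Upper99": [0.423034, 0.418546, 0.404587, 0.400535],
        "Lower99": [0.278245, 0.268381, 0.234624, 0.223643],
    },
    index=pd.Index(["A", "B", "C", "D"], name="Org"),
)

# Flag points above the upper or below the lower 99% line.
sqlDf2["Outside"] = np.where(
    (sqlDf2["Success"] > sqlDf2["Upper99"] * 100)
    | (sqlDf2["Success"] < sqlDf2["Lower99"] * 100),
    True,
    False,
)

fig, ax = plt.subplots(figsize=(15, 8))
ax.scatter(sqlDf2["Total"], sqlDf2["Success"],
           marker="o", c="white", edgecolors="black", alpha=0.5)
ax.plot(sqlDf2["Total"], sqlDf2["Upper99"] * 100, "r:", linewidth=1)
ax.plot(sqlDf2["Total"], sqlDf2["Lower99"] * 100, "r:", linewidth=1)

# Label only the flagged points with their 'Org' value.
outliers = sqlDf2[sqlDf2["Outside"]]
for org, x, y in zip(outliers.index, outliers["Total"], outliers["Success"]):
    ax.annotate(org, xy=(x, y), xytext=(5, 5), textcoords="offset points")

print(outliers.index.tolist())  # ['D']
# plt.show()  # uncomment to display the figure
```

With these four rows only D is labelled, since its Success of 22 lies just below Lower99*100 (about 22.36).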

edited Jan 30, 2018 at 11:49

answered Jan 30, 2018 at 10:54


4 Comments

Nicholas Over a year ago

Hey, thank you. I am getting "unsupported operand type(s) for |: 'float' and 'int'", but I think that will be fixed by converting one of the columns to an int or float. Thank you so much for your help! :)

Nicholas Over a year ago

Hmmm, getting a 'TypeError: unsupported operand type(s) for |: 'float' and 'bool'' after fixing the float issue. Any idea?

Nicholas Over a year ago

Fixed it with some brackets! sqlDf2['Outside'] = np.where((sqlDf2['Success'] > sqlDf2['Upper99']*100) | (sqlDf2['Success']<sqlDf2['Lower99']*100), True, False)

pyman Over a year ago

I just added the brackets in my answer to make it work properly
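For anyone hitting the same errors as in the comments: the brackets matter because Python's | operator binds more tightly than > and <, so the unbracketed expression tries to combine a float column with an int column using | before any comparison happens. A small sketch with made-up Series (the names success, upper and lower are illustrative only, not from the question):

```python
import pandas as pd

success = pd.Series([38, 22])
upper = pd.Series([42.3, 40.1])
lower = pd.Series([27.8, 22.4])

# Without brackets this parses as: success > (upper | success) < lower,
# which attempts float | int and raises instead of comparing.
try:
    success > upper | success < lower
    failed = False
except (TypeError, ValueError):
    failed = True

# Bracketing each comparison gives the intended element-wise OR.
outside = (success > upper) | (success < lower)
print(failed, outside.tolist())
```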
