Pandas return index and column name based on the item value

Question

I am trying to return a column name and index based on the item value. I have something like this:

[screenshot of the DataFrame: rows and columns labelled assault, robbery and traffic]

So let's say I am trying to return the index and column names of all values greater than 0.75.

for date, row in df.iterrows():
    for item in row:
        if item > .75:
            print(date, row)

I wanted this to return "traffic" and "robbery". However, this returns all the values. I did not find an answer to this in the documentation, online or here. Thank you in advance.
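The loop prints the whole row because `row` is the entire Series for that index label, not the matching column. A minimal corrected sketch iterates over `row.items()` so that each value comes paired with its column label. The values below are assumed sample data, copied from FooBar's answer, since the screenshot is not reproduced here.

```python
import pandas as pd

# Assumed sample data (from FooBar's answer), not the asker's actual screenshot
labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.74, 0.68],
     [0.74, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

matches = []
for label, row in df.iterrows():
    # row.items() yields (column label, value) pairs for this row
    for col, value in row.items():
        # skip the diagonal, where each variable is compared with itself
        if label != col and value > 0.75:
            matches.append((label, col))
            print(label, col)
```

Both mirrored pairs are printed; the answers below show ways to keep only one of them.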

For future reference, just paste the output of df.head() into the question instead of using a screenshot. That way we can copy and paste the DataFrame into our consoles.
– FooBar, commented Aug 19, 2014 at 8:42
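As a sketch of why pasted text is more useful than a screenshot: printed DataFrame output can be read straight back with `pd.read_csv` and a whitespace separator (`pd.read_clipboard()` does the same from the system clipboard). The text below is FooBar's sample matrix, assumed for illustration. Because the header row has one field fewer than the data rows, pandas uses the first column as the index.

```python
import io

import pandas as pd

# Assumed pasted output (FooBar's sample matrix), without the index-name line
text = """\
         assault  robbery  traffic
assault     1.00     0.74     0.68
robbery     0.74     1.00     0.78
traffic     0.68     0.78     1.00
"""

# sep=r"\s+" splits on runs of whitespace; the extra leading field becomes the index
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df)
```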

Andy Hayden · Accepted Answer · 2014-08-19 06:35:58Z

Using slightly different numbers (for no particular reason), you can stack to form a Series and then use boolean indexing:

In [11]: df.stack()
Out[11]:
assault  assault    1.00
         robbery    0.76
         traffic    0.60
robbery  assault    0.76
         robbery    1.00
         traffic    0.78
traffic  assault    0.68
         robbery    0.78
         traffic    1.00
dtype: float64

In [12]: s = df.stack()

In [13]: s[(s!=1) & (s>0.77)]
Out[13]:
robbery  traffic    0.78
traffic  robbery    0.78
dtype: float64
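A self-contained version of the stack-and-filter steps above, rebuilding the frame from the `Out[11]` values (note they are not symmetric: assault/traffic is 0.60 one way and 0.68 the other). The `s != 1` test removes the diagonal; an exact float comparison is fine here only because the diagonal holds literal 1.0 values.

```python
import pandas as pd

labels = ["assault", "robbery", "traffic"]
# Values read off the Out[11] listing (row label first, column label second)
df = pd.DataFrame(
    [[1.00, 0.76, 0.60],
     [0.76, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

s = df.stack()                   # Series indexed by (row, column) pairs
hits = s[(s != 1) & (s > 0.77)]  # drop the diagonal, keep values above 0.77
pairs = hits.index.tolist()
print(pairs)  # [('robbery', 'traffic'), ('traffic', 'robbery')]
```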

You can use a bit of NumPy to remove the duplicates. One way* is to zero out everything that is not strictly above the diagonal with triu (unfortunately this returns a plain array rather than a DataFrame :( ):

In [21]: np.triu(df, 1)
Out[21]:
array([[ 0.  ,  0.76,  0.6 ],
       [ 0.  ,  0.  ,  0.78],
       [ 0.  ,  0.  ,  0.  ]])

In [22]: s = pd.DataFrame(np.triu(df, 1), df.index, df.columns).stack() > 0.77

In [23]: s[s]
Out[23]:
robbery  traffic    True
dtype: bool

In [24]: s[s].index.tolist()
Out[24]: [('robbery', 'traffic')]

*I suspect there are more efficient ways...
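One alternative that stays in pandas (a sketch, not benchmarked against the triu version above): build a boolean upper-triangle mask with `np.triu(np.ones(df.shape, dtype=bool), k=1)` and pass it to `DataFrame.where`, which keeps the labels and puts NaN in the masked-out cells. The `dropna()` call is explicit because, depending on the pandas version, `stack()` may or may not drop NaN entries on its own.

```python
import numpy as np
import pandas as pd

labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.76, 0.60],
     [0.76, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

# True strictly above the main diagonal, False elsewhere
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)

# where() keeps the DataFrame and its labels; masked-out cells become NaN
s = df.where(upper).stack().dropna()
pairs = s[s > 0.77].index.tolist()
print(pairs)  # [('robbery', 'traffic')]
```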

FooBar · Accepted Answer · 2014-08-19 08:53:40Z


I start with

         assault  robbery  traffic
index                             
assault     1.00     0.74     0.68
robbery     0.74     1.00     0.78
traffic     0.68     0.78     1.00

and do

df2 = df.stack().reset_index()  # the index is named 'index', so the columns are 'index', 'level_1' and 0
df2[df2[0] > 0.75].drop_duplicates(0)[['index', 'level_1']]

     index  level_1
0  assault  assault
5  robbery  traffic

Here drop_duplicates(0) gets rid of the mirrored key pairs, but only by assuming that every key pair has a unique value (which is debatable).
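To avoid relying on unique values, one option (a sketch, assuming the row and column labels are comparable strings) is to keep each unordered pair once by requiring the row label to sort before the column label. Unlike the answer above, this also drops self-pairs such as assault/assault. The frame is FooBar's sample with its index named 'index', so the stacked columns come out as 'index', 'level_1' and 0.

```python
import pandas as pd

labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.74, 0.68],
     [0.74, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=pd.Index(labels, name="index"),
    columns=labels,
)

df2 = df.stack().reset_index()  # columns: 'index', 'level_1', 0
# keep (a, b) only when a < b: each unordered pair appears once, the diagonal is excluded
once = df2[df2["index"] < df2["level_1"]]
result = once[once[0] > 0.75][["index", "level_1"]]
print(result)
```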

answered Aug 19, 2014 at 8:53


segmentationfault · Accepted Answer · 2014-08-19 13:48:52Z


If you want to keep the for loops, you can loop over df.index and df.columns:

for i in df.index:
    for j in df.columns:
        # skip the diagonal; df.loc[i, j] is the value in row i, column j
        if (i != j) and (df.loc[i, j] > 0.75):
            print(i, j)

The output would then be:

robbery traffic
traffic robbery

Update: As FooBar pointed out, this is inefficient. It is better to use a vectorized approach like the ones FooBar and Andy Hayden suggested:

In [3]: df[(df>0.75) & (df!=1)].stack().drop_duplicates()
Out[3]:
robbery  traffic    0.78
dtype: float64
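A runnable version of the one-liner, using FooBar's sample values. The `dropna()` is a hedge: the expression relies on `stack()` discarding the NaN cells created by the boolean mask, which older pandas does by default but newer versions may not. Like FooBar's answer, `drop_duplicates()` assumes mirrored pairs are the only repeated values.

```python
import pandas as pd

labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.74, 0.68],
     [0.74, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

# Mask out values <= 0.75 and the diagonal (exactly 1.0), then stack to
# (row, column) pairs; dropna() removes the masked-out cells
s = df[(df > 0.75) & (df != 1)].stack().dropna()
# drop_duplicates() keeps the first of each repeated value, i.e. one of each mirrored pair
result = s.drop_duplicates()
print(result)
```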

edited Aug 19, 2014 at 13:48

answered Aug 19, 2014 at 7:02

Comment from FooBar:

Keeping for loops is a very inefficient (and ugly) way to do this.
