4

I am trying to return a column name and index based on the item value. I have something like this:

[Screenshot of the DataFrame, not reproduced here]

So let's say I am trying to return the index and column names of all values where the value is > 0.75.

for date, row in df.iterrows():
    for item in row:
        if item > .75:
            print index, row

I wanted this to return "traffic and robbery". However, this returns all the values. I did not find an answer to this in the documentation, online, or here. Thank you in advance.

1
  • For future reference, just paste df.head() into the question instead of using a screenshot. We can copy & paste the dataframe into our consoles that way. Commented Aug 19, 2014 at 8:42

3 Answers

5

Using slightly different numbers (for no particular reason), you can stack to form a Series keyed by (row label, column label) and then use boolean indexing:

In [11]: df.stack()
Out[11]:
assault  assault    1.00
         robbery    0.76
         traffic    0.60
robbery  assault    0.76
         robbery    1.00
         traffic    0.78
traffic  assault    0.68
         robbery    0.78
         traffic    1.00
dtype: float64

In [12]: s = df.stack()

In [13]: s[(s!=1) & (s>0.77)]
Out[13]:
robbery  traffic    0.78
traffic  robbery    0.78
dtype: float64
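
For anyone who wants to run this end to end, here is a minimal sketch of the same stack-and-filter idea. The DataFrame construction is an assumption: the values are copied by hand from the stacked output above, since the asker's data was only posted as a screenshot.

```python
import pandas as pd

# Assumption: values rebuilt by hand from the stacked output shown above.
labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.76, 0.60],
     [0.76, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

s = df.stack()                    # Series keyed by (row label, column label)
hits = s[(s != 1) & (s > 0.77)]   # drop the 1.0 diagonal, keep values above the threshold
pairs = hits.index.tolist()       # list of (row label, column label) tuples
print(pairs)
```

Each mirrored pair still shows up twice here, once per orientation.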

You can do a bit of numpy to remove the duplicates. One way* is to zero out everything that is not strictly above the diagonal with np.triu (unfortunately this returns a plain array rather than a DataFrame, so the labels have to be put back):

In [21]: np.triu(df, 1)
Out[21]:
array([[ 0.  ,  0.76,  0.6 ],
       [ 0.  ,  0.  ,  0.78],
       [ 0.  ,  0.  ,  0.  ]])

In [22]: s = pd.DataFrame(np.triu(df, 1), df.index, df.columns).stack() > 0.77

In [23]: s[s]
Out[23]:
robbery  traffic    True
dtype: bool

In [24]: s[s].index.tolist()
Out[24]: [('robbery', 'traffic')]

*I suspect there are more efficient ways...
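
One possible variation, sketched here under the same hand-copied data assumption, is to build a boolean upper-triangle mask and pass it to DataFrame.where, which keeps the labels so the frame does not need to be rebuilt. This is an alternative to the triu call above, not a claim that it is faster.

```python
import numpy as np
import pandas as pd

# Assumption: values rebuilt by hand from the stacked output shown earlier.
labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.76, 0.60],
     [0.76, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

# True strictly above the diagonal; k=1 excludes the diagonal itself.
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)

# where() keeps the labels and puts NaN wherever the mask is False;
# dropna() makes sure those NaN entries are gone after stacking.
s = df.where(upper).stack().dropna()

result = s[s > 0.77].index.tolist()
print(result)
```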

1

I start with

         assault  robbery  traffic
index                             
assault     1.00     0.74     0.68
robbery     0.74     1.00     0.78
traffic     0.68     0.78     1.00

and do

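# Stack the frame shown above directly (its index is already named 'index')
df2 = df.stack().reset_index()
df2 = df2[df2[0] > 0.75]  # filter first so the boolean mask aligns with df2
df2.drop_duplicates(0)[['index', 'level_1']]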

     index  level_1
0  assault  assault
5  robbery  traffic

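Here drop_duplicates() gets rid of the mirrored key pairs, but it does so by value, so it assumes that every key pair has a unique value (which is debatable). Note that the diagonal entry (assault, assault) also passes the threshold and shows up in the result.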
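
To illustrate that caveat, here is a small sketch that deduplicates by key instead of by value. The data is hypothetical (two different pairs deliberately share the value 0.78), and sorting the labels inside each key is just one way to make mirrored pairs compare equal.

```python
import pandas as pd

# Hypothetical data: (assault, robbery) and (robbery, traffic) share the value 0.78.
labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.78, 0.68],
     [0.78, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

s = df.stack()
s = s[s > 0.75]

# Deduplicating by value collapses distinct pairs that happen to tie.
by_value = s[s != 1].drop_duplicates().index.tolist()

# Deduplicating by key: sort the labels inside each (row, column) key so
# mirrored pairs become identical, and skip the diagonal.
by_key = sorted({tuple(sorted(key)) for key in s.index if key[0] != key[1]})

print(by_value)
print(by_key)
```

With this data the value-based version keeps only one pair, while the key-based version keeps both.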

1

If you want to keep the for loops, you could use columns and index:

for i in df.index:
  for j in df.columns:
    # .loc[row, column] looks up by label; skip the diagonal (i == j)
    if (i != j) and (df.loc[i, j] > 0.75):
      print(i, j)

The output would then be:

robbery traffic
traffic robbery

Update: As FooBar pointed out, this is inefficient. It is better to use something like what FooBar and Andy Hayden suggested:

In [3]: df[(df>0.75) & (df!=1)].stack().drop_duplicates()
Out[3]:
robbery  traffic    0.78
dtype: float64
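
If the goal is just the (index, column) labels, as in the original question, another vectorized option is to locate the matching positions with np.where and map them back to labels. This is only a sketch: the frame is rebuilt by hand from the table in the second answer, and the variable names are mine.

```python
import numpy as np
import pandas as pd

# Assumption: values copied by hand from the table in the second answer.
labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.74, 0.68],
     [0.74, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

mask = df.to_numpy() > 0.75      # boolean array, True where the value passes
np.fill_diagonal(mask, False)    # ignore each label paired with itself

rows, cols = np.where(mask)      # integer positions of the remaining hits
pairs = list(zip(df.index[rows], df.columns[cols]))
print(pairs)
```

This gives the same two pairs as the loop version, in the same order.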

1 Comment

Keeping the for loops is a very inefficient (and ugly) way to do this.
