Selecting columns using query statement in pandas dataframe

Question

I have the following pandas dataframe. Each point is combined with 'n' class points of each class, and each combination has a value of 0 or 1. For each point, I want to get the class that has the highest number of 0 values. Expected output: Pt.1 - a, Pt.2 - b

I have tried with a hash table, but it's a bit cumbersome. What would be an elegant pandas dataframe query for this?

+------+-------+-------+
| Pt.  | class | value |
+------+-------+-------+
| Pt.1 | a     |     0 |
| Pt.1 | a     |     0 |
| Pt.1 | a     |     1 |
| Pt.1 | b     |     0 |
| Pt.1 | b     |     1 |
| Pt.1 | b     |     1 |
| Pt.2 | a     |     1 |
| Pt.2 | a     |     1 |
| Pt.2 | a     |     1 |
| Pt.2 | b     |     0 |
| Pt.2 | b     |     0 |
| Pt.2 | b     |     0 |
+------+-------+-------+
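To try the answer's code, the table can be rebuilt as a DataFrame. This is a minimal sketch; it assumes the columns are named exactly Pt., class and value, as in the table header.

```python
import pandas as pd

# Rebuild the question's table as a DataFrame
# (assumes column names exactly as in the table header;
#  all six Pt.1 rows use the same label 'Pt.1')
df = pd.DataFrame({
    'Pt.':   ['Pt.1'] * 6 + ['Pt.2'] * 6,
    'class': ['a', 'a', 'a', 'b', 'b', 'b'] * 2,
    'value': [0, 0, 1, 0, 1, 1,
              1, 1, 1, 0, 0, 0],
})
print(df)
```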

Why is the r tag here?

– Sotos, Commented Nov 15, 2017 at 12:45
Because dataframe operations are similar in R and Python.

– rj dj, Commented Nov 15, 2017 at 13:15

jezrael · Accepted Answer · 2017-11-15 12:49:56Z

First keep only the rows where value is 0 with boolean indexing, then group by Pt. and count the classes with value_counts. Because value_counts sorts the counts in descending order, the most frequent class is the first index value, so select it with index[0]:

out = (df[df['value'] == 0].groupby('Pt.')['class']
                           .apply(lambda x: x.value_counts().index[0])
                           .reset_index(name='top1'))
print(out)
    Pt. top1
0  Pt.1    a
1  Pt.2    b
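To see why index[0] picks the most frequent class, here is the value_counts step applied by hand to the classes of Pt.1's zero-valued rows (a, a, b in the table). A small illustrative sketch:

```python
import pandas as pd

# Classes of the rows where Pt.1 has value == 0, taken from the table
pt1_zero_classes = pd.Series(['a', 'a', 'b'])

# value_counts sorts by frequency, highest first
counts = pt1_zero_classes.value_counts()
print(counts)

# so the first index label is the most frequent class
print(counts.index[0])
```

counts.idxmax() returns the same label and states the intent a little more directly.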

A similar alternative uses query for the filtering:

out = (df.query("value == 0")
         .groupby('Pt.')['class']
         .apply(lambda x: x.value_counts().index[0])
         .reset_index(name='top1'))
print(out)
    Pt. top1
0  Pt.1    a
1  Pt.2    b
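To avoid the per-group Python lambda, a different technique (not from the answer) is to count zeros per point and class with pd.crosstab and then take the column of each row's maximum with idxmax(axis=1). A sketch, reusing the question's data:

```python
import pandas as pd

# Question's data (same values as the table)
df = pd.DataFrame({
    'Pt.':   ['Pt.1'] * 6 + ['Pt.2'] * 6,
    'class': ['a', 'a', 'a', 'b', 'b', 'b'] * 2,
    'value': [0, 0, 1, 0, 1, 1,
              1, 1, 1, 0, 0, 0],
})

# Keep only rows whose value is 0
zeros = df.query("value == 0")

# Rows: points, columns: classes, cells: how many 0 values each pair has
counts = pd.crosstab(zeros['Pt.'], zeros['class'])

# idxmax(axis=1) gives, per point, the class column holding the largest count
top = counts.idxmax(axis=1).rename('top1').reset_index()
print(top)
```

idxmax returns the first maximum it finds, so ties go to the first class column, and crosstab typically orders those columns alphabetically. Like the answer's versions, this drops any point that has no 0 values at all, because those rows are filtered out before counting.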

edited Nov 15, 2017 at 12:49

answered Nov 15, 2017 at 12:43

rj dj Over a year ago

Thanks! Worked perfectly!
