
Here is a minimal example:

import numpy as np
import pandas as pd
# dtype=int dropped: int columns cannot hold NaN, so pandas keeps these as float
df = pd.DataFrame({'x': [0, 0, np.nan, 1], 'y': [1, 0, 0, np.nan], 'z': [np.nan, 1, 1, 0]}, index = ['a', 'a', 'b', 'b'])

       x      y      z
a      0      1    NaN
a      0      0      1
b    NaN      0      1
b      1    NaN      0

Values can only be 0, 1, or NaN. I want to combine rows that have the same index, ignoring NaN values; since the values are only 0 or 1, this amounts to taking the per-index maximum. The expected result is:

       x      y      z
a      0      1      1 
b      1      0      1

The way I am doing it:

df.max(level = 0)

Is there a faster way?

1 Answer

It is the same; performance should be similar - mainly it depends on the data:

df.groupby(level = 0).max()
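
As a quick sanity check (a minimal sketch, assuming the example df from the question and a pandas version where the level argument is still supported), both calls return an identical frame:

# Both approaches produce the same result
res1 = df.max(level = 0)
res2 = df.groupby(level = 0).max()
print(res1.equals(res2))  # True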

Time comparison:

In [15]: %timeit df.groupby(level = 0).max()
    ...: 
100 loops, best of 3: 8.08 ms per loop
In [12]: %timeit df.max(level = 0)
    ...: 
100 loops, best of 3: 8.04 ms per loop

Some bigger data:

N = 100000
idx = np.random.randint(10000, size=N).astype(str)
df = pd.DataFrame(np.random.choice([0,1,np.nan], size=(N,3)), index=idx)
df = df.sort_index()
print(df.head())

In [174]: %timeit df.max(level = 0)
100 loops, best of 3: 19.5 ms per loop

In [175]: %timeit df.groupby(level = 0).max()
10 loops, best of 3: 24 ms per loop
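
One version caveat (not in the original answer): the level argument to reductions like max was deprecated in pandas 1.3 and removed in pandas 2.0, so on recent versions only the groupby form keeps working:

# pandas >= 2.0 no longer accepts level= in max(); use groupby instead
result = df.groupby(level = 0).max()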

4 Comments

@Arpit Solanki - Thank you for the timings.
You're welcome. I deleted mine because it does not make sense to give a slower solution.
Thanks for the alternative and the timings @jezrael. The size of my data is about (20000, 200). I can see that the timings of the two approaches are similar, as you mentioned.
@unfolx - You are welcome. Yes, exactly. I think df.max(level = 0) is mainly less typing; maybe df.groupby(level = 0).max() is more readable? Hard to say which is best ;)
