For a given numpy array:

[[1, 1, 'IGNORE_THIS_COL', 100],
 [1, 1, 'IGNORE_THIS_COL', 101],
 [1, 2, 'IGNORE_THIS_COL', 100]]

Is it possible to sum rows conditionally? Say column 0 is the group and column 1 is the user; I would like to sum the fourth column over all rows that share the same group and user. The final 'summed' array should look like this:

[[1, 1, 'IGNORE_THIS_COL', 201],
 [1, 2, 'IGNORE_THIS_COL', 100]]

I have already checked multiple answers, including Numpy: conditional sum.

  • Is IGNORE_THIS_COL an integer or a string? Commented Jun 30, 2018 at 19:43
  • @user3483203 In this case it is an integer. Does that change the solution? Commented Jun 30, 2018 at 19:45
  • Very much so; otherwise numpy would cast everything to strings when you created the array. Commented Jun 30, 2018 at 19:45

1 Answer

You're looking for a groupby on a subset of columns. This is a challenge to implement with numpy, but is straightforward with a pandas groupby:

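import numpy as np
import pandas as pd

# dtype=object keeps the numeric columns as ints instead of casting everything to strings
array = np.array([[1, 1, 'IGNORE_THIS_COL', 100],
                  [1, 1, 'IGNORE_THIS_COL', 101],
                  [1, 2, 'IGNORE_THIS_COL', 100]], dtype=object)

df = pd.DataFrame(array)
# Group on columns 0 and 1; keep the first value of column 2 and sum column 3
out = df.groupby([0, 1], as_index=False).agg({2: 'first', 3: 'sum'}).values.tolist()

print(out)
# [[1, 1, 'IGNORE_THIS_COL', 201], [1, 2, 'IGNORE_THIS_COL', 100]]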
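
If you would rather stay in numpy, here is a minimal sketch of one way to do the same grouping, not a definitive implementation. It assumes the array was built with dtype=object (so the integers were not cast to strings) and that columns 0, 1 and 3 hold non-negative integers. The names keys, values, first_idx and inverse are just illustrative.

```python
import numpy as np

# Same rows as in the question; dtype=object is an assumption that stops
# numpy from casting the integer columns to strings.
array = np.array([[1, 1, 'IGNORE_THIS_COL', 100],
                  [1, 1, 'IGNORE_THIS_COL', 101],
                  [1, 2, 'IGNORE_THIS_COL', 100]], dtype=object)

# Group keys (columns 0 and 1) and the values to sum (column 3) as integer arrays.
keys = array[:, :2].astype(int)
values = array[:, 3].astype(int)

# For each distinct (group, user) pair: the index of its first occurrence,
# plus, for every input row, the index of the pair it belongs to.
_, first_idx, inverse = np.unique(keys, axis=0, return_index=True, return_inverse=True)
inverse = inverse.ravel()  # some numpy versions return a 2-D inverse when axis is given

# Sum column 3 within each group.
sums = np.bincount(inverse, weights=values).astype(int)

# Take the first row of each group and overwrite column 3 with the group total.
out = array[first_idx]
out[:, 3] = sums
result = out.tolist()

print(result)
# [[1, 1, 'IGNORE_THIS_COL', 201], [1, 2, 'IGNORE_THIS_COL', 100]]
```

Note that np.unique sorts the keys, so the groups come back in key order rather than in order of first appearance.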

2 Comments

What is the purpose of 2:'first' in the aggregation?
@DaveIdito You wanted to ignore the column, so I'm ignoring it by just taking the first value from it per group.
