0

I have a 2D NumPy array as follows:

gona = array([['a1', 3], ['a2', 5], ['a3', 1], ['a3', 2], ['a3', 1], ['a1', 7]])

This array has 2 columns

What I want to do is create a new array with 2 columns. Column 1 should hold the distinct labels 'a1', 'a2', 'a3', and column 2 should hold the sum of the values that correspond to each label.

new_gona = array([['a1', 10], ['a2', 5], ['a3', 4]])

Here, corresponding values are taken as follows.

'a1' : 3 + 7 = 10
'a2' : 5 
'a3' : 1 + 2 + 1 = 4

What would be an easy method to achieve this?

2
  • I remember seeing an efficient solution with Pandas the last time this problem came up, but I don't remember what it was. Maybe someone with better search skills can find it. Commented Feb 4, 2014 at 8:53
  • Note: Running the code you've posted produces an array of dtype '|S2'. This means that the integers are stored as strings, instead of as int32 or some other reasonable dtype. That may be a problem. Commented Feb 4, 2014 at 8:56

5 Answers

3

Use pandas and its indexing magic:

import pandas as pd
import numpy as np

gona = np.array([['a1', 3], ['a2', 5], ['a3', 1], 
              ['a3', 2], ['a3', 1], ['a1', 7]])

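# Create a Series whose values are the second column and whose index is the first
series = pd.Series(gona[:, 1].astype(float), index=gona[:, 0])

# Sum the values that share an index label
# (series.sum(level=0) did this in older pandas; it was removed in pandas 2.0)
sums = series.groupby(level=0).sum()

# Construct a new array in the format you want
# (list() is needed on Python 3, where zip returns an iterator)
new_gona = np.array(list(zip(sums.index, sums.values)))

new_gona
# out[]:
# array([['a1', '10.0'],
#        ['a2', '5.0'],
#        ['a3', '4.0']],
#       dtype='|S4')   # on Python 3 this is a unicode ('<U...') dtype instead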

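It's also worth noting that a NumPy array can hold only one data type. Because the array mixes strings and numbers, everything is stored as strings, so the numeric column has to be converted explicitly to float. You can use int instead if you want integers. Note that the np.float and np.int aliases were removed in NumPy 1.24, so use the built-in float and int.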
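To illustrate the single-dtype point, here is a minimal sketch (the names `mixed` and `values` are made up for this example) showing that the numbers end up stored as strings and must be converted before summing:

```python
import numpy as np

# Mixing strings and integers in one array coerces every element to a string
mixed = np.array([['a1', 3], ['a2', 5]])

# On Python 3 the dtype kind is 'U' (unicode string), not an integer kind
print(mixed.dtype.kind)

# Convert the second column back to numbers before doing arithmetic
values = mixed[:, 1].astype(float)
print(values.sum())  # 8.0
```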


2

A numpy only solution:

>>> labels, indices = np.unique(gona[:, 0], return_inverse=True)
>>> sums = np.bincount(indices, weights=gona[:, 1].astype(float))
>>> new_gona = np.column_stack((labels, sums))
>>> new_gona
array([['a1', '10'],
       ['a2', '5.'],
       ['a3', '4.']], 
      dtype='|S2')
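As a rough sketch of what the two NumPy calls produce at each step, the intermediate values can be printed before they are stacked back into a string array. The array is the one from the question. Keeping `labels` and `sums` as separate arrays is a choice made for this example so that the sums stay numeric.

```python
import numpy as np

gona = np.array([['a1', 3], ['a2', 5], ['a3', 1],
                 ['a3', 2], ['a3', 1], ['a1', 7]])

# labels: the sorted unique keys
# indices: for each row, the position of its key within labels
labels, indices = np.unique(gona[:, 0], return_inverse=True)

# bincount adds each weight into the bin selected by the matching index
sums = np.bincount(indices, weights=gona[:, 1].astype(float))

print(labels.tolist())   # ['a1', 'a2', 'a3']
print(indices.tolist())  # [0, 1, 2, 2, 2, 0]
print(sums.tolist())     # [10.0, 5.0, 4.0]
```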


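1

from collections import defaultdict
from operator import itemgetter

# Accumulate a running total per label
sums = defaultdict(int)
for key, value in gona:
    # the array stores the numbers as strings, so convert before adding
    sums[key] += int(value)

# Sort the (label, total) pairs by label; use iteritems() on Python 2
new_gona = sorted(sums.items(), key=itemgetter(0))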

Cheat?


0

A list comprehension does this fairly easily:

def fst(x): return x[0]
[(a, sum([int(m[1]) for m in gona if a == m[0]])) for a in set(map(fst, gona)) ]    

This is basic Python; no libraries are involved. The helper function fst is defined only to avoid a lambda expression in the map call. Both the pandas and the NumPy solutions already mentioned seem pretty interesting, though. +1 for both!


-2

You have to write a loop over gona and store each label (e.g. 'a1') as a key in a dictionary. The values should, of course, be added up under each key.
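The loop described above might look like the following sketch. It uses a plain list of rows with string values to mirror how the NumPy array stores them. The names `sums` and `new_gona` are chosen for illustration.

```python
# Rows as (label, value) pairs; values are strings, as in the NumPy array
gona = [['a1', '3'], ['a2', '5'], ['a3', '1'],
        ['a3', '2'], ['a3', '1'], ['a1', '7']]

sums = {}
for key, value in gona:
    # start from 0 the first time a label is seen, then keep adding
    sums[key] = sums.get(key, 0) + int(value)

# Turn the dictionary into a list of (label, total) pairs sorted by label
new_gona = sorted(sums.items())
print(new_gona)  # [('a1', 10), ('a2', 5), ('a3', 4)]
```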

1 Comment

Comments please: a solution was asked for and one was provided. No code, but a logical explanation.
