I've tried pandas and NumPy but haven't gotten the result I want. I have a simple array made up of many rows like this:
[[customer_number, customer_name, invoice balance],[customer_number, customer_name, invoice balance]]
and so on. Many rows share the same customer_number, and every customer_number has its own customer_name associated with it, so one always determines the other. What I'd like to do is basically a group by, similar to the one in SQL. I want this:
[[customer_number, customer_name, sum(invoice_balance)]]
where the last element sums all the invoice balances with the same customer_number, leaving me with an array that has one row per unique customer_number and the sum of all invoice balances for that customer.
I'd prefer to do this without pandas or numpy, but will use them if need be. I've been trying to modify a version of this to work:
[sum(x[2]) for x in array]
but my invoice_balance is straight out of psycopg2 and is formatted as a Decimal object, and for some reason, that wasn't working.
Is there a way to do this without a library in Python, or is there some easy method in pandas/numpy?
edit: here is an example of the array I am working with that I get directly from psycopg2:
[[Decimal('1111'), 'Customer1', Decimal('31.50')],
[Decimal('1112'), 'Customer2', Decimal('30.88')],
[Decimal('1111'), 'Customer1', Decimal('90.00')],
[Decimal('1113'), 'Customer3', Decimal('30.88')],
[Decimal('1112'), 'Customer2', Decimal('30.88')],
[Decimal('1112'), 'Customer2', Decimal('15.00')],
[Decimal('1111'), 'Customer1', Decimal('37.93')],
[Decimal('1113'), 'Customer3', Decimal('30.88')],
[Decimal('1111'), 'Customer1', Decimal('30.88')],
[Decimal('1111'), 'Customer1', Decimal('30.88')],
[Decimal('1113'), 'Customer3', Decimal('26.60')],
[Decimal('1113'), 'Customer3', Decimal('44.22')],
[Decimal('1112'), 'Customer2', Decimal('32.93')],
[Decimal('1111'), 'Customer1', Decimal('20.00')],
[Decimal('1113'), 'Customer3', Decimal('38.14')],
[Decimal('1111'), 'Customer1', Decimal('16.60')],
[Decimal('1112'), 'Customer2', Decimal('67.46')],
[Decimal('1111'), 'Customer1', Decimal('30.88')],
[Decimal('1113'), 'Customer3', Decimal('30.88')],
[Decimal('1111'), 'Customer1', Decimal('233.42')]]
and the error I receive when I try [sum(x[2]) for x in array]:
TypeError: 'decimal.Decimal' object is not iterable
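For reference, here is a minimal sketch of the group by described above using only the standard library. The function name group_balances is hypothetical, and it assumes every row has exactly the three fields shown (customer_number, customer_name, invoice balance). Only the first four sample rows are used to keep it short.

```python
from collections import defaultdict
from decimal import Decimal


def group_balances(rows):
    """Sum the balance column per (customer_number, customer_name) pair."""
    # defaultdict(Decimal) starts every new key at Decimal('0').
    totals = defaultdict(Decimal)
    for number, name, balance in rows:
        totals[(number, name)] += balance
    # Dicts keep insertion order, so customers appear in first-seen order.
    return [[number, name, total] for (number, name), total in totals.items()]


rows = [
    [Decimal('1111'), 'Customer1', Decimal('31.50')],
    [Decimal('1112'), 'Customer2', Decimal('30.88')],
    [Decimal('1111'), 'Customer1', Decimal('90.00')],
    [Decimal('1113'), 'Customer3', Decimal('30.88')],
]
result = group_balances(rows)
```

Grouping on the (number, name) pair works here because each customer_number always maps to the same customer_name.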
edit 2:
[[Decimal('1112'), Decimal('393217'), datetime.date(2021, 5, 5), Decimal('961.96'), Decimal('46.16'), Decimal('551.05'), Decimal('961.96')],
[Decimal('1111'), Decimal('392865'), datetime.date(2021, 4, 29), Decimal('270.57'), Decimal('221.65'), Decimal('0.00'), Decimal('270.57')],
[Decimal('1113'), Decimal('392716'), datetime.date(2021, 4, 27), Decimal('494.44'), Decimal('123.45'), Decimal('0.00'), Decimal('494.44')],
[Decimal('1112'), Decimal('392654'), datetime.date(2021, 4, 26), Decimal('156.60'), Decimal('69.99'), Decimal('6.50'), Decimal('156.60')],
[Decimal('1113'), Decimal('392654'), datetime.date(2021, 4, 26), Decimal('160.42'), Decimal('72.99'), Decimal('52.80'), Decimal('160.42')]]
Following Mark's answer, how can I adjust his code to sum every column other than the customer id and name, so that I end up with something like this:
[[customer_id, customer_name, sum(total), sum(applied), sum(credit), sum(balance)]]
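One possible way to sum several columns per customer is sketched below. This is not Mark's code; group_sum, key_cols and sum_cols are hypothetical names. It assumes the rows look like the edit 2 sample, with the customer id at index 0 and the four amounts (presumably total, applied, credit, balance) at indexes 3 to 6. That sample has no customer_name column, so only the id is used as the key here.

```python
import datetime
from decimal import Decimal


def group_sum(rows, key_cols, sum_cols):
    """Group rows by the key columns and add up each of the sum columns."""
    totals = {}
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        values = [row[i] for i in sum_cols]
        if key in totals:
            # Add column by column to the running totals for this key.
            totals[key] = [a + b for a, b in zip(totals[key], values)]
        else:
            totals[key] = values
    return [list(key) + sums for key, sums in totals.items()]


rows = [
    [Decimal('1112'), Decimal('393217'), datetime.date(2021, 5, 5), Decimal('961.96'), Decimal('46.16'), Decimal('551.05'), Decimal('961.96')],
    [Decimal('1111'), Decimal('392865'), datetime.date(2021, 4, 29), Decimal('270.57'), Decimal('221.65'), Decimal('0.00'), Decimal('270.57')],
    [Decimal('1113'), Decimal('392716'), datetime.date(2021, 4, 27), Decimal('494.44'), Decimal('123.45'), Decimal('0.00'), Decimal('494.44')],
    [Decimal('1112'), Decimal('392654'), datetime.date(2021, 4, 26), Decimal('156.60'), Decimal('69.99'), Decimal('6.50'), Decimal('156.60')],
    [Decimal('1113'), Decimal('392654'), datetime.date(2021, 4, 26), Decimal('160.42'), Decimal('72.99'), Decimal('52.80'), Decimal('160.42')],
]
grouped = group_sum(rows, key_cols=(0,), sum_cols=(3, 4, 5, 6))
```

If the real rows also carry a customer_name, passing both positions in key_cols (for example key_cols=(0, 1)) should keep the name in the output.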
psycopg2 and is formatted as a Decimal object part. Maybe print(type(all_data[0][2])) so we understand the type of balance.
[sum(x[2]) for x in array] isn't working because sum expects an iterable and you are passing it a Decimal, but it would fail in exactly the same way if you passed it a float.
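The point about sum needing an iterable can be checked with a small sketch. The helper scalar_sum_error is a hypothetical name, and the two-row array is just the start of the sample data from the question.

```python
from decimal import Decimal

array = [
    [Decimal('1111'), 'Customer1', Decimal('31.50')],
    [Decimal('1112'), 'Customer2', Decimal('30.88')],
]


def scalar_sum_error(value):
    """Return the TypeError message sum() raises for a non-iterable, if any."""
    try:
        sum(value)
    except TypeError as exc:
        return str(exc)
    return None


decimal_error = scalar_sum_error(Decimal('31.50'))  # a Decimal is not iterable
float_error = scalar_sum_error(31.5)                # neither is a float

# Passing a generator of the balances gives sum() something to iterate over.
total = sum(x[2] for x in array)
```

So the Decimal type itself is not the problem: moving the loop inside the call to sum is what makes it work.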