I've tried Pandas and Numpy but haven't seen the result I want. I have a simple array that consists of several lines of this:

[[customer_number, customer_name, invoice balance],[customer_number, customer_name, invoice balance]]

and so on. Many rows share the same customer_number, and every customer_number has its own customer_name associated with it, so one always determines the other. What I'd like to do is basically a group-by, similar to GROUP BY in SQL. I want this:

[[customer_number, customer_name, sum(invoice_balance)]]

where the last element sums all the invoice balances with the same customer_number, leaving me with an array of unique customer_numbers and the sum of all invoice balances for each customer.

I'd prefer to do this without pandas or numpy, but will use them if need be. I've been trying to modify a version of this to work:

[sum(x[2]) for x in array]

but my invoice_balance is straight out of psycopg2 and is formatted as a Decimal object, and for some reason, that wasn't working.

Is there a way to do this without a library in Python, or is there some easy method in pandas/numpy?

edit: here is an example of the array I am working with that I get directly from psycopg2:

[[Decimal('1111'), 'Customer1', Decimal('31.50')], 
[Decimal('1112'), 'Customer2', Decimal('30.88')], 
[Decimal('1111'), 'Customer1', Decimal('90.00')], 
[Decimal('1113'), 'Customer3', Decimal('30.88')], 
[Decimal('1112'), 'Customer2', Decimal('30.88')], 
[Decimal('1112'), 'Customer2', Decimal('15.00')], 
[Decimal('1111'), 'Customer1', Decimal('37.93')], 
[Decimal('1113'), 'Customer3', Decimal('30.88')], 
[Decimal('1111'), 'Customer1', Decimal('30.88')], 
[Decimal('1111'), 'Customer1', Decimal('30.88')], 
[Decimal('1113'), 'Customer3', Decimal('26.60')], 
[Decimal('1113'), 'Customer3', Decimal('44.22')], 
[Decimal('1112'), 'Customer2', Decimal('32.93')], 
[Decimal('1111'), 'Customer1', Decimal('20.00')], 
[Decimal('1113'), 'Customer3', Decimal('38.14')], 
[Decimal('1111'), 'Customer1', Decimal('16.60')], 
[Decimal('1112'), 'Customer2', Decimal('67.46')], 
[Decimal('1111'), 'Customer1', Decimal('30.88')], 
[Decimal('1113'), 'Customer3', Decimal('30.88')], 
[Decimal('1111'), 'Customer1', Decimal('233.42')]]

and the error I receive when I try [sum(x[2]) for x in array]:

TypeError: 'decimal.Decimal' object is not iterable

edit 2:

[[Decimal('1112'), Decimal('393217'), datetime.date(2021, 5, 5), Decimal('961.96'), Decimal('46.16'), Decimal('551.05'), Decimal('961.96')], 
[Decimal('1111'), Decimal('392865'), datetime.date(2021, 4, 29), Decimal('270.57'), Decimal('221.65'), Decimal('0.00'), Decimal('270.57')], 
[Decimal('1113'), Decimal('392716'), datetime.date(2021, 4, 27), Decimal('494.44'), Decimal('123.45'), Decimal('0.00'), Decimal('494.44')], 
[Decimal('1112'), Decimal('392654'), datetime.date(2021, 4, 26), Decimal('156.60'), Decimal('69.99'), Decimal('6.50'), Decimal('156.60')], 
[Decimal('1113'), Decimal('392654'), datetime.date(2021, 4, 26), Decimal('160.42'), Decimal('72.99'), Decimal('52.80'), Decimal('160.42')]]

Per Mark's answer, I'd like to know how to adjust his code so that it sums every column other than the customer id and name, so that I end up with something like this:

[[customer_id, customer_name, sum(total), sum(applied), sum(credit), sum(balance)]]
  • Can you show some data that you can print? I don't understand the "psycopg2 and is formatted as a Decimal object" part. Maybe print(type(all_data[0][2])) so we understand the type of the balance. Commented May 18, 2021 at 17:19
  • That is a list, not an array. Commented May 18, 2021 at 17:28
  • In any case, what is it you want to do exactly? You want to group by customer id? [sum(x[2]) for x in array] isn't working because sum expects an iterable and you are passing it a Decimal, but it would fail in exactly the same way if you passed a float. Commented May 18, 2021 at 17:30
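
To make the last comment above concrete: sum() needs an iterable, and x[2] is a single Decimal, so the call fails before any grouping happens. A minimal sketch, using a shortened copy of the sample rows (the names rows, grand_total and total_1111 are only for illustration):

```python
from decimal import Decimal

rows = [
    [Decimal('1111'), 'Customer1', Decimal('31.50')],
    [Decimal('1112'), 'Customer2', Decimal('30.88')],
    [Decimal('1111'), 'Customer1', Decimal('90.00')],
]

# sum(x[2]) fails: x[2] is one Decimal, not an iterable of Decimals.
try:
    [sum(x[2]) for x in rows]
except TypeError as exc:
    error_message = str(exc)

# Handing sum() a generator over the rows works; this is the grand total.
grand_total = sum(x[2] for x in rows)

# Filtering inside the generator gives the total for a single customer.
total_1111 = sum(x[2] for x in rows if x[0] == Decimal('1111'))

print(error_message)
print(grand_total, total_1111)
```

This only totals one customer at a time; grouping every customer in a single pass needs a dict keyed by customer.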

3 Answers

You can make a dict that is keyed on the tuple of account number and name. Then loop through the rows and collect the sums in the dict. Afterward you can convert the dict's items() back to a list:

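# l is the list of [num, name, balance] rows from the question
accounts = {}

for num, name, balance in l:
    # start each (num, name) key at 0, then add this row's balance
    accounts[(num, name)] = accounts.get((num, name), 0) + balance

result = [[num, name, balance] for (num, name), balance in accounts.items()]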

result will be:

[[Decimal('1111'), 'Customer1', Decimal('522.09')],
 [Decimal('1112'), 'Customer2', Decimal('177.15')],
 [Decimal('1113'), 'Customer3', Decimal('201.60')]]
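
A possible variation on the same idea, offered as a sketch rather than as part of the original answer: collections.defaultdict(Decimal) starts every missing key at Decimal('0'), so the .get() default is not needed. The rows list here is a shortened, illustrative copy of the question's data.

```python
from collections import defaultdict
from decimal import Decimal

rows = [
    [Decimal('1111'), 'Customer1', Decimal('31.50')],
    [Decimal('1112'), 'Customer2', Decimal('30.88')],
    [Decimal('1111'), 'Customer1', Decimal('90.00')],
    [Decimal('1112'), 'Customer2', Decimal('15.00')],
]

# Decimal() returns Decimal('0'), so every new key starts at zero.
accounts = defaultdict(Decimal)
for num, name, balance in rows:
    accounts[(num, name)] += balance

# Flatten the (num, name) keys back into [num, name, balance] rows.
result = [[num, name, balance] for (num, name), balance in accounts.items()]
print(result)
```

Dicts preserve insertion order, so customers appear in the order they are first seen.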
1 Comment

I have now found a need to add a few more columns to my list, which need to be rolled up too. How would I adjust this code to handle these columns: num, name, total, applied, credit, balance? I have also added a new sample of my data to the original post.
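
One way the dict approach might be extended to several amount columns is sketched below. It assumes each row has the shape [num, name, total, applied, credit, balance]; the sample in the question's second edit has a different layout (it includes what look like an invoice number and a date), so the rows here are hypothetical values made up for illustration.

```python
from decimal import Decimal

# Hypothetical rows: [num, name, total, applied, credit, balance]
rows = [
    [Decimal('1111'), 'Customer1', Decimal('270.57'), Decimal('221.65'),
     Decimal('0.00'), Decimal('48.92')],
    [Decimal('1112'), 'Customer2', Decimal('156.60'), Decimal('69.99'),
     Decimal('6.50'), Decimal('80.11')],
    [Decimal('1111'), 'Customer1', Decimal('100.00'), Decimal('25.00'),
     Decimal('5.00'), Decimal('70.00')],
]

accounts = {}
for num, name, *amounts in rows:
    key = (num, name)
    if key in accounts:
        # Add column by column to this customer's running totals.
        accounts[key] = [a + b for a, b in zip(accounts[key], amounts)]
    else:
        # First row for this customer: its amounts are the initial totals.
        accounts[key] = list(amounts)

result = [[num, name, *sums] for (num, name), sums in accounts.items()]
print(result)
```

Because the amounts are captured with *amounts, the same loop works for any number of trailing numeric columns, as long as every row has the same width.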

Just to show you that you can do this with pandas also:

In [1]: import pandas as pd

In [2]: from decimal import Decimal

In [3]: data = [[Decimal('1111'), 'Customer1', Decimal('31.50')],
   ...: [Decimal('1112'), 'Customer2', Decimal('30.88')],
   ...: [Decimal('1111'), 'Customer1', Decimal('90.00')],
   ...: [Decimal('1113'), 'Customer3', Decimal('30.88')],
   ...: [Decimal('1112'), 'Customer2', Decimal('30.88')],
   ...: [Decimal('1112'), 'Customer2', Decimal('15.00')],
   ...: [Decimal('1111'), 'Customer1', Decimal('37.93')],
   ...: [Decimal('1113'), 'Customer3', Decimal('30.88')],
   ...: [Decimal('1111'), 'Customer1', Decimal('30.88')],
   ...: [Decimal('1111'), 'Customer1', Decimal('30.88')],
   ...: [Decimal('1113'), 'Customer3', Decimal('26.60')],
   ...: [Decimal('1113'), 'Customer3', Decimal('44.22')],
   ...: [Decimal('1112'), 'Customer2', Decimal('32.93')],
   ...: [Decimal('1111'), 'Customer1', Decimal('20.00')],
   ...: [Decimal('1113'), 'Customer3', Decimal('38.14')],
   ...: [Decimal('1111'), 'Customer1', Decimal('16.60')],
   ...: [Decimal('1112'), 'Customer2', Decimal('67.46')],
   ...: [Decimal('1111'), 'Customer1', Decimal('30.88')],
   ...: [Decimal('1113'), 'Customer3', Decimal('30.88')],
   ...: [Decimal('1111'), 'Customer1', Decimal('233.42')]]

In [4]: df = pd.DataFrame(data, columns=['customer_id', 'customer_name', 'invoice_balance'])

In [5]: df
Out[5]:
   customer_id customer_name invoice_balance
0         1111     Customer1           31.50
1         1112     Customer2           30.88
2         1111     Customer1           90.00
3         1113     Customer3           30.88
4         1112     Customer2           30.88
5         1112     Customer2           15.00
6         1111     Customer1           37.93
7         1113     Customer3           30.88
8         1111     Customer1           30.88
9         1111     Customer1           30.88
10        1113     Customer3           26.60
11        1113     Customer3           44.22
12        1112     Customer2           32.93
13        1111     Customer1           20.00
14        1113     Customer3           38.14
15        1111     Customer1           16.60
16        1112     Customer2           67.46
17        1111     Customer1           30.88
18        1113     Customer3           30.88
19        1111     Customer1          233.42

Now you can use a SQL-esque declarative approach with pandas:

In [6]: df.groupby(['customer_id', 'customer_name'])['invoice_balance'].sum()
Out[6]:
customer_id  customer_name
1111         Customer1        522.09
1112         Customer2        177.15
1113         Customer3        201.60
Name: invoice_balance, dtype: object

Of course, I probably wouldn't add pandas as a dependency to your project just for this, but it is possible.

# always use decimal type for money, not float
from decimal import Decimal

# input data
data = [
    [ 1, 'Bob',   Decimal('1.23') ],
    [ 2, 'Alice', Decimal('2.34') ],
    [ 1, 'Bob',   Decimal('3.45') ],
    [ 2, 'Alice', Decimal('4.56') ],
]

# sum balances into buckets by customer number
buckets = {}
for num, name, balance in data:
    buckets.setdefault(num, [num, name, Decimal('0.00')])[2] += balance

# print the result
for bucket in buckets.values():
    print(bucket)

Output:

[1, 'Bob', Decimal('4.68')]
[2, 'Alice', Decimal('6.90')]
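
For comparison, the closest standard-library analogue to SQL's GROUP BY is probably itertools.groupby. The sketch below reuses the sample data from this answer; note that groupby only merges adjacent rows, so the data has to be sorted on the grouping key first, which the bucket approach above does not require.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter

data = [
    [1, 'Bob',   Decimal('1.23')],
    [2, 'Alice', Decimal('2.34')],
    [1, 'Bob',   Decimal('3.45')],
    [2, 'Alice', Decimal('4.56')],
]

key = itemgetter(0, 1)  # group on (customer number, customer name)

# Sort by the grouping key so equal keys become adjacent, then sum each group.
result = [
    [num, name, sum(row[2] for row in group)]
    for (num, name), group in groupby(sorted(data, key=key), key=key)
]
print(result)
```

The sort makes this O(n log n) rather than a single pass, so the dict-based versions are usually the simpler choice unless sorted output is wanted anyway.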
