
I have a big array of data, and I want to sum columns based on one or two conditions. The data is already stored as classes in a dictionary.

The data set is quite extensive, but the important part looks like this:

[["Gothenburg", "2018-01-05", "jan", 1.5, 2.3, 107],
 ["Gothenburg", "2018-01-15", "jan", 1.3, 3.3, 96],
 ["Gothenburg", "2018-01-25", "jan", 1.7, 3.2, 45],
 ["Gothenburg", "2018-03-05", "mar", 1.5, 2.1, 96],
 ["Gothenburg", "2018-03-05", "mar", 1.9, 2.8, 102],
 ["Malmo", "2018-01-02", "jan", 1.6, 2.3, 104],
 ["Malmo", "2018-01-10", "jan", 1.0, 2.9, 112],
 ["Malmo", "2018-03-05", "mar", 0.7, 4.3, 151],
 ["Malmo", "2018-03-25", "mar", 1.0, 3.3, 98],
 ["Hallsberg", "2018-01-25", "jan", 2.5, 2.3, 87],
 ["Hallsberg", "2018-02-14", "feb", 2.2, 2.3, 168],
 ["Hallsberg", "2018-03-06", "mar", 3.7, 2.3, 142],
 ["Hallsberg", "2018-04-29", "apr", 2.7, 2.3, 100]]

Explanation of columns: 0 = city, 1 = date, 2 = month, 3 = meanvalue1, 4 = meanvalue2, 5 = meanvalue3

The array is about 8000 rows in total with maybe 300 different cities.

What I want to achieve is to sum columns 3, 4 and 5 grouped by the values in columns 0, 1 and 2.

For example:
sum of column 3 with key "Malmo" = 1.6 + 1.0 + 0.7 + 1.0 = 4.3
sum of column 3 with keys "Malmo" and "jan" = 1.6 + 1.0 = 2.6
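One minimal way to express these two example sums in plain Python is to filter the rows with a generator expression and pass it straight to `sum()`. This sketch assumes the rows are held in an ordinary list of lists called `rows`. The question's real data lives in classes inside a dictionary, so the attribute access would differ. The helper name `conditional_sum` is hypothetical.

```python
# Sample rows from the question: city, date, month, mean1, mean2, mean3
rows = [
    ["Gothenburg", "2018-01-05", "jan", 1.5, 2.3, 107],
    ["Gothenburg", "2018-01-15", "jan", 1.3, 3.3, 96],
    ["Gothenburg", "2018-01-25", "jan", 1.7, 3.2, 45],
    ["Gothenburg", "2018-03-05", "mar", 1.5, 2.1, 96],
    ["Gothenburg", "2018-03-05", "mar", 1.9, 2.8, 102],
    ["Malmo", "2018-01-02", "jan", 1.6, 2.3, 104],
    ["Malmo", "2018-01-10", "jan", 1.0, 2.9, 112],
    ["Malmo", "2018-03-05", "mar", 0.7, 4.3, 151],
    ["Malmo", "2018-03-25", "mar", 1.0, 3.3, 98],
    ["Hallsberg", "2018-01-25", "jan", 2.5, 2.3, 87],
    ["Hallsberg", "2018-02-14", "feb", 2.2, 2.3, 168],
    ["Hallsberg", "2018-03-06", "mar", 3.7, 2.3, 142],
    ["Hallsberg", "2018-04-29", "apr", 2.7, 2.3, 100],
]


def conditional_sum(rows, col, city, month=None):
    """Sum column `col` over the rows for `city`, optionally restricted to `month`."""
    # The generator yields only the matching values; sum() consumes them lazily
    return sum(
        row[col]
        for row in rows
        if row[0] == city and (month is None or row[2] == month)
    )


# round() hides floating-point noise in the printed result
print(round(conditional_sum(rows, 3, "Malmo"), 2))         # 4.3
print(round(conditional_sum(rows, 3, "Malmo", "jan"), 2))  # 2.6
```

Each call scans all rows. That is fine for a few lookups, but if many sums are needed, grouping once and storing the results is cheaper.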

These conditional sums could be stored in a dictionary (or some better structure), or simply displayed on screen.

I guess there is a clever and fairly easy way to do this, but I haven't figured it out. I have tried using for-loops and if statements, but it gets messy. I hope to get some good advice here!

  • Can you use Pandas? Commented Mar 22, 2019 at 13:57
  • You need a way to select the matching rows based on the conditions (preferably with a generator); summing them is then easy. It does sound tailor-made for a database. Commented Mar 22, 2019 at 13:59
  • Yes, pandas and NumPy are available. I'm just new to this, so I don't really know how to use them. Commented Mar 22, 2019 at 14:08
  • Kenny, could you elaborate on how to use a database? Sorry if this is basic, I'm a noob :) Commented Mar 22, 2019 at 14:09
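For the database suggestion, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. `GROUP BY` gives sums per city or per (city, month). The table name `measurements` and its column names are illustrative assumptions, and only a few of the question's rows are loaded.

```python
import sqlite3

# A few rows from the question: city, date, month, mean1, mean2, mean3
rows = [
    ("Gothenburg", "2018-01-05", "jan", 1.5, 2.3, 107),
    ("Gothenburg", "2018-01-15", "jan", 1.3, 3.3, 96),
    ("Malmo", "2018-01-02", "jan", 1.6, 2.3, 104),
    ("Malmo", "2018-01-10", "jan", 1.0, 2.9, 112),
    ("Malmo", "2018-03-05", "mar", 0.7, 4.3, 151),
    ("Malmo", "2018-03-25", "mar", 1.0, 3.3, 98),
]

# In-memory database; pass a file path instead to keep the data between runs
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements "
    "(city TEXT, date TEXT, month TEXT, mean1 REAL, mean2 REAL, mean3 REAL)"
)
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?, ?, ?)", rows)

# One condition: sums per city
by_city = conn.execute(
    "SELECT city, SUM(mean1), SUM(mean2), SUM(mean3) "
    "FROM measurements GROUP BY city ORDER BY city"
).fetchall()

# Two conditions: sums per (city, month)
by_city_month = conn.execute(
    "SELECT city, month, SUM(mean1), SUM(mean2), SUM(mean3) "
    "FROM measurements GROUP BY city, month ORDER BY city, month"
).fetchall()

# A single lookup, using ? placeholders instead of string formatting
(malmo_jan,) = conn.execute(
    "SELECT SUM(mean1) FROM measurements WHERE city = ? AND month = ?",
    ("Malmo", "jan"),
).fetchone()

for row in by_city_month:
    print(row)
print(round(malmo_jan, 2))
conn.close()
```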

2 Answers


I like using the pandas library for DataFrame-type objects. Here is a solution to your problem:

import pandas as pd 
df  = pd.DataFrame([["Gothenburg", "2018-01-05", "jan", 1.5, 2.3, 107],
 ["Gothenburg", "2018-01-15", "jan", 1.3, 3.3, 96],
 ["Gothenburg", "2018-01-25", "jan", 1.7, 3.2, 45],
 ["Gothenburg", "2018-03-05", "mar", 1.5, 2.1, 96],
 ["Gothenburg", "2018-03-05", "mar", 1.9, 2.8, 102],
 ["Malmo", "2018-01-02", "jan", 1.6, 2.3, 104],
 ["Malmo", "2018-01-10", "jan", 1.0, 2.9, 112],
 ["Malmo", "2018-03-05", "mar", 0.7, 4.3, 151],
 ["Malmo", "2018-03-25", "mar", 1.0, 3.3, 98],
 ["Hallsberg", "2018-01-25", "jan", 2.5, 2.3, 87],
 ["Hallsberg", "2018-02-14", "feb", 2.2, 2.3, 168],
 ["Hallsberg", "2018-03-06", "mar", 3.7, 2.3, 142],
 ["Hallsberg", "2018-04-29", "apr", 2.7, 2.3, 100]])

df.columns = ['City', 'Date', 'Month', 'Mean1', 'Mean2', 'Mean3']
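Once the DataFrame exists, a single conditional sum such as the question's "Malmo" / "jan" example can also be taken directly with boolean masks. Here is a minimal sketch on a subset of the rows, reusing the column names set above.

```python
import pandas as pd

# A subset of the question's rows, with the same column names
df = pd.DataFrame(
    [["Malmo", "2018-01-02", "jan", 1.6, 2.3, 104],
     ["Malmo", "2018-01-10", "jan", 1.0, 2.9, 112],
     ["Malmo", "2018-03-05", "mar", 0.7, 4.3, 151],
     ["Malmo", "2018-03-25", "mar", 1.0, 3.3, 98],
     ["Hallsberg", "2018-01-25", "jan", 2.5, 2.3, 87]],
    columns=['City', 'Date', 'Month', 'Mean1', 'Mean2', 'Mean3'],
)

# One condition: a boolean Series that is True for the Malmo rows
is_malmo = df['City'] == 'Malmo'
malmo_total = df.loc[is_malmo, 'Mean1'].sum()

# Two conditions: combine masks with & (each comparison needs parentheses)
is_malmo_jan = (df['City'] == 'Malmo') & (df['Month'] == 'jan')
malmo_jan_totals = df.loc[is_malmo_jan, ['Mean1', 'Mean2', 'Mean3']].sum()

print(round(malmo_total, 2))
print(malmo_jan_totals)
```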

Choose what to group by:

group_by = ['City', 'Month'] #group_by = ['Month']

Create a grouped DataFrame containing the sums of the columns:

City_Mon_Sum = df.groupby(group_by).agg({'Mean1': 'sum', 'Mean2': 'sum', 'Mean3': 'sum'}).reset_index()
City_Mon_Sum.rename(columns = {'Mean1': 'Group_Mean1', 'Mean2': 'Group_Mean2', 'Mean3': 'Group_Mean3'}, inplace = True )
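If you only need the sums, not the per-row view that the merge produces, you can query the grouped result directly. Here is a sketch on a subset of the data with the same column names. The variable names `sums` and `city_sums` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["Malmo", "2018-01-02", "jan", 1.6, 2.3, 104],
     ["Malmo", "2018-01-10", "jan", 1.0, 2.9, 112],
     ["Malmo", "2018-03-05", "mar", 0.7, 4.3, 151],
     ["Malmo", "2018-03-25", "mar", 1.0, 3.3, 98],
     ["Hallsberg", "2018-01-25", "jan", 2.5, 2.3, 87]],
    columns=['City', 'Date', 'Month', 'Mean1', 'Mean2', 'Mean3'],
)

# One row per (City, Month); the two group keys form a MultiIndex
sums = df.groupby(['City', 'Month'])[['Mean1', 'Mean2', 'Mean3']].sum()

# Look up one combination: the row key is a tuple, the column key a name
malmo_jan_mean1 = sums.loc[('Malmo', 'jan'), 'Mean1']

# One key only: group on City alone
city_sums = df.groupby('City')[['Mean1', 'Mean2', 'Mean3']].sum()

print(sums)
print(city_sums.loc['Malmo'])
```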

Merge the two dataframes:

df = pd.merge(df, City_Mon_Sum, on = group_by)

Output:

          City        Date  Month  Mean1  Mean2  Mean3  Group_Mean1  Group_Mean2  Group_Mean3
0   Gothenburg  2018-01-05    jan    1.5    2.3    107          4.5          8.8          248
1   Gothenburg  2018-01-15    jan    1.3    3.3     96          4.5          8.8          248
2   Gothenburg  2018-01-25    jan    1.7    3.2     45          4.5          8.8          248
3   Gothenburg  2018-03-05    mar    1.5    2.1     96          3.4          4.9          198
4   Gothenburg  2018-03-05    mar    1.9    2.8    102          3.4          4.9          198
5        Malmo  2018-01-02    jan    1.6    2.3    104          2.6          5.2          216
6        Malmo  2018-01-10    jan    1.0    2.9    112          2.6          5.2          216
7        Malmo  2018-03-05    mar    0.7    4.3    151          1.7          7.6          249
8        Malmo  2018-03-25    mar    1.0    3.3     98          1.7          7.6          249
9    Hallsberg  2018-01-25    jan    2.5    2.3     87          2.5          2.3           87
10   Hallsberg  2018-02-14    feb    2.2    2.3    168          2.2          2.3          168
11   Hallsberg  2018-03-06    mar    3.7    2.3    142          3.7          2.3          142
12   Hallsberg  2018-04-29    apr    2.7    2.3    100          2.7          2.3          100
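As an alternative to the `groupby`/`merge` pair, `groupby(...).transform('sum')` returns the group sums already aligned to the original rows. They can then be joined back on the index. Here is a minimal sketch on a subset of the rows. The `Group_` prefix matches the column names used above.

```python
import pandas as pd

df = pd.DataFrame(
    [["Malmo", "2018-01-02", "jan", 1.6, 2.3, 104],
     ["Malmo", "2018-01-10", "jan", 1.0, 2.9, 112],
     ["Malmo", "2018-03-05", "mar", 0.7, 4.3, 151],
     ["Malmo", "2018-03-25", "mar", 1.0, 3.3, 98],
     ["Hallsberg", "2018-01-25", "jan", 2.5, 2.3, 87]],
    columns=['City', 'Date', 'Month', 'Mean1', 'Mean2', 'Mean3'],
)

group_by = ['City', 'Month']
# transform('sum') yields one value per original row (same index as df),
# so the group sums can be attached with a plain index join, no merge needed
group_sums = df.groupby(group_by)[['Mean1', 'Mean2', 'Mean3']].transform('sum')
df = df.join(group_sums.add_prefix('Group_'))

print(df)
```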

4 Comments

Sorry, I'm not quite sure how to display the table output of a pandas DataFrame on Stack Overflow.
I've got a new problem. Is there a way to sum values grouped by "City", but not all values, just the previous ones, and then place them in a new column? For example, the sum of the previous "Mean3" values for "Hallsberg" in a new column "Sum prev. Mean3": Row 9 = 0 (since there are no previous "Hallsberg" values), Row 10 = 0 + 87 = 87, Row 11 = 0 + 87 + 168 = 255, Row 12 = 0 + 87 + 168 + 142 = 397. It seems really tricky, but maybe there is a way?
There are a few ways this can be done. The question is how you want to decide where to make the "cutoff" for the previous values. Do you want it by row number? Example: sum all values of Hallsberg up to row 10. Or do you want it by occurrence? Example: sum all values of Hallsberg up to (and including) the second occurrence of Hallsberg?
I managed to do it like this: df["Sum_mean"] = df.groupby('City')['Mean3'].transform(lambda x: x.cumsum().shift()).fillna(0)
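A variant of that approach avoids the `lambda` by subtracting each row's own value from the per-city running total. The result is the sum of the previous rows in the same city (0 for a city's first row), and the column keeps an integer dtype because no `NaN` is introduced. The sketch assumes the rows are already in the intended order; sort them first (for example by `City` and `Date`) if they are not. It uses a reduced column set, and the column name `Sum_prev_Mean3` is illustrative.

```python
import pandas as pd

# Reduced example: only the needed columns, rows already in date order per city
df = pd.DataFrame({
    'City': ['Hallsberg', 'Hallsberg', 'Hallsberg', 'Hallsberg', 'Malmo', 'Malmo'],
    'Mean3': [87, 168, 142, 100, 104, 112],
})

# cumsum() within each city includes the current row; subtracting the
# current value leaves the sum of the previous rows (0 for the first row)
df['Sum_prev_Mean3'] = df.groupby('City')['Mean3'].cumsum() - df['Mean3']

print(df)
```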

The trick is to use a tuple as the dictionary key. Assuming your data is stored in a variable named big_array_with_data, here is a solution using collections.defaultdict:

from collections import defaultdict

monthly = [defaultdict(int) for _ in range(3)]  # one dict per mean column, keyed by (city, month)
totals = [defaultdict(int) for _ in range(3)]   # one dict per mean column, keyed by city

for place, _, month, *means in big_array_with_data:
    for i, mean in enumerate(means):
        monthly[i][(place, month)] += mean
        totals[i][place] += mean

print(monthly[0][('Malmo', 'jan')])
print(totals[0]['Malmo'])
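The same tuple-key idea generalizes to any combination of key columns. Here is a sketch with a hypothetical helper `group_sums`. It assumes the question's column layout, with columns 3, 4 and 5 holding the values, and loads only a few of the rows.

```python
from collections import defaultdict

# A few rows from the question: city, date, month, mean1, mean2, mean3
rows = [
    ["Malmo", "2018-01-02", "jan", 1.6, 2.3, 104],
    ["Malmo", "2018-01-10", "jan", 1.0, 2.9, 112],
    ["Malmo", "2018-03-05", "mar", 0.7, 4.3, 151],
    ["Malmo", "2018-03-25", "mar", 1.0, 3.3, 98],
    ["Hallsberg", "2018-01-25", "jan", 2.5, 2.3, 87],
]


def group_sums(rows, key_cols, value_cols=(3, 4, 5)):
    """Map a tuple of the key_cols values to the sums of value_cols."""
    # Each new key starts with one zero per value column
    sums = defaultdict(lambda: [0] * len(value_cols))
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        totals = sums[key]
        for i, col in enumerate(value_cols):
            totals[i] += row[col]
    return dict(sums)


by_city = group_sums(rows, key_cols=(0,))          # key: (city,)
by_city_month = group_sums(rows, key_cols=(0, 2))  # key: (city, month)

print(by_city[("Malmo",)])
print(by_city_month[("Malmo", "jan")])
```

Single-column keys are one-element tuples such as `("Malmo",)`, which keeps every lookup uniform.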

You could also do it without defaultdict like this:

monthly[i][(place, month)] = monthly[i].get((place, month), 0) + mean

That being said, if you are planning to do data crunching like this on a regular basis, working through a pandas tutorial is time well invested.
