Difficulty creating new list grouping by a key using itertools in python

Question

I have a list of dictionaries, as below:

dataset = {"users": [
    {"id": 20, "loc": "Chicago", "st": "4", "sectors": [{"sname": "Retail"}, {"sname": "Manufacturing"}, {"sname": None}]},
    {"id": 21, "loc": "Frankfurt", "st": "4", "sectors": [{"sname": None}]},
    {"id": 22, "loc": "Berlin", "st": "6", "sectors": [{"sname": "Manufacturing"}, {"sname": "Banking"}, {"sname": "Agri"}]},
    {"id": 23, "loc": "Chicago", "st": "2", "sectors": [{"sname": "Banking"}, {"sname": "Agri"}]},
    {"id": 24, "loc": "Bern", "st": "1", "sectors": [{"sname": "Retail"}, {"sname": "Agri"}]},
    {"id": 25, "loc": "Bern", "st": "4", "sectors": [{"sname": "Retail"}, {"sname": "Agri"}, {"sname": "Banking"}]}
    ]}

I tried the code below to remove st and sectors from the list above, so that my list would contain only id and loc:

import itertools

fs_loc = []
for g, items in itertools.groupby(dataset['users'], lambda x: (x['id'], x['loc'])):
    fs_loc.append({'id': g[0], 'loc': g[1]})
print(fs_loc)

From this, how can I create a new list that holds the ids, and the count of them, grouped by location, like below?

{"locations": [
    {"loc": "Chicago","count":2,"ids": [{"id": "20"}, {"id": "23"}]}, 
    {"loc": "Bern","count":2,"ids": [{"id": "24"}, {"id": "25"}]}, 
    {"loc": "Frankfurt","count":1,"ids": [{"id": "21"}]}, 
    {"loc": "Berlin","count":1,"ids": [{"id": "22"}]}
    ]}

I found it difficult to build the list above using itertools. I might be missing a better approach; could you please suggest one?

falsetru · Accepted Answer · 2015-12-14 14:58:12Z

4

You need to pass a sorted sequence to itertools.groupby.

According to itertools.groupby documentation:

... Generally, the iterable needs to already be sorted on the same key function.

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order.
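To make the quoted behavior concrete, here is a minimal sketch using a made-up list of location names (not the question's data). It compares grouping the list as given with grouping it after sorting.

```python
import itertools

# Hypothetical sample: the same two locations, interleaved.
locs = ["Chicago", "Bern", "Chicago", "Bern"]

# Without sorting, every change of key starts a new group.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(locs)]

# After sorting, equal keys are adjacent, so each key yields one group.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(locs))]

print(unsorted_groups)  # [('Chicago', 1), ('Bern', 1), ('Chicago', 1), ('Bern', 1)]
print(sorted_groups)    # [('Bern', 2), ('Chicago', 2)]
```

The unsorted call produces four groups of one element each, which is why the input is sorted with the same key before grouping.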

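import itertools

# Key function used for both sorting and grouping.
byloc = lambda x: x['loc']

# Sort by location first so that equal locations are adjacent,
# then materialize each group as a list (it is needed twice below).
it = (
    (loc, list(user_grp))
    for loc, user_grp in itertools.groupby(
        sorted(dataset['users'], key=byloc), key=byloc
    )
)
# One entry per location: its ids and how many users it has.
fs_loc = [
    {'loc': loc, 'ids': [x['id'] for x in grp], 'count': len(grp)}
    for loc, grp in it
]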

fs_loc →

[
    {'count': 1, 'loc': 'Berlin', 'ids': [22]},
    {'count': 2, 'loc': 'Bern', 'ids': [24, 25]},
    {'count': 2, 'loc': 'Chicago', 'ids': [20, 23]},
    {'count': 1, 'loc': 'Frankfurt', 'ids': [21]}
]
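If the exact shape from the question is wanted (a "locations" wrapper, ids as {"id": "..."} dictionaries with string values), one possible sketch follows. It swaps itertools.groupby for collections.defaultdict, which needs no sorting, and it assumes the ordering in the question means "largest count first". That ordering rule, the trimmed users list, and the names ids_by_loc and result are assumptions for illustration.

```python
from collections import defaultdict

# Trimmed copy of the question's users (only the fields used here).
users = [
    {"id": 20, "loc": "Chicago"},
    {"id": 21, "loc": "Frankfurt"},
    {"id": 22, "loc": "Berlin"},
    {"id": 23, "loc": "Chicago"},
    {"id": 24, "loc": "Bern"},
    {"id": 25, "loc": "Bern"},
]

# Collect ids per location; dicts keep first-seen key order (Python 3.7+).
ids_by_loc = defaultdict(list)
for user in users:
    ids_by_loc[user["loc"]].append({"id": str(user["id"])})

locations = [
    {"loc": loc, "count": len(ids), "ids": ids}
    for loc, ids in ids_by_loc.items()
]
# Assumption: order by count, largest first. The sort is stable,
# so locations with equal counts stay in first-seen order.
locations.sort(key=lambda entry: entry["count"], reverse=True)

result = {"locations": locations}
print(result)
```

With this data the locations come out as Chicago, Bern, Frankfurt, Berlin. Drop the sort call if first-seen order is enough.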

edited Dec 14, 2015 at 14:58

answered Dec 14, 2015 at 14:54

falsetru


9 Comments

Satheesh Panduga Over a year ago

Thanks a lot! I will try this now

Satheesh Panduga Over a year ago

Hi falsetru! I have a question: can we also add the st values with the ids inside the grp? I tried fs_loc = [ {'loc': loc, 'ids': [x['id'],x['st'] for x in grp], 'count': len(grp)} for loc, grp in it ] but it throws an error. Could you please suggest a fix?

Satheesh Panduga Over a year ago

fs_loc should have 'ids': [{id, st}, {id, st}] ... maybe like this?

falsetru Over a year ago

@SathishPanduga, Could you post a separate question with exact desired output? (The output should be valid python literals)

falsetru Over a year ago

@SathishPanduga, I am going to sleep, so a response will take a while. But with a separate question, other people can see it and help you solve the problem.
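For the follow-up raised in the comments above, the expression [x['id'],x['st'] for x in grp] is a syntax error because a bare tuple is not allowed as the element of a list comprehension. The sketch below assumes the desired element is a small dictionary holding both id and st. That shape is a guess, since the thread never pins it down, and the trimmed users list is illustrative.

```python
import itertools

# Trimmed sample based on the question's data (id, loc, st only).
users = [
    {"id": 20, "loc": "Chicago", "st": "4"},
    {"id": 23, "loc": "Chicago", "st": "2"},
    {"id": 24, "loc": "Bern", "st": "1"},
]

def by_loc(user):
    """Key function shared by sorted() and groupby()."""
    return user["loc"]

fs_loc = []
for loc, grp in itertools.groupby(sorted(users, key=by_loc), key=by_loc):
    # Assumed shape: one {"id": ..., "st": ...} dict per user in the group.
    members = [{"id": u["id"], "st": u["st"]} for u in grp]
    fs_loc.append({"loc": loc, "ids": members, "count": len(members)})

print(fs_loc)
```

If tuples are preferred over dictionaries, wrapping the pair in parentheses, as in [(u["id"], u["st"]) for u in grp], is also valid.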
