
I have a list of dictionaries as below:

dataset = {"users": [
    {"id": 20, "loc": "Chicago", "st": "4", "sectors": [{"sname": "Retail"}, {"sname": "Manufacturing"}, {"sname": None}]},
    {"id": 21, "loc": "Frankfurt", "st": "4", "sectors": [{"sname": None}]},
    {"id": 22, "loc": "Berlin", "st": "6", "sectors": [{"sname": "Manufacturing"}, {"sname": "Banking"}, {"sname": "Agri"}]},
    {"id": 23, "loc": "Chicago", "st": "2", "sectors": [{"sname": "Banking"}, {"sname": "Agri"}]},
    {"id": 24, "loc": "Bern", "st": "1", "sectors": [{"sname": "Retail"}, {"sname": "Agri"}]},
    {"id": 25, "loc": "Bern", "st": "4", "sectors": [{"sname": "Retail"}, {"sname": "Agri"}, {"sname": "Banking"}]}
    ]}

I tried the code below to remove st and sectors from the dictionaries above, so that my list would contain only id and loc:

import itertools

fs_loc = []
# each id is unique, so every (id, loc) group holds exactly one user
for g, items in itertools.groupby(dataset['users'], lambda x: (x['id'], x['loc'])):
    fs_loc.append({'id': g[0], 'loc': g[1]})
print(fs_loc)
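Since every id in the sample appears only once, grouping by (id, loc) yields one group per user, so groupby is not really needed just to keep id and loc. A minimal sketch with a plain list comprehension (the sectors lists are left out of the sample data here for brevity):

```python
# Sample users from the question; sectors omitted for brevity
dataset = {"users": [
    {"id": 20, "loc": "Chicago", "st": "4"},
    {"id": 21, "loc": "Frankfurt", "st": "4"},
    {"id": 22, "loc": "Berlin", "st": "6"},
    {"id": 23, "loc": "Chicago", "st": "2"},
    {"id": 24, "loc": "Bern", "st": "1"},
    {"id": 25, "loc": "Bern", "st": "4"},
]}

# Keep only the id and loc keys of each user dict
fs_loc = [{'id': u['id'], 'loc': u['loc']} for u in dataset['users']]
print(fs_loc)
```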

From this, how can I create a new list that groups the ids by location and includes the count of ids for each location, like below?

{"locations": [
    {"loc": "Chicago", "count": 2, "ids": [{"id": "20"}, {"id": "23"}]},
    {"loc": "Bern", "count": 2, "ids": [{"id": "24"}, {"id": "25"}]},
    {"loc": "Frankfurt", "count": 1, "ids": [{"id": "21"}]},
    {"loc": "Berlin", "count": 1, "ids": [{"id": "22"}]}
    ]}

I'm finding it difficult to build this list with itertools, and I may be missing a better approach. Could you please suggest one?

1 Answer


You need to pass a sorted sequence to itertools.groupby.

According to the itertools.groupby documentation:

... Generally, the iterable needs to already be sorted on the same key function.

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order.
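To see the "new group every time the key changes" behavior on this data, a small sketch comparing groupby on the unsorted users with groupby on the same users sorted by loc (only id and loc are kept in the sample):

```python
import itertools

# Users in their original, unsorted order
users = [
    {"id": 20, "loc": "Chicago"},
    {"id": 21, "loc": "Frankfurt"},
    {"id": 22, "loc": "Berlin"},
    {"id": 23, "loc": "Chicago"},
    {"id": 24, "loc": "Bern"},
    {"id": 25, "loc": "Bern"},
]

def byloc(u):
    return u['loc']

# Unsorted: groupby breaks whenever loc changes, so Chicago appears twice
unsorted_groups = [(loc, [u['id'] for u in grp])
                   for loc, grp in itertools.groupby(users, key=byloc)]
print(unsorted_groups)
# [('Chicago', [20]), ('Frankfurt', [21]), ('Berlin', [22]), ('Chicago', [23]), ('Bern', [24, 25])]

# Sorted on the same key: each loc forms exactly one group
sorted_groups = [(loc, [u['id'] for u in grp])
                 for loc, grp in itertools.groupby(sorted(users, key=byloc), key=byloc)]
print(sorted_groups)
# [('Berlin', [22]), ('Bern', [24, 25]), ('Chicago', [20, 23]), ('Frankfurt', [21])]
```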

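import itertools

# Key function: group users by their location
byloc = lambda x: x['loc']

# groupby only merges adjacent items, so sort by the same key first;
# each group is turned into a list so it can be both iterated and counted
it = (
    (loc, list(user_grp))
    for loc, user_grp in itertools.groupby(
        sorted(dataset['users'], key=byloc), key=byloc
    )
)
fs_loc = [
    {'loc': loc, 'ids': [x['id'] for x in grp], 'count': len(grp)}
    for loc, grp in it
]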

fs_loc

[
    {'loc': 'Berlin', 'ids': [22], 'count': 1},
    {'loc': 'Bern', 'ids': [24, 25], 'count': 2},
    {'loc': 'Chicago', 'ids': [20, 23], 'count': 2},
    {'loc': 'Frankfurt', 'ids': [21], 'count': 1}
]
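The documentation quoted above contrasts groupby with SQL's GROUP BY, which aggregates regardless of input order. A dict of lists gives that order-independent behavior without any sorting. Below is a sketch using collections.defaultdict (a swapped-in technique, not the groupby approach of this answer) that also builds the shape requested in the question, assuming the ids should be converted to strings inside {"id": ...} as in the desired output:

```python
from collections import defaultdict

# Sample users from the question (only id and loc are needed here)
users = [
    {"id": 20, "loc": "Chicago"},
    {"id": 21, "loc": "Frankfurt"},
    {"id": 22, "loc": "Berlin"},
    {"id": 23, "loc": "Chicago"},
    {"id": 24, "loc": "Bern"},
    {"id": 25, "loc": "Bern"},
]

# Collect ids per location; input order does not matter
ids_by_loc = defaultdict(list)
for u in users:
    ids_by_loc[u['loc']].append(u['id'])

# Build {"locations": [...]} with each id wrapped as {"id": "<string id>"}
result = {"locations": [
    {"loc": loc, "count": len(ids), "ids": [{"id": str(i)} for i in ids]}
    for loc, ids in ids_by_loc.items()
]}
print(result)
```

Locations come out in order of first appearance (Chicago, Frankfurt, Berlin, Bern), since dicts preserve insertion order in Python 3.7+; sort the list if a different order is needed.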

Comments

Thanks a lot! I will try this now
Hi falsetru! I have a question: can we also add the st values alongside the ids inside each group? I tried fs_loc = [ {'loc': loc, 'ids': [x['id'],x['st'] for x in grp], 'count': len(grp)} for loc, grp in it ] but it throws an error. Could you please suggest a fix?
fs_loc should have something like 'ids': [{id, st}, {id, st}].
@SathishPanduga, could you post a separate question with the exact desired output? (The output should be a valid Python literal.)
@SathishPanduga, I am going to sleep, so my response may take a while. But with a separate question, other people can see it and help you solve the problem.
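The attempt in the comment fails because [x['id'], x['st'] for x in grp] is a SyntaxError: a tuple inside a comprehension must be parenthesized, as in [(x['id'], x['st']) for x in grp]. Also, it is a generator, so it yields nothing if it was already consumed by the first fs_loc. A sketch that reads {id, st} as a dict with id and st keys (an assumption; written literally, {id, st} would be a set):

```python
import itertools

# Sample users from the question (sectors omitted for brevity)
users = [
    {"id": 20, "loc": "Chicago", "st": "4"},
    {"id": 21, "loc": "Frankfurt", "st": "4"},
    {"id": 22, "loc": "Berlin", "st": "6"},
    {"id": 23, "loc": "Chicago", "st": "2"},
    {"id": 24, "loc": "Bern", "st": "1"},
    {"id": 25, "loc": "Bern", "st": "4"},
]

def byloc(u):
    return u['loc']

fs_loc = []
for loc, grp in itertools.groupby(sorted(users, key=byloc), key=byloc):
    grp = list(grp)  # materialize so the group can be iterated and counted
    fs_loc.append({
        'loc': loc,
        # pair each id with its st value in a small dict
        'ids': [{'id': x['id'], 'st': x['st']} for x in grp],
        'count': len(grp),
    })
print(fs_loc)
```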
