How to avoid double loop in Python?

Question

From the data at https://jsonplaceholder.typicode.com/todos I wanted to count "completed" items by user.

Currently, I approach this by first collecting the existing user ID keys, then, for each user, checking every element in the dataset to see whether it's owned by that user and appending it to that user's list of items.

import json
from urllib import request

# Data from
uri = "https://jsonplaceholder.typicode.com/todos"

response = request.urlopen(uri).read()
data = json.loads(response)

users_items = {}

def get_user_ids(items):
    # Register every user ID as a key; the value is filled in later.
    for item in items:
        users_items[item['userId']] = None

def get_user_items():
    # For each user, scan the whole dataset again (the double loop).
    for uid in users_items:
        items = []
        for item in data:
            if item['userId'] == uid:
                items.append(item['completed'])
        users_items[uid] = items

done_items_by_user = {}

def count_completed_by_user():
    # Summing booleans counts the True values.
    for user in users_items:
        done_items_by_user[user] = sum(users_items[user])

get_user_ids(data)
get_user_items()
count_completed_by_user()

I especially don't like the double loop and the initialization of the dictionary values with a placeholder (None) in get_user_ids.

item['completed'] is a boolean value. Are you sure that you need to accumulate only boolean values?
– RomanPerekhrest, Commented Jun 5, 2019 at 11:33
The boolean value True means the item is done. I want to count done items.
– TMOTTM, Commented Jun 5, 2019 at 11:34

RomanPerekhrest · Accepted Answer · 2019-06-05 11:45:11Z

3

Simply with a defaultdict object:

import json
from urllib import request
from collections import defaultdict

# Data from
uri = "https://jsonplaceholder.typicode.com/todos"

response = request.urlopen(uri).read()
data = json.loads(response)


def count_user_completed_items(data):
    result = defaultdict(int)
    for item in data:
        if item['completed']:
            result[item['userId']] += 1
    return dict(result)


print(count_user_completed_items(data))

The output (where the key is the user ID and the value is the number of done items):

{1: 11, 2: 8, 3: 7, 4: 6, 5: 12, 6: 6, 7: 9, 8: 11, 9: 8, 10: 12}

edited Jun 5, 2019 at 11:45

answered Jun 5, 2019 at 11:39

RomanPerekhrest


2 Comments

TMOTTM Over a year ago

I wonder what the point of dict is when I would need defaultdict most of the time anyway.

Serge Ballesta Over a year ago

@TMOTTM: dict is a simple low-level container. defaultdict is a more complex class that builds on it as a subclass of dict.
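
To make the difference between the two containers concrete, here is a minimal sketch that counts completed items both ways. The `sample` list is made-up data shaped like the todos records, not the real dataset, and the names `plain` and `auto` are only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records shaped like the todos data (not the real dataset).
sample = [
    {"userId": 1, "completed": True},
    {"userId": 1, "completed": False},
    {"userId": 2, "completed": True},
    {"userId": 1, "completed": True},
]

# Plain dict: reading a missing key raises KeyError,
# so the first increment for each user needs a guard.
plain = {}
for item in sample:
    if item["completed"]:
        if item["userId"] not in plain:
            plain[item["userId"]] = 0
        plain[item["userId"]] += 1

# defaultdict(int): a missing key is created on first access
# with the value returned by int(), which is 0.
auto = defaultdict(int)
for item in sample:
    if item["completed"]:
        auto[item["userId"]] += 1

print(plain)       # {1: 2, 2: 1}
print(dict(auto))  # {1: 2, 2: 1}
```

Both loops produce the same counts; defaultdict only removes the guard for the first occurrence of each key.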

user10325516 · Answer · 2019-06-05 11:43:04Z

0

You may use the dict method get() to insert or update the per-user counts:

done_items_by_user = dict()
for item in data:
    done_items_by_user[item['userId']] = done_items_by_user.get(item['userId'], 0) + item['completed']
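
As a self-contained illustration of the get() approach that runs without a network request, the sketch below uses a small made-up `sample` list (an assumption, not the real dataset). It relies on bool being a subclass of int, so adding `item["completed"]` adds 1 or 0.

```python
# Hypothetical sample records shaped like the todos data (not the real dataset).
sample = [
    {"userId": 1, "completed": True},
    {"userId": 2, "completed": False},
    {"userId": 1, "completed": True},
    {"userId": 3, "completed": True},
]

done_items_by_user = {}
for item in sample:
    uid = item["userId"]
    # get() returns 0 for a user not seen yet; True adds 1 and False adds 0.
    done_items_by_user[uid] = done_items_by_user.get(uid, 0) + item["completed"]

print(done_items_by_user)  # {1: 2, 2: 0, 3: 1}
```

Note that this variant keeps users with no completed items in the result, with a count of 0, whereas a loop that only increments on completed items leaves them out.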

answered Jun 5, 2019 at 11:43

user10325516

Itamar Mushkin · Answer · 2019-06-05 12:00:52Z

0

The popular pandas library allows you to do this in one line:

import pandas as pd
complete_items_per_user = pd.DataFrame(data).groupby('userId')['completed'].sum()

If you're asking what you can do without pandas, you can avoid the explicit loop with a dict comprehension:

users = set(x['userId'] for x in data)
complete_items_per_user = {user: sum(x['completed'] for x in data if x['userId']==user) for user in users}
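
The comprehension above still scans the data once per user. A single-pass alternative from the standard library, which is not part of the original answer, is collections.Counter. The sketch below assumes a small made-up `sample` list instead of the real dataset.

```python
from collections import Counter

# Hypothetical sample records shaped like the todos data (not the real dataset).
sample = [
    {"userId": 1, "completed": True},
    {"userId": 2, "completed": False},
    {"userId": 1, "completed": True},
    {"userId": 3, "completed": True},
]

# Count one occurrence of the user ID for every completed item, in a single pass.
completed_per_user = Counter(
    item["userId"] for item in sample if item["completed"]
)

print(dict(completed_per_user))  # {1: 2, 3: 1}
print(completed_per_user[2])     # 0
```

Users without completed items are not stored as keys, but looking them up on the Counter returns 0 rather than raising KeyError.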

edited Jun 5, 2019 at 12:00

answered Jun 5, 2019 at 11:44

Itamar Mushkin


6 Comments

TMOTTM Over a year ago

From an SQL perspective it's exactly a GROUP BY operation. The question is what's the closest the Python standard library gets to that.

Itamar Mushkin Over a year ago

With the standard Python library, you can probably do without the loop using a dict comprehension. I'm adding that to the answer.

TMOTTM Over a year ago

Thanks, but I'm just wondering: you're also solving it with a double loop... so the only improvement I see there is using the set type, which makes a lot of sense, no question.

Itamar Mushkin Over a year ago

Logically, there's not much difference between a list/dict/etc. comprehension and a for loop (as you've correctly pointed out, the word for appears the same number of times :-) ), but as far as I know, Python generally performs better with comprehensions, and it is arguably more readable and concise.

Itamar Mushkin Over a year ago

Though, to be honest, I'd just go with the pandas solution for performance and clarity...
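
On the question in the comments about the closest standard-library counterpart to a SQL GROUP BY: one candidate is itertools.groupby, sketched below under the assumption of a small made-up `sample` list. Unlike SQL, groupby only merges adjacent records, so the data has to be sorted by the grouping key first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records shaped like the todos data (not the real dataset).
sample = [
    {"userId": 2, "completed": True},
    {"userId": 1, "completed": True},
    {"userId": 2, "completed": False},
    {"userId": 1, "completed": True},
]

by_user = itemgetter("userId")

# groupby only groups consecutive items with equal keys, so sort by the same key first.
ordered = sorted(sample, key=by_user)
completed_per_user = {
    uid: sum(item["completed"] for item in group)
    for uid, group in groupby(ordered, key=by_user)
}

print(completed_per_user)  # {1: 2, 2: 1}
```

The sort makes this O(n log n), so for plain counting a single pass with a dict is likely cheaper; groupby is mainly useful when the whole group of records is needed.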
