1

From the data at https://jsonplaceholder.typicode.com/todos I want to count the "completed" items per user.

Currently, I approach this by first collecting the existing user ID keys, and then, for each user, checking every element in the dataset to see whether it's owned by that user and appending it to that user's list of items.

import json
from urllib import request

# Data from
uri = "https://jsonplaceholder.typicode.com/todos"

response = request.urlopen(uri).read()
data = json.loads(response)

users_items = {}

def get_user_ids(items):
    # Register every user ID as a key; the values are filled in later.
    for item in items:
        users_items[item['userId']] = None

def get_user_items():
    # For each user, scan the whole dataset again and collect the 'completed' flags.
    for uid in users_items:
        items = []
        for item in data:
            if item['userId'] == uid:
                items.append(item['completed'])
        users_items[uid] = items

done_items_by_user = {}
def count_completed_by_user():
    # True counts as 1 in sum(), so this is the number of completed items.
    for user in users_items:
        done_items_by_user[user] = sum(users_items[user])

get_user_ids(data)
get_user_items()
count_completed_by_user()

I especially don't like the double loop and the initialization of the dictionary values with None in get_user_ids.

2
  • item['completed'] is a boolean value. Are you sure that you need to accumulate only boolean values? Commented Jun 5, 2019 at 11:33
  • A boolean value of True means the item is done. I want to count the done items. Commented Jun 5, 2019 at 11:34

3 Answers

3

Simply use a defaultdict object:

import json
from urllib import request
from collections import defaultdict

# Data from
uri = "https://jsonplaceholder.typicode.com/todos"

response = request.urlopen(uri).read()
data = json.loads(response)


def count_user_completed_items(data):
    result = defaultdict(int)
    for item in data:
        if item['completed']:
            result[item['userId']] += 1
    return dict(result)


print(count_user_completed_items(data))

The output (where each key is a user ID and each value is the number of completed items for that user):

{1: 11, 2: 8, 3: 7, 4: 6, 5: 12, 6: 6, 7: 9, 8: 11, 9: 8, 10: 12}
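
A related standard-library option is collections.Counter, a dict subclass intended for tallying. The following is only a sketch and not part of the answer above: the `sample` list and the `count_completed` function name are hypothetical stand-ins for the downloaded data, used so the snippet runs without network access.

```python
from collections import Counter

# Hypothetical sample records in the same shape as the /todos data.
sample = [
    {"userId": 1, "id": 1, "completed": False},
    {"userId": 1, "id": 2, "completed": True},
    {"userId": 2, "id": 3, "completed": True},
    {"userId": 2, "id": 4, "completed": True},
    {"userId": 3, "id": 5, "completed": False},
]


def count_completed(items):
    # Counter tallies how often each userId occurs among the completed items.
    return Counter(item["userId"] for item in items if item["completed"])


counts = count_completed(sample)
print(dict(counts))  # {1: 1, 2: 2}
print(counts[3])     # 0: a missing key reads as zero without being stored
```

As with the defaultdict version, users without any completed item do not appear as keys, but looking them up on the Counter still returns 0.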

2 Comments

I wonder what the point of dict is, then, if I need defaultdict most of the time anyway.
@TMOTTM: dict is a simple, low-level container. defaultdict is a more specialized class built on top of it, which is why it is a subclass of dict.
0

You can use the dict method get() to insert or update the count for each user ID:

done_items_by_user = dict()
for item in data:
    done_items_by_user[item['userId']] = done_items_by_user.get(item['userId'], 0) + item['completed']
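
This works because bool is a subclass of int in Python, so adding item['completed'] adds 1 for True and 0 for False. A minimal sketch of that behavior, assuming a small hypothetical `sample` list in place of the real /todos data:

```python
# Hypothetical sample records in the same shape as the /todos data.
sample = [
    {"userId": 1, "id": 1, "completed": False},
    {"userId": 1, "id": 2, "completed": True},
    {"userId": 2, "id": 3, "completed": True},
    {"userId": 2, "id": 4, "completed": True},
    {"userId": 3, "id": 5, "completed": False},
]

done_items_by_user = {}
for item in sample:
    uid = item["userId"]
    # get() returns 0 the first time a user is seen; True adds 1, False adds 0.
    done_items_by_user[uid] = done_items_by_user.get(uid, 0) + item["completed"]

print(done_items_by_user)  # {1: 1, 2: 2, 3: 0}
```

Note that, unlike a version that only increments on completed items, this one also lists users who have no completed items, with a count of 0.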


0

The popular pandas library allows you to do this in one line:

import pandas as pd
complete_items_per_user = pd.DataFrame(data).groupby('userId')['completed'].sum()

If you're asking what you can do without pandas, you can avoid the explicit loop with a dict comprehension:

users = set(x['userId'] for x in data)
complete_items_per_user = {user: sum(x['completed'] for x in data if x['userId']==user) for user in users}

6 Comments

From a SQL perspective, it's exactly a GROUP BY operation. The question is what the closest thing in the Python standard library is.
With the standard Python library, you can probably do without the explicit loop by using a dict comprehension. I'm adding that to the answer.
Thanks, but I notice you're also solving it with a double loop, so the only improvement I see there is using the set type, which makes a lot of sense, no question.
Logically, there's not much difference between a list/dict/etc. comprehension and a for loop (as you've correctly pointed out, the word for appears the same number of times :-) ), but as far as I know, Python generally performs better with comprehensions, and the result is arguably more readable and concise.
Though, to be honest, I'd just go with the pandas solution for performance and clarity...
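
On the GROUP BY question raised in the comments: one candidate in the standard library is itertools.groupby. It only groups consecutive items with equal keys, so the data has to be sorted by the grouping key first. The sketch below assumes a small hypothetical `sample` list rather than the real /todos data, and the names `by_user` and `completed_per_user` are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, deliberately unsorted sample in the shape of the /todos data.
sample = [
    {"userId": 1, "id": 1, "completed": False},
    {"userId": 2, "id": 2, "completed": True},
    {"userId": 1, "id": 3, "completed": True},
    {"userId": 3, "id": 4, "completed": False},
    {"userId": 2, "id": 5, "completed": True},
]

by_user = itemgetter("userId")

# Sort first: groupby only merges adjacent items that share the same key.
completed_per_user = {
    uid: sum(item["completed"] for item in group)
    for uid, group in groupby(sorted(sample, key=by_user), key=by_user)
}

print(completed_per_user)  # {1: 1, 2: 2, 3: 0}
```

This replaces the per-user rescan of the dataset with one sort followed by a single pass, at the cost of the sort itself.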