
I have a list of lists with an item name in element 0 and an associated value in element 1. Each item can appear more than once. I would like to create a list of unique items, each paired with the maximum value associated with it. My code accomplishes this, but it seems very inefficient. Also, this is a simplified example; mylist could be 100,000 rows. Any suggestions for improving efficiency?

mylist = [['Item 1', 12],['Item 1', 10], ['Item 3', 12],['Item 4', 10], ['Item 3', 14]]

# get unique items
my_unique_items = list(set(x[0] for x in mylist))

# make it a list of list
my_unique_items = [[x] for x in my_unique_items]

# iterate over list items
for item in my_unique_items:

    # do list comp to get max value and append
    item.append(max([x[1] for x in mylist if x[0] == item[0]]))

print my_unique_items

3 Answers


It would be more efficient to only loop through mylist once. If you only care about the max value for each item key, just keep a mapping of items and their max values and compare them as you go through the list.

This has a worst case of O(n), whereas your original had a worst case of O(n^2).

item_maxes = {}
for item in mylist:
    max_value = item_maxes.setdefault(item[0], None)
    if max_value is None or item[1] > max_value:
        item_maxes[item[0]] = item[1]

Edit: I think ShadowRanger's version of this method is much cleaner looking:

max_vals = {}
for item, value in mylist:
    max_vals[item] = max(max_vals.get(item, value), value)

3 Comments

Note that the last line should assign item[1] to the item_maxes dict, not item[0].
I tested all the loops over 1,000,000 rows. Mine took 18 seconds; yours was the fastest at 0.4 seconds, so I chose it as the best answer. @ShadowRanger's was 0.6 seconds.
@user2242044: Not wholly surprising; the costs to use .get and max every run of the loop are a lot higher than you might expect. Amusingly, stupid hacks like caching bound methods and built-in functions to more local scope will end up changing the performance quite a bit, e.g. adding: max_vals_get = max_vals.get and _max = max just outside the loop, then doing: max_vals[item] = _max(max_vals_get(item, value), value) would likely reduce runtime for large inputs.
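A minimal sketch of the caching trick described in the comment above, using the question's sample data (the only change versus the answer's loop is the two cached names):

```python
mylist = [['Item 1', 12], ['Item 1', 10], ['Item 3', 12],
          ['Item 4', 10], ['Item 3', 14]]

max_vals = {}
max_vals_get = max_vals.get  # cache the bound method in local scope
_max = max                   # cache the built-in in local scope
for item, value in mylist:
    # identical logic to the answer, minus the repeated name lookups
    max_vals[item] = _max(max_vals_get(item, value), value)

print(max_vals)  # {'Item 1': 12, 'Item 3': 14, 'Item 4': 10}
```

Whether this is worth the readability cost is a judgment call; it only pays off on large inputs, where the per-iteration attribute and global lookups add up.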

If the inputs are already sorted (or you want the outputs sorted), a nice way to do this is with itertools.groupby:

from future_builtins import map  # On Python 2.x only, to get generator based map

from itertools import groupby
from operator import itemgetter

# Nicer names, and avoid recreating getvalue on each loop
getitem, getvalue = itemgetter(0), itemgetter(1)

# If not already sorted, must sort by same key we're grouping on:
mylist.sort(key=getitem)

max_vals = [(k, max(map(getvalue, g))) for k, g in groupby(mylist, key=getitem)]
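For reference, here is a self-contained Python 3 run of the same groupby approach on the question's sample data (on Python 3 the built-in map is already lazy, so the future_builtins import is unnecessary):

```python
from itertools import groupby
from operator import itemgetter

mylist = [['Item 1', 12], ['Item 1', 10], ['Item 3', 12],
          ['Item 4', 10], ['Item 3', 14]]

getitem, getvalue = itemgetter(0), itemgetter(1)

# groupby only groups consecutive equal keys, so sort by the same key first
mylist.sort(key=getitem)

max_vals = [(k, max(map(getvalue, g))) for k, g in groupby(mylist, key=getitem)]
print(max_vals)  # [('Item 1', 12), ('Item 3', 14), ('Item 4', 10)]
```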

If you don't care about order, and your items are hashable, a dict is generally going to be faster (it might use slightly more memory if most items are unique):

max_vals = {}
for item, value in mylist:
    max_vals[item] = max(max_vals.get(item, value), value)

Comments


Using groupby from the itertools module and itemgetter from the operator module:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> d = {}
>>> for g, data in groupby(sorted(mylist, key=itemgetter(0)), key=itemgetter(0)):
...     d[g] = max(list(zip(*data))[1])
... 
>>> d
{'Item 1': 12, 'Item 3': 14, 'Item 4': 10}

You can also use itertools.islice instead of the list constructor and a normal slice operation:

>>> from itertools import islice
>>> for g, data in groupby(sorted(mylist, key=itemgetter(0)), key=itemgetter(0)):
...     d[g] = max(*islice(zip(*data), 1, None))
... 
>>> d
{'Item 1': 12, 'Item 3': 14, 'Item 4': 10}

Comments
