python optimize list search

Question

So take the following example list

l = [
    {
        'post':1,
        'user':1,
        'other_stuff':'something',
        'more':'you get the point'
    },
    {
        'post':1,
        'user':2,
        'other_stuff':'something',
        'more':'you get the point'
    },
    {
        'post':2,
        'user':1,
        'other_stuff':'something',
        'more':'you get the point'
    },
]

I need to be able to check if a 'user' is already connected to a 'post', and I could do it with looping:

user = 1
post = 1
response = False
for connection in l:
    if connection['post'] == post and connection['user'] == user:
        response = True
        break

and that works very well. The issue is that in the actual situation, l will be populated 1.5 million times, and this iteration will run every time it populates since it needs to check to see if something already exists. so the last 500k iterations will be iterating through a list of over 1 million dictionaries. There is no way that this is the most efficient method for this!! My question is: what would be an optimal method that would not require such exhaust?

note: I do not necessarily know the values of the other keys in the dictionaries so I cannot do if x is in l to check

Could you put this into a database as these l values come in? Then you can run queries against the database without needing to iterate over all the rows yourself. — Frank Riccobono
– Frank Riccobono, Commented Jan 10, 2014 at 0:00
in reality, I will most likely use a database since python will always be slower in this regard, but in other projects I work on I may not be able to use a database and an optimal search_like method for comparing a list of dictionaries would be useful — Ryan Saxe
– Ryan Saxe, Commented Jan 10, 2014 at 0:04

Yuushi · Accepted Answer · 2014-01-10 00:03:40Z

4

I'd reconsider how you are laying out your data structure. If you need to have efficient access on a post and user pair, I'd consider storing it in a format like the following:

l = { (1, 1) : {'other stuff':'something', ...}, 
      (1, 2) : {'other stuff':'something', ...},
      (2, 1) : {'other stuff':'something', ...} }

Then this becomes an O(1) lookup:

user_post_pair = (1, 1)
if user_post_pair in l:
    # Stuff...

answered Jan 10, 2014 at 0:03

Yuushi

26.3k7 gold badges66 silver badges84 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

Keeler Over a year ago

Using l for variable names. Gotta love it :)

Frank Riccobono Over a year ago

And if for some reason you can't control the input to begin with, you can convert it into this format as it comes in.

Ryan Saxe Over a year ago

I have considered that, the only issue I have with it is that this list is being accessed in a couple other places, and the way it is set up for those makes it much simpler...I may as well make a copy and store two separate objects, a list and dictionary.

Yuushi Over a year ago

@Keeler Yeah, I agree. I've kept it since the OP used it, and I'm assuming it's not actually used in the code. At least I hope not...

Ryan Saxe Over a year ago

@Yuushi don't worry it's not

|

Collectives™ on Stack Overflow

python optimize list search

1 Answer 1

8 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

8 Comments

Your Answer

Sign up or log in

Post as a guest

Related