
I have the for loop code:

people = queue.Queue()
for person in set(list_):
    first_name,last_name = re.split(',| | ',person)
    people.put([first_name,last_name])

The list being iterated has 1,000,000+ items; it works, but takes a couple of seconds to complete.

What changes can I make to help the processing speed?

Edit: I should add that this is Gevent's queue library

  • Can you post a sample line of person? Commented Dec 5, 2011 at 3:36
  • A little thing you can do is put the re outside the loop. E.g. splitter = re.compile(r',| | '), then use lastname,firstname = splitter.split(person) instead of re.split Commented Dec 5, 2011 at 4:07
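The precompile suggestion in the comment above can be sketched like this (the sample names are hypothetical, and the split order follows the question's code):

```python
import re

# Compile the pattern once, outside the loop, instead of passing
# the raw string to re.split on every iteration.
splitter = re.compile(r',| | ')

people = []
for person in {"John,Smith", "Jane,Doe"}:  # hypothetical sample data
    first_name, last_name = splitter.split(person)
    people.append([first_name, last_name])
```

re.split with a string pattern does cache the compiled regex internally, but hoisting the compile out of the loop also skips the cache lookup on each of the 1,000,000+ iterations.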

4 Answers


The question is: what is your queue being used for? If it isn't really necessary for threading purposes (or you can work around the threaded access), then in this kind of situation you want to switch to generators; you can think of them as the Python version of Unix shell pipes. So your loop would look like:

def generate_people(list_):
    previous_row = None
    for person in sorted(list_):
        if person == previous_row:
            continue
        first_name,last_name = re.split(',| | ',person)
        yield [first_name,last_name]
        previous_row = person

and you would use this generator like this:

for first_name, last_name in generate_people(list_):
    print(first_name, last_name)

This approach avoids what is probably your biggest performance hit: allocating memory to build a queue and a set with 1,000,000+ items in them. Instead, it works with one pair of strings at a time.

UPDATE

Based on more information about how threads play a role in this, I'd use this solution instead:

people = queue.Queue()
previous_row = None
for person in sorted(list_):
    if person == previous_row:
        continue
    first_name,last_name = re.split(',| | ',person)
    people.put([first_name,last_name])
    previous_row = person

This replaces the set() operation with something that should be more efficient.
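The loop above can be exercised with a small hypothetical sample to confirm that sorting plus the previous-row check deduplicates the same way set() did:

```python
import queue
import re

list_ = ["John,Smith", "Jane,Doe", "John,Smith"]  # hypothetical data with a duplicate

people = queue.Queue()
previous_row = None
for person in sorted(list_):
    if person == previous_row:
        continue  # skip consecutive duplicates, which sorting groups together
    first_name, last_name = re.split(',| | ', person)
    people.put([first_name, last_name])
    previous_row = person
```

After the loop, the queue holds two entries; the duplicate "John,Smith" was skipped.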


10 Comments

I'm using the queue for threading since it's thread safe. I'm not sure your way of splitting will work since the regex code I have in place is used to split between multiple delimiters. I will give the approach as a generator and see if that helps. Thanks.
I didn't change anything about the split. Just reworked the function into a generator.
Oh sorry, I must have read another comment with the split being changed... weird, my apologies.
Is a thread pulling from this queue as you are adding to it? If so, then the set operation may be the real performance hit here.
Nope, I add everything to the queue once then run through it.
with people.mutex:
    people.queue.extend(list(re.split(',| | ',person)) for person in set(list_))
    people.not_empty.notify_all()

Note that this completely ignores the queue capacity, but avoids lots of excessive locking.
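A variant of the same bulk-load idea using only the public API (shown here with the standard-library queue; gevent's Queue offers a similar put_nowait) still locks once per item, but avoids touching the queue's internals. The sample data is hypothetical:

```python
import queue
import re

splitter = re.compile(r',| | ')
list_ = ["John,Smith", "Jane,Doe"]  # hypothetical sample data

people = queue.Queue()  # unbounded, so put_nowait never raises queue.Full
for person in set(list_):
    people.put_nowait(splitter.split(person))
```

put_nowait skips the blocking logic of put, which is safe here because the queue has no capacity limit.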



I think you can read the data with multiple threads, using the queue as a concurrent queue.

Comments


I would try replacing regex with something a bit less intense:

first_name, last_name = person.split(', ')
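One caveat worth noting, sketched with a hypothetical "First, Last" record: str.split on the literal ", " delimiter and the question's regex do not behave the same, because the regex splits on the comma and the space separately and produces an empty middle field:

```python
import re

person = "John, Smith"  # hypothetical record format

# Plain string split on the literal ", " delimiter gives two clean fields:
first_name, last_name = person.split(', ')

# The question's regex splits on ',' OR ' ' as separate delimiters,
# so the same record yields an empty field in the middle:
parts = re.split(',| | ', person)  # ['John', '', 'Smith']
```

So str.split is both faster and, for this record shape, more correct; the regex only makes sense if the delimiter genuinely varies between records.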

Comments
