Index into a Python iterator

Question

I have an iterator iterator and a list of indices indices (repeats possible) and I want to extract just those elements from my iterator. At the moment I'm doing

indices = sorted(indices)
deltas = [indices[0]] + [indices[i+1] - indices[i] for i in range(len(indices) - 1)]
output = []
for delta in deltas:
    for i in range(delta):
        datum = next(iterator)
    output.append(datum)

Are those two layers of loop necessary? Am I missing a trick with itertools?

Yeah, you need to give some sort example that helps us reproduce what you are trying to accomplish. I presume indices is some list of ints, but this line: indices[0] + [indices[i+1] - indices[i] for i in range(len(indices) - 1)] will not work if that is the case - you can't add a list to an int. — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Sep 24, 2016 at 0:21
Also, indices implies some sort of sequence type. iterators are for a single pass over a sequence: if you are going to be repeating indices why not index directly into the sequence? — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Sep 24, 2016 at 0:23
I can't think of anything that would be a clear improvement over your code. itertools functions like islice won't work for you since you may have repeated indices. One thing to consider though: if the iterator contains only a modest amount of data (not more than can fit in memory, and not orders of magnitude more than the number of indices), it might be easier to simply consume it all to populate a list, and use a trivial list comprehension to get the requested values. — Blckknght
– Blckknght, Commented Sep 24, 2016 at 0:23

Georg Plaz · Accepted Answer · 2022-01-24 18:40:39Z

You definitely don't need the double loop as you can do this with a single loop and without creating deltas but the check code becomes more complicated:

it = iter(sorted(indices))
index = next(it)
for i, datum in enumerate(iterator):
    if i != index:
        continue
    output.append(datum)
    try:
        index = next(it)
    except StopIteration:
        break

You can also do this in a list comprehension for very low number of indices as you incur overhead for the check (but avoid the sort):

[datum for i, datum in enumerate(x) if i in indices]

You can reduce the cost of the check by converting indices to a set. I would be interested to see performance of sort over a set construction (set lookup is O(1)):

indices = set(indices)
[datum for i, datum in enumerate(x) if i in indices]

First and third options are roughly equivalent in timing at just over 900 ms (slight edge to first) to choose 1000 random indices out of 10,000,000 items. The OP's code ran in about 1.2 s.

Brendan Abel · Accepted Answer · 2016-09-24 00:31:05Z

0

If memory isn't a constraint, I would just find the max index and prepopulate an array of iterator values up to that max index. You're going to have to compute the intermediate values anyway, so you really don't gain anything by computing the deltas.

max_index = max(indices)
data = [v for v in itertools.islice(iterator, max_index + 1)]
values = [data[i] for i in indices]

answered Sep 24, 2016 at 0:31

Brendan Abel

38k16 gold badges101 silver badges126 bronze badges

Collectives™ on Stack Overflow

Index into a Python iterator

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related