2

I have an iterator iterator and a list of indices indices (repeats possible) and I want to extract just those elements from my iterator. At the moment I'm doing

indices = sorted(indices)
deltas = [indices[0]] + [indices[i+1] - indices[i] for i in range(len(indices) - 1)]
output = []
for delta in deltas:
    for i in range(delta):
        datum = next(iterator)
    output.append(datum)

Are those two layers of loop necessary? Am I missing a trick with itertools?

3
  • 4
    Yeah, you need to give some sort example that helps us reproduce what you are trying to accomplish. I presume indices is some list of ints, but this line: indices[0] + [indices[i+1] - indices[i] for i in range(len(indices) - 1)] will not work if that is the case - you can't add a list to an int. Commented Sep 24, 2016 at 0:21
  • 1
    Also, indices implies some sort of sequence type. iterators are for a single pass over a sequence: if you are going to be repeating indices why not index directly into the sequence? Commented Sep 24, 2016 at 0:23
  • I can't think of anything that would be a clear improvement over your code. itertools functions like islice won't work for you since you may have repeated indices. One thing to consider though: if the iterator contains only a modest amount of data (not more than can fit in memory, and not orders of magnitude more than the number of indices), it might be easier to simply consume it all to populate a list, and use a trivial list comprehension to get the requested values. Commented Sep 24, 2016 at 0:23

2 Answers 2

2

You definitely don't need the double loop as you can do this with a single loop and without creating deltas but the check code becomes more complicated:

it = iter(sorted(indices))
index = next(it)
for i, datum in enumerate(iterator):
    if i != index:
        continue
    output.append(datum)
    try:
        index = next(it)
    except StopIteration:
        break

You can also do this in a list comprehension for very low number of indices as you incur overhead for the check (but avoid the sort):

[datum for i, datum in enumerate(x) if i in indices]

You can reduce the cost of the check by converting indices to a set. I would be interested to see performance of sort over a set construction (set lookup is O(1)):

indices = set(indices)
[datum for i, datum in enumerate(x) if i in indices]

First and third options are roughly equivalent in timing at just over 900 ms (slight edge to first) to choose 1000 random indices out of 10,000,000 items. The OP's code ran in about 1.2 s.

Sign up to request clarification or add additional context in comments.

Comments

0

If memory isn't a constraint, I would just find the max index and prepopulate an array of iterator values up to that max index. You're going to have to compute the intermediate values anyway, so you really don't gain anything by computing the deltas.

max_index = max(indices)
data = [v for v in itertools.islice(iterator, max_index + 1)]
values = [data[i] for i in indices]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.