0

Suppose I want to create an iterator class that takes another iterator as an input and counts the frequency of the elements. I cannot use lists, dicts, or any other data structure that could be used to store information for multiple elements together so I have to solve this by creating some sort of a nested iteration. Conceptually, I want my class to do the following:

for i in iter(input):
        count=0
        for j in iter(input):
            if i=j:
               count+=1
            if j = None: #iterator is done
               # reset j
               # move to next element of i

This is obviously a simplified example in many ways but I hope the general intended structure of my class is clear. Each time I save the value of i and the count to disk but we can ignore that for now.

The first problem I come across is that Python does not allow iterators to be reset once they are consumed which creates an issue with resetting the inner loop (j). I overcome this using itertools.cycle() when initiating the second iterator which allows endless iteration. Unfortunately, my code below only does one succesfull pass over the data and the first if statement does not return the next value of the outer iterator but instead treats it as if it has been already consumed.


class Match:

    def __init__(self, input):
        '''
        Input is an iterator
        '''
        self.in1 = input 
        self.in2 = input
        self.c=0 #the count

    def __iter__(self):
        '''
        Initializes the iterators and fetches the first element
        '''

        self.it1 = iter(self.in1) # initialize the first (outer) iterator
        self.it2 = itertools.cycle(iter(self.in2)) # initialize the second (inner) iterator

        self.i = next(self.it1) #pin the first elements
        self.j = next(self.it2)

        return self

    def _inc_outter_end(self): 
        '''increment or end outer iterator'''
        try:
            self.i = next(self.it1)
        except StopIteration:
            self.i = None
            self.j = None


    def __next__(self):

        i = self.i
        j = self.j
        self.j = next(self.it2) 
        self.c+=1

        if self.c ==9:
            self.c=0
            self._inc_outter_end()      
            i = self.i 

        #stop if done
        elif i == None:
            raise StopIteration()

        #skip non-pairs
        elif i != j:
            return self.__next__()
        #count and return matches
        elif i==j:
            return self.c

Running something like:

i1 = [1,7,2,4,6,6,1,1,3]
for match in Match(iter(i1)):
    print(match)

does one pass over the data such that i is always 1 but instead of doing 8 more passes (for all the next elements of the input) stops. Instead I would like it to return the same output as:

i1 = [1,7,2,4,6,6,1,1,3]
for i in i1:
    count=0
    for j in i1:
        if i==j:
            count+=1
    print(i,count)

giving

1 3
7 1
2 1
4 1
6 2
6 2
1 3
1 3
3 1
10
  • 2
    Counting element frequency doesn't seem like a job for an iterator. You need to completely consume the input to do that, at which point you might as well build and return a mapping of counts. Commented Jun 6, 2020 at 4:49
  • 1
    It’s also not clear what you mean by “ solve this recursively - namely by creating a nested iteration”. Generally iteration and recursion are not the same thing. Commented Jun 6, 2020 at 4:50
  • @MarkMeyer, thank you for pointing this out! What I mean exactly is that I am looking for a solution that keeps iterating through the input until all possible combinations have been exhausted. Recursion is indeed not the right word there, my apologies for the confusion. Commented Jun 6, 2020 at 4:57
  • 1
    What would you hope the last code example prints? Commented Jun 6, 2020 at 4:59
  • 2
    We're going to need to see the exact text of your assignment. There's likely some important context we're missing. Commented Jun 6, 2020 at 5:04

2 Answers 2

1

It seems like, for each element in the input iterator, you want to emit the number of times that element is yielded by the entire iterator. Obviously, you can't compute that number until you have fully exhausted the iterator. And that means that, whatever solution you come up with, it must involve somehow storing some information about the elements of the iterator in such a way that information about all the elements is stored at the same time.

However, you also say

I cannot use lists, dicts, or anything else [...]

Now it's not clear what you mean by that (specifically, what does "anything else" refer to exactly?), but one might naturally take this to mean anything you can use to store information about all elements of the iterator at the same time is off limits. If that is your situation, then this task is not possible. You'll have to relax one of your constraints or find a way around doing this altogether.

If what I've posted here is not the correct interpretation of "lists, dicts, or anything else" for your situation, then you'll have to clarify what you mean by that, and perhaps with more clarity a solution will present itself.


Someone may raise the objection that you can do this with itertools.tee(), which basically copies an iterator and lets you iterate over it twice (or a number of times of your choice). But the underlying implementation of tee() is effectively equivalent to storing the contents of the iterator in a list, which I assume is ruled out by your condition. (tee() can actually be more efficient than a list, in that it can store only part of the iterator, not the entire thing, if your usage pattern allows it. But that's not the case here; the task you're trying to undertake requires storing information about the entire iterator.)

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for your input David! You are completely correct in your interpretation of the structures I cannot use - anything that can be used to store information about multiple elements at the same time is off limits. I fully appreciate that this is a very weird situation and, while there is no single reason why I cannot use any of these structures, let's just say I am trying to getter a better understanding of iterators. You mention that the task is not possible - the nested loop implementation above provides the desired output, however. I am looking for a way to implement that with iterators.
In that case, I have to stand by my answer which is that there is no way to do this with iterators.
0

Here is a way you can count the elements in the list just using iterators. This prints one count for each element rather than repeating the counts for the repeated elements. If you really need to report the other way, maybe this will give you an idea.

The basic plan is to loop through the elements and count the target if they match. If they don't match, yield the elements back. The countFirst function is both a counter and a generator that provides the non-target values:

def countTarget(it, target):
    count = 1
    for i in it:
        if i == target:
            count += 1
        else:
            yield i
    print(f"number: {target} count: {count}")

def count(it):
    while True:      
        try:
            target = next(it)
        except StopIteration:
            return
        it = countTarget(it, target)


orig = iter([1,7,2,4,6,6,1,1,3])

count(orig)

Prints:

number: 1 count: 3
number: 7 count: 1
number: 2 count: 1
number: 4 count: 1
number: 6 count: 2
number: 3 count: 1

Of course this is not especially efficient — you loop over the iterator once for every unique value. But it seems like more of a through exercise than a practical exercise.

5 Comments

This answer hides the storage in the call stack - one stack frame is used to record each unique input value. It might fulfill the original assignment requirements (we still don't know the original requirements), but it doesn't avoid the need to materialize the input.
I'm not sure I agree @user2357112supportsMonica, but maybe running it through the bebugger will convince me. This is only creating lazily-evaluated iterators — so the data is never being copied. It's just being passed through a pipeline in a way similar to a coroutine. It's just chaining iterators.
The chain grows as deep as the number of distinct elements in the input, with each generator recording one element. For an input with n distinct elements, n generators are created, with n target variables holding the n distinct elements (and n copies of other per-generator overhead). The nesting also means that stack overflows occur for large numbers of distinct elements.
Okay, that's convincing. Thanks @user2357112supportsMonica. I'm going to leave this answer in case it's useful for the OP.
@MarkMeyer thank you for the answer, really appreciate it! It looks like that there is indeed no way to do that with iterators in Python as mentioned above but it is still helpful.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.