I can't understand why this isn't working. My loop works nicely for the first iteration but then stops: print x[0] runs every time, but the nested for loop only runs on the first pass. Any ideas?

    csv_reader = csv.reader(guuids, delimiter='\t')
    matrix_reader = csv.reader(matrix, delimiter='\t')

    for line in csv_reader:
        x = line
        print x[0]
        for mline in matrix_reader:
            if x[0] in mline[0] or x[0] in mline[1]:
                out.append(mline)
  • What is matrix_reader? Commented Nov 9, 2017 at 23:06
  • Did you try going back to the beginning of matrix_reader? Commented Nov 9, 2017 at 23:06
  • What is the content of matrix_reader? Commented Nov 9, 2017 at 23:07
  • If it is another file like csv_reader, you may need to call seek to make sure it goes back to the start of the file after every loop. Commented Nov 9, 2017 at 23:07
  • Why would matrix_reader ever stop iterating? Commented Nov 9, 2017 at 23:07

1 Answer

Many iterable objects in Python (things you could put after "in" in a for loop) can only be iterated over once. After that, they're exhausted: they can't go back to the beginning, and any further attempt to iterate over them acts as if they contain nothing. A csv.reader object is one example. In the first iteration of your outer loop, the inner loop consumes every record that matrix_reader can provide. That's why, the next time the code reaches that line, matrix_reader looks empty.

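Perhaps the easiest way to solve this is to make a new matrix_reader each time you want to iterate over it. If the reader's source is an open file, that file also has to be reopened or rewound, because the file object has been read to the end as well. Like so: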

for line in csv_reader:
    matrix_reader = ...
    for mline in matrix_reader:
        ...
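To make that concrete, here is a minimal sketch of two ways to do it in Python 3. The io.StringIO objects and their contents are made up for illustration; they just stand in for whatever open files guuids and matrix are in the question. Option 1 reads the matrix rows into a list once, which only makes sense if they fit in memory. Option 2 rewinds the source with seek(0) and builds a new reader on every outer pass, which assumes the source is seekable.

```python
import csv
import io

# Hypothetical stand-ins for the question's open files "guuids" and "matrix".
guuids = io.StringIO("id1\nid2\n")
matrix = io.StringIO("id1\tid9\t0.5\nid3\tid2\t0.7\nid4\tid5\t0.1\n")

# Option 1: read the matrix rows into a list once.
# A list, unlike a csv.reader, can be looped over as many times as needed.
matrix_rows = list(csv.reader(matrix, delimiter="\t"))
out = []
for line in csv.reader(guuids, delimiter="\t"):
    for mline in matrix_rows:
        if line[0] in mline[0] or line[0] in mline[1]:
            out.append(mline)

# Option 2: rewind the source and build a new reader on every outer pass.
guuids.seek(0)
out_rewind = []
for line in csv.reader(guuids, delimiter="\t"):
    matrix.seek(0)  # go back to the start of the underlying text
    matrix_reader = csv.reader(matrix, delimiter="\t")
    for mline in matrix_reader:
        if line[0] in mline[0] or line[0] in mline[1]:
            out_rewind.append(mline)
```

Both loops collect the same rows. Option 1 parses the matrix only once, so it is probably the better choice unless the matrix is too large to hold in memory.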

To understand why csv.reader gets exhausted after you go through it once, you should know that a csv.reader does not represent a CSV file. Actually, despite the name, it's really more of a "converter": it takes lines of text from some source, which could be anything, and converts them into lists, one by one. After the reader has converted a line, it forgets about it. This allows the reader object to process millions of lines without taking a huge chunk of memory.
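A small sketch of the "converter" idea: here the source is a plain list of strings rather than a file (the sample lines are made up). The reader turns each line into a list as it is requested, and a second pass over the same reader yields nothing.

```python
import csv

# The source is just a list of tab-separated strings, not a file.
lines = ["a\tb", "c\td"]
reader = csv.reader(lines, delimiter="\t")

first_pass = [row for row in reader]   # each line is converted to a list
second_pass = [row for row in reader]  # the reader is already exhausted

print(first_pass)   # [['a', 'b'], ['c', 'd']]
print(second_pass)  # []
```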

The tradeoff of this approach is that the reader object can't go back to lines it has processed before unless it can somehow tell its source of text to go back and repeat a previous line. But there's no guarantee that the underlying source can do that. If the source is the output from some other program, for example, you can't tell the program to go back and repeat an old line of output. Or if the source is text being streamed over the internet, you can't necessarily tell it to repeat a line that had been streamed before. So the reader can't count on being able to access old lines, and that's why, when it's gotten to the last one, the only reasonable behavior is for it to act as if it has nothing left.
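As a rough illustration of the difference between sources, the sketch below compares an in-memory text buffer, which can rewind, with an operating-system pipe, which here stands in for the output of another program and cannot. The sample data is made up, and the pipe is only a stand-in for a real subprocess or network stream.

```python
import csv
import io
import os

# An in-memory text buffer can rewind, so its lines can be read again.
text_source = io.StringIO("a\tb\nc\td\n")
first_pass = list(csv.reader(text_source, delimiter="\t"))
text_source.seek(0)
second_pass = list(csv.reader(text_source, delimiter="\t"))

# A pipe stands in for another program's output: once read, the data is gone.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "w") as writer:
    writer.write("a\tb\nc\td\n")
pipe_source = os.fdopen(read_fd, "r")
pipe_rows = list(csv.reader(pipe_source, delimiter="\t"))
can_rewind = pipe_source.seekable()  # False: a pipe has no "beginning" to return to
pipe_source.close()
```

Since csv.reader accepts either kind of source, it cannot assume that rewinding is possible.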
