
Simple problem, but I'm not sure what's wrong. I am trying to iterate through two lists that were read from a CSV file, as follows:

for row1 in (list(csv_data1)):
  for row2 in (list(csv_data2)):
    # do something with row1 and row2

However after each iteration of the outer for loop, the inner for loop does not recognize that the outer for loop is iterated! For example if I do this:

for row1 in (list(csv_data1)):
  for row2 in (list(csv_data2)):
    # do something with row1 and row2
  print row1

The elements of row1 get printed properly. However if I try to print the element of the outermost loop within the inner loop like so:

for row1 in (list(csv_data1)):
   for row2 in (list(csv_data2)):
     # do something with row1 and row2
     print row1

I only get the first row of (list(csv_data1)) multiple times!

So if csv_data1 = [['a','b'],['b','c']] for example, I expect the above print statement (print in inner loop) to print:

[['a','b']
# repeated prints of above for however long csv_data2 is ...
['b','c']]
# repeated prints of above for however long csv_data2 is ...

But instead I get the following:

[['a','b']
# repeated prints of above for however long csv_data2 is ...
['a','b']]
# repeated prints of above for however long csv_data2 is ...

I.e. I can't get both loops to iterate through each other. I'm missing something very obvious, any help will be greatly appreciated. Thanks.

Edit: More specifically here is what I am trying to do: (I'm just printing right now to try to diagnose the problem)

f1 = open('file1.csv', 'rU')
f2 = open('file2.csv', 'rU')
reader1 = csv.DictReader(f1)
reader2 = csv.DictReader(f2)

# Grab desired columns from csv file
cols_desired = 'district,blockname,villagename'.split(',')

desired_cols_1 = (list(row[col] for col in cols_desired) for row in reader1)
desired_cols_2 = (list(row[col] for col in cols_desired) for row in reader2)

for row1 in (list(desired_cols_1)):
  for row2 in (list(desired_cols_2)):
    print row1
    # XXX this prints only the first row of list(desired_cols_1) repeated times for some reason!
  • Without seeing your code more specifically it's hard to diagnose this. Commented Oct 19, 2015 at 0:55
  • Are you doing anything else in your inner for loop, or just printing? Commented Oct 19, 2015 at 0:57
  • show us all the lines you reference csv_data1 inside the innermost loop, you must be modifying it, or the inner loop hasn't really ended. Commented Oct 19, 2015 at 1:01
  • Hi thanks for your responses. I've updated with more specific code. No I am not modifying anything I believe, just trying to print each element in row1 within the inner loop, but getting the undesired behaviour as in original post. Commented Oct 19, 2015 at 1:11

3 Answers

1

The problem is that you are using a generator for your inner loop. A generator can only be iterated once; after that it is exhausted. So on the first pass of the outer loop you consume all the elements of csv_data2, and it is empty on every following pass, which means the inner loop body only ever runs while row1 is the first row.

Look at this:

>>> x = (i for i in range(5))
>>> y = (i for i in range(5))
>>> for i in x:
...     ylist = list(y)
...     print(id(ylist))
...     print(len(ylist))
...
44917584
5
44917624
0
44918104
0
44918144
0
44918184
0
>>> print(len(list(x)))
0

Each iteration creates a new list, and in all but the first iteration, ylist is empty. That's because the first iteration consumes the generator's elements when it creates the list. There's a similar effect on x: it's empty after the for loop as well. That's what you're seeing.

The solution is to create the lists prior to the loops:

# Square brackets make this a list comprehension instead of a generator expression
# A list comprehension gives back a list, which can be iterated repeatedly
desired_cols_1 = [list(row[col] for col in cols_desired) for row in reader1]
desired_cols_2 = [list(row[col] for col in cols_desired) for row in reader2]

for row1 in desired_cols_1:
  for row2 in desired_cols_2:
    print row1, row2

This will consume the generators only once.
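
For reference, here is a minimal self-contained sketch of the same idea. Note the assumptions: it is written for Python 3 (print as a function), it uses io.StringIO in place of real files, and the sample data and the load_rows helper are made up for illustration rather than taken from the question.

```python
import csv
import io

# Hypothetical in-memory stand-ins for file1.csv and file2.csv
FILE1 = "district,blockname,villagename\na,b,c\ne,f,g\n"
FILE2 = "district,blockname,villagename\n1,1,1\n2,2,2\n3,3,3\n"

cols_desired = "district,blockname,villagename".split(",")


def load_rows(text, cols):
    """Read the CSV text once and return a list of rows (each a list of values)."""
    reader = csv.DictReader(io.StringIO(text))
    return [[row[col] for col in cols] for row in reader]


desired_cols_1 = load_rows(FILE1, cols_desired)
desired_cols_2 = load_rows(FILE2, cols_desired)

# Lists can be iterated any number of times, so the inner loop restarts
# from the first row on every pass of the outer loop.
pairs = [(row1, row2) for row1 in desired_cols_1 for row2 in desired_cols_2]

for row1, row2 in pairs:
    print(row1, row2)
```

With 2 rows in the first input and 3 in the second, this should visit all 6 combinations, and the second row of the first input shows up as expected.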

Alternatively, if the data is so large you can't load it all into memory, you could create a new generator for each iteration instead of creating the internal generator prior to the loop:

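desired_cols_1 = (list(row[col] for col in cols_desired) for row in reader1)

for row1 in desired_cols_1:
  # Need to make sure the file is back at the beginning.
  # Rewind the file object (the DictReader itself has no seek method)
  # and create a new reader so the header line is parsed as the header again.
  f2.seek(0)
  reader2 = csv.DictReader(f2)
  desired_cols_2 = (list(row[col] for col in cols_desired) for row in reader2)
  for row2 in desired_cols_2:
    print row1, row2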
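
A rough, self-contained sketch of this rewind-per-iteration approach follows. Again this assumes Python 3, uses io.StringIO objects instead of real files, and the sample data and the iter_rows helper are hypothetical names introduced only for the example.

```python
import csv
import io

# Hypothetical in-memory stand-ins for file1.csv and file2.csv
f1 = io.StringIO("district,blockname,villagename\na,b,c\ne,f,g\n")
f2 = io.StringIO("district,blockname,villagename\n1,1,1\n2,2,2\n3,3,3\n")

cols_desired = "district,blockname,villagename".split(",")


def iter_rows(fileobj, cols):
    """Rewind fileobj, then lazily yield the wanted columns of each row."""
    fileobj.seek(0)
    # A fresh DictReader consumes the header line as field names again.
    reader = csv.DictReader(fileobj)
    for row in reader:
        yield [row[col] for col in cols]


seen = []
for row1 in iter_rows(f1, cols_desired):
    # A new generator is created here on every outer pass, so the inner
    # loop always starts from the first data row of the second input.
    for row2 in iter_rows(f2, cols_desired):
        seen.append((row1, row2))
        print(row1, row2)
```

Only one row from each input is held in memory at a time (the seen list is there just to make the result easy to check). The trade-off is that the second input is re-read once per row of the first.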


1

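One thing to note about for loops in any programming language: if you iterate 10 times, you are simply saying "execute the same statements/functions in the loop body 10 times, until the loop ends". Anything inside an inner loop therefore runs once for every combination of outer and inner item: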

for i in ['a','b','c','d']:
    for j in ["hello"]:
        print(j)

output

hello
hello
hello
hello

Hence you can prevent the repetition by placing your print statement before the second for loop starts:

for row1 in (list(desired_cols_1)):
    print row1
    for row2 in (list(desired_cols_2)):
        pass  # do something with row1 and row2


1

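I think you need to build desired_cols_1 and desired_cols_2 as lists up front (list comprehensions instead of generator expressions), rather than wrapping the generators in list() inside the loops.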

desired_cols_1 = [ [row[col] for col in cols_desired] for row in reader1 ]
desired_cols_2 = [ [row[col] for col in cols_desired] for row in reader2 ]

for row1 in desired_cols_1:
    for row2 in desired_cols_2:
        print row1

My file_1.csv:

district,blockname,villagename
a,b,c
e,f,g

My file_2.csv:

district,blockname,villagename
1,1,1
2,2,2
3,3,3

The output:

['a', 'b', 'c']
['a', 'b', 'c']
['a', 'b', 'c']
['e', 'f', 'g']
['e', 'f', 'g']
['e', 'f', 'g']

Of course, it will print row1 x number of times, where x is len(desired_cols_2). Is that not what you are attempting with your nested for loop?

3 Comments

Thanks for your answer. Yes that is what I am trying to do at the moment. Your code manages to iterate through desired_cols1 instead of just the first row which is great, however it gets stuck in this infinite loop, when it should stop printing when row1 = last row of desired_cols_1. Do you know what could be wrong?
Could it be that there are a LOT of rows in desired_cols_2 and the output to the console just hasn't caught up yet? What is the len(desired_cols_2)? Can you try it on the test data I added to see if it works for you?
Yes your test data works :) So it's my csv file that is the problem. The length of desired_cols_2 is 3, the same as desired_cols_1... But there's a lot of data in the csv, thanks for your help.
