I have two example CSV files. csvexample.csv looks like this:

ID Text  
1  'good morning'  
2  'good afternoon'  
3  'good evening'  

While csvexample1.csv looks like this:

Day Month  
14  'Feb'  
21  'Mar'  
31  'May' 

With the following code, I get the result I want, which is to add the first column of csvexample.csv and the second column of csvexample1.csv to one list, res:

import csv

res = []
with open('csvexample.csv') as f, open('csvexample1.csv') as a:
    reader=csv.reader(f) 
    reader1=csv.reader(a)
    next(reader)
    next(reader1)
    for row in zip(reader, reader1):
        res.extend([row[0][0], row[1][1]])  

print(res)   

I get the following outcome:

['1', 'Feb', '2', 'Mar', '3', 'May']  

However, the actual CSV files I want to apply this code to contain some empty cells. I am combining the Twitter bios of companies from one file with the tweets of those companies from another file into one list, and some companies have no bio on Twitter, so those cells in a specific column are empty. Furthermore, in most cases the first file has far fewer rows than the second, and the output stops once the first file has no rows left, ignoring all remaining rows in the second file. For example, if I edit csvexample.csv like this:

ID Text   
1  'good morning'  
2  'good afternoon'   

3  'good evening'  
4  

and csvexample1.csv like this:

Day Month  
14  'Feb'  
21     
31  'May'  

I get the following outcome:

['1', 'feb', '2', '', '', 'may']  

instead of the desired outcome:

['1', 'feb', '2', '', '', 'may', '4']

I tried many different things, but I can't get the code to produce the required outcome. My latest attempt uses zip_longest:

from itertools import zip_longest
from io import StringIO
import csv

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []
with mystr1 as f, mystr2 as a:
    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')

    # skip the header rows
    next(reader)
    next(reader1)

    # the loop has to run inside the with block, while both inputs are still open
    for row in zip_longest(reader, reader1, fillvalue=''):
        var1 = row[0][0] if len(row[0]) else ''
        var2 = row[1][1] if len(row[1]) else ''
        res.extend([var1, var2])

print(res)

This example gives me the following error:

Traceback (most recent call last):
  File "thesis.py", line 31, in <module>
    var2 = row[1][1] if len(row[1]) else ''
IndexError: list index out of range

  • Perhaps within your loop you can first check the values for row[0] and row[1] and only if they both exist, then you can update your res variable. Commented May 8, 2018 at 12:40
  • Possible duplicate of zip-like function that pads to longest length? Commented May 8, 2018 at 14:57
  • zip stops at the end of the shortest iterator. You should be using itertools.zip_longest. Commented May 8, 2018 at 14:58
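
To illustrate the last comment, here is a minimal sketch of the difference between zip and itertools.zip_longest. The lists ids and months are made-up stand-ins, not the real data:

```python
from itertools import zip_longest

# illustrative inputs of unequal length (not the real CSV data)
ids = ['1', '2', '3', '4']
months = ['Feb', 'Mar']

# zip stops as soon as the shorter input is exhausted
short = list(zip(ids, months))

# zip_longest continues to the end of the longer input,
# padding the shorter one with fillvalue
padded = list(zip_longest(ids, months, fillvalue=''))

print(short)
print(padded)
```

With zip, the entries '3' and '4' are silently dropped; with zip_longest they are kept and paired with the fill value.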

1 Answer

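You can use itertools.zip_longest so that iteration continues until the longer file is exhausted, and catch IndexError to handle blank rows and missing cells. csv.reader yields a blank line as an empty list, so indexing into it raises IndexError, which the code below replaces with an empty string. If you would rather drop blank rows entirely, itertools.filterfalse can remove them before zipping.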

from itertools import zip_longest
from io import StringIO
import csv

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []

with mystr1 as f, mystr2 as a:
    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')

    next(reader)
    next(reader1)

    for row in zip_longest(reader, reader1, fillvalue=''):
        try:
            var1 = row[0][0]
        except IndexError:
            var1 = ''
        try:
            var2 = row[1][1]
        except IndexError:
            var2 = ''
        res.extend([var1, var2])

print(res)

['1', "'Feb'", '2', '', '', "'May'", '3', '', '4', '']
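
If blank rows should be skipped rather than kept as empty entries, one option is to filter them out before zipping. The following is only a sketch under that assumption; the helper is_blank is a hypothetical name introduced here, and whether dropping blank rows is right depends on how the two files are meant to line up:

```python
import csv
from io import StringIO
from itertools import filterfalse, zip_longest

def is_blank(row):
    # hypothetical helper: csv.reader yields an empty list for a blank line
    return not row

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []
with mystr1 as f, mystr2 as a:
    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')
    next(reader)   # skip header
    next(reader1)  # skip header

    rows = filterfalse(is_blank, reader)
    rows1 = filterfalse(is_blank, reader1)

    # fillvalue=[] keeps the padding the same type as a real row
    for row, row1 in zip_longest(rows, rows1, fillvalue=[]):
        res.append(row[0] if row else '')
        res.append(row1[1] if len(row1) > 1 else '')

print(res)
```

Note that this changes the pairing: once the blank line is removed, row 3 of the first file lines up with the third row of the second file.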

10 Comments

I copied this exact code, but it gave me the same outcome as the code I had before. I still get ['1', 'feb', '2', '', '', 'may'], so it still stops reading the rows after the first blank row.
@NienkeLuirink, The update might help you. There are lots of tricks you can use: zip_longest to ensure you use the longest of both files, ternary if / else with len to make sure you don't get IndexError, etc.
It would probably also be more readable to unpack the output of zip into two separate variables instead of double indexing into an overloaded row.
@avigil, Thank you, good point. I think this covers everything OP would want, but still not sure.
@jpp thank you so much for all your help, somehow I'm still getting Traceback (most recent call last): File "new.py", line 30, in <module> res.extend([row[0][0] if len(row[0]) else '', row[1][1] if len(row[1]) else '']) IndexError: list index out of range
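
A likely reason the len-based ternary in the last comment still fails: a row such as ['21'] has length 1, so len(row[1]) is truthy, but index 1 does not exist. Comparing the index against the length avoids that. The sketch below also unpacks the zipped pair into two variables, as suggested above; the helper cell and the sample rows are hypothetical and only stand in for what csv.reader would yield:

```python
from itertools import zip_longest

def cell(row, index, default=''):
    """Hypothetical helper: return row[index], or default if the row is too short."""
    return row[index] if index < len(row) else default

# stand-ins for parsed rows: a full row, a blank line, a row with one cell
rows = [['1', "'good", "morning'"], [], ['4']]
rows1 = [['14', "'Feb'"], ['21']]

res = []
# fillvalue=[] means a missing row behaves like a blank one
for row, row1 in zip_longest(rows, rows1, fillvalue=[]):
    res.extend([cell(row, 0), cell(row1, 1)])

print(res)
```

The same helper works for blank rows, short rows and the padding from zip_longest, so no try/except is needed.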