How to ignore empty values in csv file and continue in Python

Question

I have two example csv files, csvexample.csv looks like this:

ID Text  
1  'good morning'  
2  'good afternoon'  
3  'good evening'

While csvexample1.csv looks like this:

Day Month  
14  'Feb'  
21  'Mar'  
31  'May'

With the following code, I get the result I want, which is to add the first column of csvexample.csv and the second column of csvexample1.csv to one list, res.

import csv

res = []
with open('csvexample.csv') as f, open('csvexample1.csv') as a:
    reader=csv.reader(f) 
    reader1=csv.reader(a)
    next(reader)
    next(reader1)
    for row in zip(reader, reader1):
        res.extend([row[0][0], row[1][1]])  

print(res)

I get the following outcome:

['1', 'Feb', '2', 'Mar', '3', 'May']

However, the actual csv files I want to apply this code to contain some empty cells. I am combining the Twitter bios of companies from one file with the Tweets of those companies from another file into one list, and some companies have no bio on Twitter, so those cells in a specific column are empty. Furthermore, in most cases the first file has far fewer rows than the second file, and the output seems to stop once the first file has no rows left, ignoring the remaining rows in the second file. For example, if I edit csvexample.csv like this:

ID Text   
1  'good morning'  
2  'good afternoon'   

3  'good evening'  
4

and csvexample1.csv like this:

Day Month  
14  'Feb'  
21     
31  'May'

I get the following outcome:

['1', 'feb', '2', '', '', 'may']

instead of the desired outcome:

['1', 'feb', '2', '', '', 'may', '4']

I tried many different things but I really can't edit it to the required outcome.

from itertools import zip_longest
from io import StringIO
import csv

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []
with mystr1 as f, mystr2 as a:


    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')

    next(reader)
    next(reader1)

    for row in zip_longest(reader, reader1, fillvalue=''):
        var1 = row[0][0] if len(row[0]) else ''
        var2 = row[1][1] if len(row[1]) else ''
        res.extend([var1, var2])

print(res)

This example gives me the following error:

Traceback (most recent call last):
  File "thesis.py", line 31, in <module>
    var2 = row[1][1] if len(row[1]) else ''
IndexError: list index out of range

Perhaps within your loop you can first check the values for row[0] and row[1], and only if they both exist update your res variable.
– Lix, Commented May 8, 2018 at 12:40

Possible duplicate of "zip-like function that pads to longest length?"
– avigil, Commented May 8, 2018 at 14:57

zip stops at the end of the shortest iterator. You should be using itertools.zip_longest.
– avigil, Commented May 8, 2018 at 14:58
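
The difference described in the last comment can be seen in a small sketch. The two lists and their names (short, long_) are made up for illustration and stand in for the rows of the two files:

```python
from itertools import zip_longest

# Hypothetical stand-ins for the two files: one input is shorter than the other
short = ['1', '2']
long_ = ['Feb', 'Mar', 'May']

# zip stops as soon as the shorter input is exhausted
zipped = list(zip(short, long_))

# zip_longest keeps going and pads the shorter input with fillvalue
padded = list(zip_longest(short, long_, fillvalue=''))

print(zipped)
print(padded)
```

With zip the trailing 'May' is dropped, while zip_longest pairs it with the fill value.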

jpp · Accepted Answer · 2018-05-17 13:51:23Z

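You can use itertools.zip_longest so that iteration continues until the longer file is exhausted, and catch IndexError for rows that are blank or are missing a field. csv.reader yields a blank line as an empty list, so indexing into it raises IndexError. If you would rather drop blank rows entirely, itertools.filterfalse can remove them before zipping.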

from itertools import zip_longest
from io import StringIO
import csv

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []

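with mystr1 as f, mystr2 as a:
    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')

    # skip the header row of each file
    next(reader)
    next(reader1)

    # zip_longest keeps iterating until the longer file is exhausted;
    # the shorter file is padded with the fillvalue ''
    for row in zip_longest(reader, reader1, fillvalue=''):
        try:
            var1 = row[0][0]  # first column of the first file
        except IndexError:
            var1 = ''  # blank row, or the first file has run out
        try:
            var2 = row[1][1]  # second column of the second file
        except IndexError:
            var2 = ''  # missing field, blank row, or the second file has run out
        res.extend([var1, var2])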

print(res)

['1', "'Feb'", '2', '', '', "'May'", '3', '', '4', '']
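
If blank rows should be skipped rather than turned into empty entries, a variant with itertools.filterfalse might look like the sketch below. It is only an illustration under stated assumptions: the helper name non_blank_rows is hypothetical, and it relies on csv.reader yielding an empty list for a blank line.

```python
from io import StringIO
from itertools import filterfalse, zip_longest
import csv

def non_blank_rows(reader):
    # Hypothetical helper: csv.reader yields [] for a blank line,
    # and filterfalse drops every row for which "not row" is true
    return filterfalse(lambda row: not row, reader)

file1 = StringIO("ID Text\n1 'good morning'\n2 'good afternoon'\n\n3 'good evening'\n4\n")
file2 = StringIO("Day Month\n14 'Feb'\n21\n31 'May'\n")

reader = csv.reader(file1, delimiter=' ')
reader1 = csv.reader(file2, delimiter=' ')

# skip the header row of each file
next(reader)
next(reader1)

res = []
pairs = zip_longest(non_blank_rows(reader), non_blank_rows(reader1), fillvalue=[])
for left, right in pairs:
    # compare the row length with the index being read, so short rows give ''
    res.append(left[0] if len(left) > 0 else '')
    res.append(right[1] if len(right) > 1 else '')

print(res)
```

Note that dropping the blank row changes the pairing: '3' is now paired with "'May'" instead of the blank row, so this only fits if blank lines carry no positional meaning in your files.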

edited May 17, 2018 at 13:51

answered May 8, 2018 at 12:45


10 Comments

Nienke Luirink Over a year ago

I copied this exact code but it gave me the same outcome as the code I had before. I still get ['1', 'feb', '2', '', '', 'may'], so it still stops reading the rows after there has been one blank row.

jpp Over a year ago

@NienkeLuirink, The update might help you. There are lots of tricks you can use: zip_longest to ensure you use the longest of both files, ternary if / else with len to make sure you don't get IndexError, etc.

avigil Over a year ago

would probably also be more readable to unpack the output of zip into two separate variables instead of double indexing into an overloaded row

jpp Over a year ago

@avigil, Thank you, good point. I think this covers everything OP would want, but still not sure.

Nienke Luirink Over a year ago

@jpp thank you so much for all your help, somehow I'm still getting

Traceback (most recent call last):
  File "new.py", line 30, in <module>
    res.extend([row[0][0] if len(row[0]) else '', row[1][1] if len(row[1]) else ''])
IndexError: list index out of range
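
The IndexError in this traceback is consistent with a row such as ['21']: it is non-empty, so len(row[1]) is truthy, but it has no element at index 1. A minimal sketch of a check that compares the row length with the index actually being read follows; the helper name field and the sample rows are hypothetical.

```python
def field(row, index, default=''):
    # Return row[index] only when the row is long enough.
    # Works for lists from csv.reader and for the '' fillvalue of zip_longest.
    return row[index] if len(row) > index else default

# Sample rows: a full row, a row with a missing second field,
# a blank row, and the zip_longest fill value
rows = [['14', "'Feb'"], ['21'], [], '']
months = [field(row, 1) for row in rows]

print(months)
```

Inside the loop this would replace the ternary expressions, for example field(row[0], 0) and field(row[1], 1).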
