Remove row from CSV that contains empty cell using Python

Question

I am splitting a CSV file into separate files based on a column that contains dates. However, some rows contain a date while their other cells are empty. I want to remove these rows with empty cells from the CSV, but I'm not sure how to do this.

Here is my code:

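import collections
import csv
import sys

csv.field_size_limit(sys.maxsize)

# main_file and src_path are defined elsewhere (not shown in the question).
with open(main_file, "r", newline="") as fp:
    root = csv.reader(fp, delimiter='\t', quotechar='"')
    result = collections.defaultdict(list)
    next(root)  # skip the header row
    for row in root:
        # Group rows by the year part of the date in the first column.
        year = row[0].split("-")[0]
        result[year].append(row)

for year, rows in result.items():
    row_count = len(rows)
    print(row_count)
    # Output file name: <src_path><year>-<row count>.csv
    file_path = "%s%s-%s.csv" % (src_path, year, row_count)
    with open(file_path, 'w', newline='') as fp:
        writer = csv.writer(fp, delimiter='\t', quotechar='"')
        writer.writerows(rows)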

I know of its existence, I've never used it and maybe it's a good time to start ;) – Melvin Wevers, commented Jan 9, 2016 at 12:42
Why not a simple test before result[year].append(row) that checks there is data in the other fields, e.g. if row[1]: result[year].append(row)? – AChampion, commented Jan 9, 2016 at 12:42
This does not work because result is already a dictionary. Error msg: IndexError: list index out of range – Melvin Wevers, commented Jan 9, 2016 at 14:03
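The `IndexError` most likely comes from the rows rather than from `result` being a dictionary: `row[1]` raises `IndexError: list index out of range` whenever `csv.reader` yields a row with fewer than two fields, such as a blank line (returned as `[]`) or a line holding only a date with no tab after it. Since the actual data isn't shown, that cause is an assumption. A minimal sketch of a guarded version of the suggested test; `has_data` and the sample data are hypothetical:

```python
import csv
import io


def has_data(row, expected_fields):
    """Return True only if the row has all expected fields and none is blank.

    expected_fields is an assumption: the number of columns in the header.
    """
    if len(row) < expected_fields:
        # Short rows (including blank lines, which csv.reader yields as [])
        # would make row[1] raise IndexError, so treat them as incomplete.
        return False
    # Treat cells that are empty or contain only whitespace as missing.
    return all(cell.strip() for cell in row[:expected_fields])


# Hypothetical tab-separated sample: one good row, a date-only row,
# a blank line, and a row whose second cell is just a space.
sample = "date\ttext\n2015-01-02\thello\n2015-03-04\n\n2015-05-06\t \n"
reader = csv.reader(io.StringIO(sample), delimiter="\t", quotechar='"')
header = next(reader)
kept = [row for row in reader if has_data(row, len(header))]
print(kept)  # [['2015-01-02', 'hello']]
```

In the question's loop, the same idea would be `if has_data(row, n): result[year].append(row)`, with `n` taken from the header row instead of discarding it with a bare `next(root)`.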

Phlya · Accepted Answer · 2016-01-09 13:00:44Z

6

Pandas is perfect for this, especially if you want the code to be easy to adapt to, say, other file formats. Of course, one could consider it overkill. To just remove rows with empty cells:

>>> import pandas as pd
>>> data = pd.read_csv('example.csv', sep='\t')
>>> print(data)
   A   B   C
0   1   2  5
1 NaN   1  9
2   3   4  4
>>> data.dropna()
   A   B   C
0   1   2  5
2   3   4  4
>>> data.dropna().to_csv('example_clean.csv', sep='\t', index=False)
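By default `dropna()` drops a row if any cell is missing. If only the rows where everything except the date is empty should go (the case described in the question), `DataFrame.dropna` also accepts `how` and `subset`. A minimal sketch with made-up sample data, assuming the date is in the first column:

```python
import io

import pandas as pd

# Made-up sample: a complete row, a row with only a date,
# and a row with one of two data cells filled.
sample = "date\tA\tB\n2015-01-02\t1\t2\n2015-03-04\t\t\n2015-05-06\t3\t\n"
data = pd.read_csv(io.StringIO(sample), sep="\t")

# how="any" (the default): drop a row if any cell is missing.
any_missing = data.dropna()

# how="all" with subset: drop a row only if every non-date cell is missing.
all_missing = data.dropna(how="all", subset=list(data.columns[1:]))
```

Here `any_missing` keeps only the first row, while `all_missing` also keeps the partially filled third row and drops only the date-only row.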

I leave splitting the data and saving it into separate files with pandas as an exercise, if you want to start learning this great package :)
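For reference, one way that exercise could look is the minimal sketch below. It assumes the question's tab-separated layout, a header row, and dates like "2015-03-27" in the first column; `split_by_year_pandas` is a hypothetical name, and `main_file`/`src_path` are passed in as parameters:

```python
import pandas as pd


def split_by_year_pandas(main_file, src_path):
    """Drop rows with empty cells, then write one tab-separated file per year.

    Returns a dict mapping each year string to the path written.
    """
    # dtype=str keeps the dates as text; empty fields are still read as NaN.
    data = pd.read_csv(main_file, sep="\t", quotechar='"', dtype=str)
    # Drop every row that has at least one missing (empty) cell.
    data = data.dropna()
    # Assumption: column 0 holds dates like "2015-03-27"; take the part before "-".
    years = data.iloc[:, 0].str.split("-").str[0]
    written = {}
    for year, group in data.groupby(years):
        # Same naming scheme as the question: <src_path><year>-<row count>.csv
        file_path = "%s%s-%s.csv" % (src_path, year, len(group))
        # Like the original csv.writer loop: tab separator, no header, no index.
        group.to_csv(file_path, sep="\t", header=False, index=False)
        written[year] = file_path
    return written
```

Note that cells containing only spaces are not read as NaN, so `dropna` keeps them; converting them first with `data.replace(r"^\s*$", float("nan"), regex=True)` is one way to treat them as empty.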


Mike Müller · Accepted Answer · 2016-01-09 12:47:57Z

0

This would skip all rows with at least one empty cell:

with open(main_file, "r") as fp:
    ....
    for row in root:
        # Skip the row if any of its cells is an empty string.
        if not all(map(len, row)):
            continue
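Two edge cases are worth knowing about: `all()` of an empty sequence is `True`, so a blank line (which `csv.reader` yields as `[]`) passes this test and then fails at `row[0]`, and cells containing only spaces pass as well. The minimal sketch below folds a stricter check into the question's loop. It assumes the question's tab-separated layout, a header row, and dates like "2015-03-27" in the first column; `split_by_year` is a hypothetical name, and `main_file`/`src_path` become parameters:

```python
import collections
import csv


def split_by_year(main_file, src_path):
    """Skip incomplete rows, then write one tab-separated file per year.

    Returns a dict mapping each year string to the path written.
    """
    result = collections.defaultdict(list)
    with open(main_file, "r", newline="") as fp:
        root = csv.reader(fp, delimiter="\t", quotechar='"')
        header = next(root)  # skip the header row, as in the question
        for row in root:
            # Skip blank lines, short rows and rows with any empty or
            # whitespace-only cell.
            if len(row) < len(header) or not all(cell.strip() for cell in row):
                continue
            year = row[0].split("-")[0]
            result[year].append(row)

    written = {}
    for year, rows in result.items():
        file_path = "%s%s-%s.csv" % (src_path, year, len(rows))
        with open(file_path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t", quotechar='"')
            writer.writerows(rows)
        written[year] = file_path
    return written
```

Filtering before `result[year].append(row)` means the row counts in the file names already exclude the skipped rows.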


Comments

Can you show some example CSV content with empty cells? – Mike Müller

This file is not \t-separated. In fact, it doesn't seem to be a CSV file at all. – Mike Müller

Developer · Accepted Answer · 2016-01-09 13:50:04Z

0

Pandas is well suited to this kind of data processing in Python. For an introduction, see: http://pandas.pydata.org/pandas-docs/stable/10min.html

