2

I am splitting a CSV file into separate files based on a column of dates. However, some rows contain a date while their other cells are empty. I want to remove the rows that contain empty cells from the CSV, but I'm not sure how to do this.

Here is my code:

import collections
import csv
import sys

csv.field_size_limit(sys.maxsize)

with open(main_file, "r") as fp:
    root = csv.reader(fp, delimiter='\t', quotechar='"')
    result = collections.defaultdict(list)
    next(root)  # skip the header row
    for row in root:
        year = row[0].split("-")[0]
        result[year].append(row)

for i, j in result.items():
    row_count = len(j)
    print(row_count)
    file_path = "%s%s-%s.csv" % (src_path, i, row_count)
    with open(file_path, 'w') as fp:
        writer = csv.writer(fp, delimiter='\t', quotechar='"')
        writer.writerows(j)
3
  • I know of its existence, I've never used it and maybe it's a good time to start ;) Commented Jan 9, 2016 at 12:42
  • Why not a simple test before result[year].append(row) that checks there is data in the other fields, e.g. if row[1]: result[year].append(row)? Commented Jan 9, 2016 at 12:42
  • This does not work because result is already a dictionary. Error msg: IndexError: list index out of range Commented Jan 9, 2016 at 14:03

3 Answers

6

Pandas is perfect for this, especially if you want to adapt the code easily to, say, other file formats. Of course, one could consider it overkill. To just remove rows with empty cells:

>>> import pandas as pd
>>> data = pd.read_csv('example.csv', sep='\t')
>>> print(data)
   A   B   C
0   1   2  5
1 NaN   1  9
2   3   4  4
>>> data.dropna()
   A   B   C
0   1   2  5
2   3   4  4
>>> data.dropna().to_csv('example_clean.csv')

I leave performing the splitting and saving into separate files using pandas as an exercise to start learning this great package if you want :)
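
For readers who want a starting point for that exercise, here is a minimal sketch of one possible approach, not a definitive implementation. It assumes the date column is named `date` and holds values like `2015-01-02`; the sample data and the names `clean`, `years` and `pieces` are made up for illustration. Reading everything as strings keeps the year extraction simple, and empty fields still become NaN, so `dropna()` removes them.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the second data row has empty cells.
sample = io.StringIO(
    "date\tname\tvalue\n"
    "2015-01-02\tfoo\t1\n"
    "2015-03-04\t\t\n"
    "2016-05-06\tbar\t2\n"
)

# Read every column as text; empty fields are still parsed as NaN.
data = pd.read_csv(sample, sep="\t", dtype=str)
clean = data.dropna()

# Take the part before the first "-" as the year and group on it.
years = clean["date"].str.split("-").str[0]
pieces = {year: group for year, group in clean.groupby(years)}

for year, group in pieces.items():
    print(year, len(group))
    # To write each group to its own file, something like this should work:
    # group.to_csv("%s-%s.csv" % (year, len(group)), sep="\t", index=False)
```

The `to_csv` call is left commented out so the sketch does not write files; `sep="\t"` and `index=False` keep the output in the same tab-separated layout as the input.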


0

This would skip all rows with at least one empty cell:

with open(main_file, "r") as fp:
    ....
    for row in root:
        if not all(map(len, row)):
            continue
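
Put together with the grouping loop from the question, the check could look like the sketch below. It is only an illustration under a few assumptions: the helper name `split_rows_by_year` and the sample data are hypothetical, whitespace-only cells are treated as empty, and rows shorter than the header are skipped as well. That last check matters because a row holding only a date has no empty cell to detect, and indexing `row[1]` on it raises the `IndexError` mentioned in the comments on the question.

```python
import collections
import csv
import io


def split_rows_by_year(lines):
    """Group complete rows by the year in their first column.

    `lines` is any iterable of text lines, for example an open file.
    """
    reader = csv.reader(lines, delimiter="\t", quotechar='"')
    header = next(reader)
    result = collections.defaultdict(list)
    for row in reader:
        # Skip rows that are shorter or longer than the header, and rows
        # that contain an empty (or whitespace-only) cell.
        if len(row) != len(header) or not all(cell.strip() for cell in row):
            continue
        year = row[0].split("-")[0]
        result[year].append(row)
    return result


# Hypothetical sample: one row with empty cells, one row with only a date.
sample = io.StringIO(
    "date\tname\tvalue\n"
    "2015-01-02\tfoo\t1\n"
    "2015-03-04\t\t\n"
    "2016-05-06\tbar\t2\n"
    "2016-07-08\n"
)
groups = split_rows_by_year(sample)
print({year: len(rows) for year, rows in groups.items()})
# prints {'2015': 1, '2016': 1}
```

The returned dictionary can be passed to the writing loop from the question unchanged.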

2 Comments

Can you show some example csv content with empty cells.
This file is not \t separated. In fact, it doesn't seem to be a CSV file at all.
0

Pandas is the best tool in Python for any kind of data processing. For help, see this link: http://pandas.pydata.org/pandas-docs/stable/10min.html
