
I have a CSV that does not look broken when viewed. One of the columns contains full emails and consequently additional commas. The format is something like:

ID   | Info   |  Email           | Notes
--------------------------------------------------
1234 | Sample |  Full email here,| More notes here
              |  and email wraps.|
--------------------------------------------------
5678 | Sample2|  Another email,  |  More notes
--------------------------------------------------
9011 | Sample3|  More emails     |  Etc.
--------------------------------------------------

I am using the csv reader, which outputs each physical line as a new row, and that is incorrect. For example, I am getting:

Line 1: 1234 | Sample |  Full email here,| More notes here
Line 2:               |  and email wraps.|
Line 3: 5678 | Sample2|  Another email,  |  More notes
Line 4: 9011 | Sample3|  More emails     |  Etc.

I need it to recognize the cell delimiters the way Excel or LibreOffice does, and get this:

Line 1: 1234 | Sample |  Full email here, and email wraps.| More notes here
Line 2: 5678 | Sample2|  Another email,  |  More notes
Line 3: 9011 | Sample3|  More emails     |  Etc.

I have this code:

 import csv
 import sys
 csv.field_size_limit(sys.maxsize)
 file = "myfile.csv"
 # newline='' is the mode the csv docs recommend in Python 3 ('rU' was removed in 3.11)
 with open(file, newline='') as f:
     freader = csv.reader(f, delimiter='|', quoting=csv.QUOTE_NONE)
     for row in freader:
         print(','.join(row))

I tried delimiter = ',' or delimiter = '\n' but no luck. Any ideas?

  • Could you please add the actual data in the csv file for the three entries that you're using in your example? Commented Jan 14, 2014 at 2:51
  • Sorry, is confidential. Commented Jan 14, 2014 at 4:56

1 Answer


CSV stands for comma-separated values. While it's possible to change the delimiter to tabs, pipes or anything you like, CSV is still a very raw, line-based format.

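The issue lies in your record with ID 1234, whose Email field spans two lines; from the perspective of a CSV file, that record is broken. Python's csv module only keeps a line break inside a field when that field is enclosed in quotes (and quote handling has not been turned off with csv.QUOTE_NONE). An unquoted field that wraps onto the next line is read as the start of a new row.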
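For comparison, here is a small sketch of that distinction. The sample strings are made up, and it assumes the real file uses | as its delimiter, as in the illustration above. The same wrapped record is parsed once with the Email field unquoted and once with it quoted:

```python
import csv
import io

# Made-up sample data: one record whose Email field wraps onto a second line.
unquoted = "1234|Sample|Full email here,\nand email wraps.|More notes here\n"
quoted = '1234|Sample|"Full email here,\nand email wraps."|More notes here\n'

# Without quotes, the line break ends the record, so one logical row becomes two.
rows_unquoted = list(csv.reader(io.StringIO(unquoted), delimiter="|"))

# With quotes and the default quoting (QUOTE_MINIMAL), the line break stays inside the field.
rows_quoted = list(csv.reader(io.StringIO(quoted), delimiter="|"))

print(len(rows_unquoted))   # number of rows from the unquoted version
print(len(rows_quoted))     # number of rows from the quoted version
print(repr(rows_quoted[0][2]))
```

Passing quoting=csv.QUOTE_NONE, as the question's code does, turns quote handling off, so even the quoted version would be split into two rows. If the real file does quote its multi-line fields, dropping that argument may be enough on its own.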

To do what you are asking, it would be better to write your own parser that splits each line on the delimiter and merges lines based on some logic. This should be relatively straightforward as long as the ID column never spans two lines.

As for how to actually write the code, you'll need a process like the one below:

Initialise an empty list X
For each line L of file F:
    Split L on the delimiter into fields
    If the ID field is empty, merge each non-empty field into the same field of the last record in X
    Otherwise, append the fields of L to X as a new record
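A minimal Python sketch of that process, under stated assumptions: fields are separated by |, the first field is the ID, a continuation line keeps the same column positions with an empty ID, and wrapped pieces are joined with a single space. The function name merge_wrapped_rows and the sample data are hypothetical. Because it splits with str.split, it also assumes no field contains the delimiter itself.

```python
import io


def merge_wrapped_rows(lines, delimiter="|"):
    """Group physical lines into logical records.

    Hypothetical rule: a line whose first field (the ID) is blank continues
    the previous record, and its fields line up by position with that record.
    """
    records = []  # the list X
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split(delimiter)]
        if fields[0] == "" and records:
            previous = records[-1]
            for i, value in enumerate(fields):
                if not value:
                    continue  # nothing to merge for this column
                if i >= len(previous):
                    # Pad the previous record if the continuation has extra columns.
                    previous.extend([""] * (i + 1 - len(previous)))
                # Join the wrapped text onto the same column with a space.
                previous[i] = f"{previous[i]} {value}".strip()
        else:
            records.append(fields)
    return records


sample = (
    "1234|Sample|Full email here,|More notes here\n"
    "||and email wraps.|\n"
    "5678|Sample2|Another email,|More notes\n"
    "9011|Sample3|More emails|Etc.\n"
)

for record in merge_wrapped_rows(io.StringIO(sample)):
    print(record)
```

With the sample data this yields three records, the first being ['1234', 'Sample', 'Full email here, and email wraps.', 'More notes here']. The sketch merges into the last record collected so far, so a field that wraps over more than two lines is also handled.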

5 Comments

The ID column never spans two lines. How can I parse it? Is there a module for this?
@rebHelium You parse it by writing code, and there is no module I know of that does what you want.
@rebHelium I've added in some pseudocode that should explain the process you'll need.
Thank you, I'll try to follow it and hopefully make it work.
I wrote a simple line-by-line reader that appends a line to the previous one, using a parameter and startswith(), and it does the job. The original CSV still doesn't parse correctly with the csv module, but your answer was correct.
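The comment above does not include the actual code, so the following is only a guess at what a startswith()-based line joiner could look like. It assumes a different raw layout from the illustration: the wrapped text sits at the start of the next physical line (no empty ID field), and every real record starts with a numeric ID. The function name join_continuation_lines, its record_start parameter and the sample data are hypothetical. The joined lines are then handed to csv.reader:

```python
import csv
import io


def join_continuation_lines(lines, record_start=tuple("0123456789")):
    """Yield one string per logical record.

    Hypothetical rule: a line starting with one of the prefixes in
    record_start (by default any digit, assuming numeric IDs) begins a new
    record; any other non-blank line is glued onto the previous record.
    """
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if current is None or line.startswith(record_start):
            if current is not None:
                yield current
            current = line
        else:
            # Continuation of a wrapped field: join with a single space.
            current = f"{current} {line.strip()}"
    if current is not None:
        yield current


raw = (
    "1234|Sample|Full email here,\n"
    "and email wraps.|More notes here\n"
    "5678|Sample2|Another email,|More notes\n"
)

# csv.reader accepts any iterable of strings, including this generator.
rows = list(csv.reader(join_continuation_lines(io.StringIO(raw)), delimiter="|"))
for row in rows:
    print(row)
```

Joining with a space is a guess at the intended text; if a wrapped piece can itself start with a digit, the record_start test needs to be stricter.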
