Python CSV Reader, CSV Formatting

Question

I have a CSV that visually does not look broken. One of the columns contains full emails and, as a result, additional commas. The format is something like:

ID   | Info   |  Email           | Notes
--------------------------------------------------
1234 | Sample |  Full email here,| More notes here
              |  and email wraps.|
--------------------------------------------------
5678 | Sample2|  Another email,  |  More notes
--------------------------------------------------
9011 | Sample3|  More emails     |  Etc.
--------------------------------------------------

I am using the CSV reader, which incorrectly outputs each new line as a new row. For example, I am getting:

Line 1: 1234 | Sample |  Full email here,| More notes here
Line 2:               |  and email wraps.|
Line 3: 5678 | Sample2|  Another email,  |  More notes
Line 4: 9011 | Sample3|  More emails     |  Etc.

I need it to recognize the cell delimiters just as Excel or LibreOffice do, and get this:

Line 1: 1234 | Sample |  Full email here, and email wraps.| More notes here
Line 2: 5678 | Sample2|  Another email,  |  More notes
Line 3: 9011 | Sample3|  More emails     |  Etc.

I have this code:

 import csv
 import sys
 csv.field_size_limit(sys.maxsize)
 file = "myfile.csv"
 with open(file, 'rU') as f:
     freader = csv.reader(f, delimiter = '|', quoting=csv.QUOTE_NONE)
     for row in freader:
         print(','.join(row))

I tried delimiter = ',' and delimiter = '\n', but no luck. Any ideas?

Could you please add the actual data in the csv file for the three entries that you're using in your example?
– martineau, Commented Jan 14, 2014 at 2:51

score 8 · Accepted Answer · 2014-01-14 00:37:15Z

CSV stands for comma-separated values. While it's possible to change the delimiter to tabs, pipes, or anything else you like, the fact of the matter is that CSV is a very raw, line-based format.

The issue lies in the record that spans two lines, which is broken from the perspective of a CSV file. The Python CSV library is not designed to accommodate such things, because that is not in the style of a CSV file.

To do what you are asking, it would be better to write your own parser that breaks each line on the delimiter and merges lines based on some logic. This should be relatively trivial, provided the ID column never spans two lines.

As for how to actually write the code, you'll need a process like the one below:

Initialise array X
Read each line L of file F:
    Split L on the delimiter into fields
    If the ID field is empty then merge each field into the matching field of the last row in X
    Otherwise append the fields of L to array X
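
A minimal Python sketch of that process might look like the following. It rests on assumptions that the question does not confirm: that the real file is pipe-delimited, that a continuation line carries the same number of delimiters as a full record, and that its ID field is empty. The name merge_wrapped_rows and the sample lines are hypothetical, made up for illustration.

```python
def merge_wrapped_rows(lines, delimiter="|", id_index=0):
    """Fold lines whose ID field is empty into the preceding record.

    Assumption: continuation lines have the same field layout as full
    records, with an empty ID field.
    """
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(delimiter)]
        if fields[id_index] or not records:
            # A non-empty ID (or the very first line) starts a new record.
            records.append(fields)
            continue
        # Empty ID: append each non-empty field to the previous record.
        previous = records[-1]
        for i, value in enumerate(fields):
            if not value:
                continue
            if i < len(previous):
                previous[i] = (previous[i] + " " + value).strip()
            else:
                previous.append(value)
    return records


# Hypothetical sample data modelled on the table in the question.
sample = [
    "1234|Sample|Full email here,|More notes here",
    "||and email wraps.|",
    "5678|Sample2|Another email,|More notes",
    "9011|Sample3|More emails|Etc.",
]

for row in merge_wrapped_rows(sample):
    print(" | ".join(row))
```

With the sample above, the two physical lines for ID 1234 come back as one record. To read from disk, pass the open file object instead of the list, since iterating over a file yields its lines. If the real file wraps differently (for example, continuation lines with fewer delimiters), the merge step would need to be adapted.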

edited Jan 14, 2014 at 0:37

answered Jan 14, 2014 at 0:11

user764357

5 Comments

user1552586 Over a year ago

ID columns never span two lines. How can I parse it? Is there a module for this?

user764357 Over a year ago

@rebHelium You parse it by writing code, and there is no module I know of that does what you want.

user764357 Over a year ago

@rebHelium I've added in some pseudocode that should explain the process you'll need.

user1552586 Over a year ago

Thank you, I'll try to follow and hopefully will make it work.

user1552586 Over a year ago

I created a simple line-by-line reader that appends to the previous row, using a parameter and startswith(), and it does the job. The original CSV still does not parse correctly on its own, but your answer was correct.
