Reading a CSV in Python where input lines are split

Question

I have a kind of CSV file where the input for one logical line may be split over multiple physical lines.

Data example:

":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"

These are four logical lines; a trailing comma at the end of a physical line marks a continuation onto the next one.

I tried to use the csv module in Python:

import csv

with open('2.dat', 'r') as csvfile:
    datreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in datreader:
        print(', '.join(row))
        print("*******************************")

Which gives:

:T1, A1, B1, C1
*******************************
:T2, A2, B2, C2,
*******************************
D2, E2
*******************************
:T3, A3, B3, C3,
*******************************
D3
*******************************
:T4, A4
*******************************

What I'd like:

:T1, A1, B1, C1
*******************************
:T2, A2, B2, C2, D2, E2
*******************************
:T3, A3, B3, C3, D3
*******************************
:T4, A4
*******************************

I'm unsure of the best way to use the csv module to parse this data correctly. The input data set could be millions of rows.

You should have mentioned that each logical line should start with :T<number>.
– RomanPerekhrest, Commented Oct 17, 2017 at 8:02

jlandercy · Accepted Answer · 2017-10-17 08:18:53Z

2

One way is to first correct your file so that it matches the CSV standard, and then parse it.

Based on your trial data:

data = """
":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"
""".strip('\n')

A simple regular expression can merge the split lines by removing every newline that directly follows a comma:

import re
parsed = re.sub(r',\n', ",", data)
print(parsed)

It returns:

":T1","A1","B1","C1"
":T2","A2","B2","C2","D2","E2"
":T3","A3","B3","C3","D3"
":T4","A4"

This complies with the CSV standard and can be parsed easily.
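
To make the parsing step concrete, here is a minimal sketch that feeds the merged text to csv.reader. It assumes the whole file fits in memory as one string, and wrapping that string in io.StringIO is a choice made for this demo rather than something the answer prescribes. Note that the pattern would also join a comma followed by a newline inside a quoted field.

```python
import csv
import io
import re

data = '''":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"'''

# Remove the newline after every trailing comma, joining split lines.
parsed = re.sub(r',\n', ',', data)

# csv.reader accepts any iterable of lines; StringIO turns the string into one.
rows = list(csv.reader(io.StringIO(parsed, newline='')))

for row in rows:
    print(', '.join(row))
```

For a file with millions of rows, reading everything into a single string may be too costly, in which case a line-by-line approach is the safer choice.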

answered Oct 17, 2017 at 8:18

jlandercy

RomanPerekhrest · Accepted Answer · 2017-10-17 08:46:19Z

1

Another approach plays with the end parameter of the print function:

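import csv

with open('2.dat', 'r') as f:
    reader = csv.reader(f)
    for i, r in enumerate(reader):
        # A row starting with ':T' opens a new logical line, so close the
        # previous one with a newline and a separator first.
        if r[0].startswith(':T') and i > 0:
            print('\n', '*' * 30, sep='')
        # The trailing comma leaves an empty last field, so the joined text
        # ends with ', ' and the continuation row follows on the same line.
        print(', '.join(r), end='')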

The output:

:T1, A1, B1, C1
******************************
:T2, A2, B2, C2, D2, E2
******************************
:T3, A3, B3, C3, D3
******************************
:T4, A4

answered Oct 17, 2017 at 8:46

RomanPerekhrest

adder · Accepted Answer · 2017-10-19 05:24:53Z

0

This should do it:

import csv

with open('file.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    rows = []

    for row in reader:
        current_row = []
        if row[0].startswith(':'):
            current_row.append(row)
            rows.append(current_row)
        else:
            rows[-1].append(row)

    for row in rows:
        print(', '.join(y for x in row for y in x if y))

Output:

:T1, A1, B1, C1
:T2, A2, B2, C2, D2, E2
:T3, A3, B3, C3, D3
:T4, A4
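
The question says that a split line always ends with a trailing comma, which csv.reader reports as an empty last field. Below is a hedged sketch of a variant that keys on that marker instead of the leading ':' and yields one logical row at a time, so the full file never has to be held in a list. The helper name join_on_trailing_comma is hypothetical, and the sketch assumes that no complete row legitimately ends with an empty field.

```python
import csv
import io

def join_on_trailing_comma(reader):
    """Yield logical rows; an empty last field marks a continuation."""
    pending = []
    for row in reader:
        if not row:
            continue  # csv.reader returns [] for a blank line
        if row[-1] == '':
            # Trailing comma: keep the real fields and wait for the next row.
            pending.extend(row[:-1])
        else:
            pending.extend(row)
            yield pending
            pending = []
    if pending:
        # The input ended on a continuation marker; emit what was collected.
        yield pending

sample = io.StringIO(
    '":T1","A1","B1","C1"\n'
    '":T2","A2","B2","C2",\n'
    '"D2","E2"\n'
    '":T3","A3","B3","C3",\n'
    '"D3"\n'
    '":T4","A4"\n',
    newline='',
)
logical_rows = list(join_on_trailing_comma(csv.reader(sample)))

for row in logical_rows:
    print(', '.join(row))
```

With a real file, the StringIO object would be replaced by something like open('file.csv', newline=''), and the rows would be consumed in a loop rather than collected with list().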

edited Oct 19, 2017 at 5:24

answered Oct 17, 2017 at 8:30

adder

Jean-François Corbett · Accepted Answer · 2017-10-17 08:10:33Z

0

I don't think there's a magical CSV parser that will do exactly what you want. You'll have to do a very small amount of work yourself.

Make a new empty list of lines. Loop through the lines in datreader. If a line starts with :, append it to the new list. If it doesn't, concatenate it with the last line in the new list.
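
The steps above can be sketched as follows. This is one possible reading of them, not a definitive implementation: instead of building the whole list, it yields each logical line as soon as the next ':' row arrives, which matters if the input really has millions of rows. The name merge_split_rows is hypothetical, and dropping the empty field left by the trailing comma is an assumption about what the asker wants.

```python
import csv
import io

def merge_split_rows(reader):
    """Yield one list of fields per logical line (rows starting with ':')."""
    current = None
    for row in reader:
        if not row:
            continue  # skip blank physical lines
        if current is None or row[0].startswith(':'):
            # A new logical line begins; hand back the finished one first.
            if current is not None:
                yield current
            current = list(row)
        else:
            # Continuation: drop the empty field left by the trailing comma,
            # then concatenate with the line collected so far.
            if current[-1] == '':
                current.pop()
            current.extend(row)
    if current is not None:
        yield current

sample = io.StringIO(
    '":T1","A1","B1","C1"\n'
    '":T2","A2","B2","C2",\n'
    '"D2","E2"\n'
    '":T3","A3","B3","C3",\n'
    '"D3"\n'
    '":T4","A4"\n',
    newline='',
)
merged = list(merge_split_rows(csv.reader(sample)))

for fields in merged:
    print(', '.join(fields))
```

Only one logical line is held in memory at a time, so the same function can be pointed at a csv.reader over a large file opened with newline=''.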

edited Oct 17, 2017 at 8:10

answered Oct 17, 2017 at 8:05

Jean-François Corbett
