0

I have a kind of CSV file where the input for one logical line may be split over multiple physical lines.

Data example:

":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"

These are four logical lines. A trailing comma at the end of a physical line means the logical line continues on the next physical line.

I tried to use the csv module in Python:

import csv
with open('2.dat', 'r') as csvfile:
    datreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in datreader:
        print(', '.join(row))
        print("*******************************")

Which gives:

:T1, A1, B1, C1
*******************************
:T2, A2, B2, C2,
*******************************
D2, E2
*******************************
:T3, A3, B3, C3,
*******************************
D3
*******************************
:T4, A4
*******************************

What I'd like:

:T1, A1, B1, C1
*******************************
:T2, A2, B2, C2, D2, E2
*******************************
:T3, A3, B3, C3, D3
*******************************
:T4, A4
*******************************

I'm unsure of the best way to use the csv module to parse this data correctly. The input data set could contain millions of rows.

Comment: You should have mentioned that each logical line should start with :T<number>. (Commented Oct 17, 2017 at 8:02)

4 Answers

2

One way is to first correct your file so that it matches the CSV standard, and then parse it.

Based on your trial data:

data = """
":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"
""".strip('\n')

A simple regexp can merge split lines:

import re
parsed = re.sub(r',\n', ",", data)
print(parsed)

It prints:

":T1","A1","B1","C1"
":T2","A2","B2","C2","D2","E2"
":T3","A3","B3","C3","D3"
":T4","A4"

This complies with the CSV standard and can be parsed easily.
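Since the input may be millions of rows, the same merge can also be done line by line instead of on one big string. Below is a minimal sketch of that idea, not a definitive implementation: merge_continuations is a hypothetical helper name, the sample is held in an io.StringIO in place of the real file, and it assumes that a physical line ending in a comma always means "continued on the next line" (so a logical line whose last field is genuinely empty would be misjoined).

```python
import csv
import io


def merge_continuations(lines):
    """Yield logical lines, joining each physical line that ends with a
    comma to the physical line(s) that follow it."""
    pending = ""
    for line in lines:
        pending += line.rstrip("\r\n")
        if pending.endswith(","):
            continue  # trailing comma: the logical line is not finished yet
        yield pending
        pending = ""
    if pending:
        # The input ended on a continuation line; emit what was collected.
        yield pending


# Stand-in for open('2.dat', newline=''); any iterable of lines works.
sample = io.StringIO(
    '":T1","A1","B1","C1"\n'
    '":T2","A2","B2","C2",\n'
    '"D2","E2"\n'
    '":T3","A3","B3","C3",\n'
    '"D3"\n'
    '":T4","A4"\n'
)

# csv.reader accepts any iterable that yields strings, so it can consume
# the generator directly without building the merged text in memory.
rows = list(csv.reader(merge_continuations(sample)))
for row in rows:
    print(", ".join(row))
```

Only one logical line is held in memory at a time, so the size of the file should not matter much.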


1

Another approach, playing with the end parameter of the print function:

import csv

with open('2.dat', 'r') as f:
    reader = csv.reader(f)
    for i,r in enumerate(reader):
        if r[0].startswith(':T'):
            if i > 0: print('\n','*'*30, sep='')
            print(', '.join(r), end='')
        else:
            print(', '.join(r), end='')

The output:

:T1, A1, B1, C1
******************************
:T2, A2, B2, C2, D2, E2
******************************
:T3, A3, B3, C3, D3
******************************
:T4, A4


0

This should do it:

import csv

with open('file.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    rows = []

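    for row in reader:
        if row[0].startswith(':'):
            # A new logical line: start a new group of physical rows.
            rows.append([row])
        else:
            # A continuation: attach it to the most recent group.
            rows[-1].append(row)

    for row in rows:
        # Flatten each group, skipping the empty field left by a trailing comma.
        print(', '.join(y for x in row for y in x if y))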

Output:

:T1, A1, B1, C1
:T2, A2, B2, C2, D2, E2
:T3, A3, B3, C3, D3
:T4, A4


0

I don't think there's a magical CSV parser that will do exactly what you want. You'll have to do a very small amount of work yourself.

Make a new empty list of lines. Loop through the rows in datreader. If the first field of a row starts with :, append the row to the new list. If it doesn't, concatenate it with the last row in the new list.
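The steps above can be sketched roughly as follows. This is a variant rather than a literal transcription: it yields each logical row as soon as it is complete instead of collecting everything in a list, which seems a better fit for millions of rows. The name logical_rows is made up for illustration, the sample is read from an io.StringIO rather than the real file, and the sketch assumes that a trailing empty field on a row followed by a continuation comes from the trailing comma and can be dropped.

```python
import csv
import io


def logical_rows(reader):
    """Group csv rows: a row whose first field starts with ':' opens a new
    logical row; any other row is concatenated onto the previous one."""
    current = None
    for row in reader:
        if not row:
            continue  # skip blank physical lines
        if current is None or row[0].startswith(":"):
            if current is not None:
                yield current
            current = list(row)
        else:
            if current[-1] == "":
                current.pop()  # empty field left by the trailing comma
            current.extend(row)
    if current is not None:
        yield current


# Stand-in for the real file; with a file, use open('2.dat', newline='').
sample = io.StringIO(
    '":T1","A1","B1","C1"\n'
    '":T2","A2","B2","C2",\n'
    '"D2","E2"\n'
    '":T3","A3","B3","C3",\n'
    '"D3"\n'
    '":T4","A4"\n'
)

result = list(logical_rows(csv.reader(sample)))
for row in result:
    print(", ".join(row))
```

Because only the trailing empty field is removed, empty fields in the middle of a row are kept as they are.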
