parsing text file with JSON-like object into CSV

Question

I have a text file containing key-value pairs, with the last two key-value pairs containing JSON-like objects that I would like to split out into columns and write with the other values, using the keys as column headings. The first three rows of the data file input.txt look like this:

InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::44.6743867864386,Length3dCenterToCenter::44.6768028159989,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::57.8689351603823,Length3dCenterToCenter::57.8700464193429,Tag::<NULL>,{StartPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.43363070193163}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::68.7161350545728,Length3dCenterToCenter::68.7172034962765,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.45819643838485}

and we eventually came up with something that worked, but there must be a much better way:

import csv
with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        mysplit = [item.split('::') for item in line if item.strip()]
        if not mysplit: # blank line
            continue
        keys, vals = zip(*mysplit)
        start_vals = [item.split('[%2C]') for item in mysplit[-2]]
        end_vals = [item.split('[%2C]') for item in mysplit[-1]]
        a=list(keys[0:-2])
        a.extend(['start1','start2','start3','end1','end2','end3'])
        b=list(vals[0:-2])
        b.append(start_vals[1][0])
        b.append(start_vals[1][1])
        b.append(start_vals[1][2][:-1])
        b.append(end_vals[1][0])
        b.append(end_vals[1][1])
        b.append(end_vals[1][2][:-1])
        if i == 0:
            # if first line: write header
            writer.writerow(a)
        writer.writerow(b)

which produces the output file output.csv that looks like this

InnerDiameterOrWidth,InnerHeight,Length2dCenterToCenter,Length3dCenterToCenter,Tag,start1,start2,start3,end1,end2,end3
0.1,0.1,44.6743867864386,44.6768028159989,<NULL>,7858.35924983374,1703.69341358077,-3.075,7822.85045874375,1730.80294308742,-3.53962362760298
0.1,0.1,57.8689351603823,57.8700464193429,<NULL>,7793.52927597915,1680.91224357457,-3.075,7822.85045874375,1730.80294308742,-3.43363070193163
0.1,0.1,68.7161350545728,68.7172034962765,<NULL>,7858.35924983374,1703.69341358077,-3.075,7793.52927597915,1680.91224357457,-3.45819643838485

We don't want to write code like this in the future.

What is the best way to read data like this?

There is nothing JSON-like about that input format. The only think remotely related are the curly braces and commas, but there the comparison ends. — Martijn Pieters
– Martijn Pieters, Commented Mar 4, 2013 at 21:10
I think someone asked a question just like this a couple days ago, I'll try to find it. Or maybe this is the continuation of that question? Edit: (stackoverflow.com/questions/15190260/…) — djhoese
– djhoese, Commented Mar 4, 2013 at 21:25

Martijn Pieters · Accepted Answer · 2013-03-04 21:41:58Z

1

I'd use:

from itertools import chain
import csv

_header_translate = {
    'StartPoint': ('start1', 'start2', 'start3'),
    'EndPoint': ('end1', 'end2', 'end3')
}

def header(col):
    header = col.strip('{}').split('::', 1)[0]
    return _header_translate.get(header, (header,))

def cleancolumn(col):
    col = col.strip('{}').split('::', 1)[1]
    return col.split('[%2C]')

def chainedmap(func, row):
    return list(chain.from_iterable(map(func, row)))

with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, row in enumerate(reader):
        if not i:  # first row, write header first
            writer.writerow(chainedmap(header, row))
        writer.writerow(chainedmap(cleancolumn, row))

The cleancolumn method takes any of your columns and returns a tuple (possibly with only one value) after removing the braces, removing everything before the first :: and splitting on the embedded 'comma'. By using itertools.chain.from_iterable() we turn the series of tuples generated from the columns into one list again for the csv writer.

When handling the first line we generate one header row from the same columns, replacing the StartPoint and EndPoint headers with the 6 expanded headers.

edited Mar 4, 2013 at 21:41

answered Mar 4, 2013 at 21:30

Martijn Pieters

1.1m326 gold badges4.2k silver badges3.4k bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

djhoese Over a year ago

Nice answer. Any reason you prefer 'if not i' over 'if i == 0'? To me, you're not asking "does it exist?", you're checking if it is the value 0.

Martijn Pieters Over a year ago

@daveydave400: numeric 0 is always false in Python. So are empty sequences and collections (dict, set, list, tuple, string, etc.). It's easier and cleaner in my eyes.

djhoese Over a year ago

I know that it's accurate and that empty sequences evaluate to false, I just personally prefer the "== X" when actually checking for a value and was wondering if you had another reason besides preference. When I check for empty sequences I use the "not X", if checking for None "is None" or "is not None".

Martijn Pieters Over a year ago

@daveydave400: There is a different reason to explicitly test for None; you may want to allow 0 but not None. Here there is no ambiguity; enumerate() starts at 0.

Rich Signell Over a year ago

Martijn, I learned so much from your solution. This is the best thing about SO. Plus it works! Thanks so much. -Rich

Collectives™ on Stack Overflow

parsing text file with JSON-like object into CSV

1 Answer 1

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related