Reading formatted text using Python

Question

I would like to use Python to read and write files of the following format:

#h -F, field1 field2 field3
a,b,c
d,e,f
# some comments
g,h,i

This file closely resembles a typical CSV, except for the following:

The header line starts with #h
The second element of the header line is a tag to denote the delimiter
The remaining elements of the header are field names (always separated by a single space)
Comment lines always start with # and can be scattered throughout the file

Is there any way I can use csv.DictReader() and csv.DictWriter() to read and write these files?

Have you tried subclassing the existing classes and adding the extra behaviour? – Ian Gilham, commented Feb 7, 2012 at 14:55

unutbu · Accepted Answer · 2012-02-07 18:24:28Z

You can parse the first line separately to find the delimiter and fieldnames:

    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
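
As a quick check of that parsing step, here is a minimal sketch run against the header line from the question. The helper name `parse_header` is made up for illustration (it is not part of the csv module), and it assumes the delimiter is the last character of the `-F` tag.

```python
def parse_header(line):
    """Split a '#h -F<delim> name1 name2 ...' line into (delimiter, fields).

    parse_header is a hypothetical helper; it assumes the delimiter is the
    final character of the second whitespace-separated token.
    """
    parts = line.split()
    if len(parts) < 2 or parts[0] != '#h':
        raise ValueError('not a header line: {!r}'.format(line))
    return parts[1][-1], parts[2:]


delimiter, fields = parse_header('#h -F, field1 field2 field3\n')
print(delimiter, fields)
```

Because the line is split on whitespace, this approach would not recover a delimiter that is itself whitespace (a tab, for example); such files would need the tag parsed differently.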

Note that csv.DictReader can take any iterable as its first argument. So to skip the comments, you can wrap f in an iterator (skip_comments) which yields only non-comment lines:

    import csv

    def skip_comments(iterable):
        # Yield only the lines that are not comments.
        for line in iterable:
            if not line.startswith('#'):
                yield line

    # Open in text mode with newline='' as the csv module expects.
    with open('data.csv', newline='') as f:
        firstline = next(f).split()
        delimiter = firstline[1][-1]
        fields = firstline[2:]
        for line in csv.DictReader(skip_comments(f),
                                   delimiter=delimiter, fieldnames=fields):
            print(line)

On the data you posted this yields (on Python 3.8 or later, where each row is a plain dict)

    {'field1': 'a', 'field2': 'b', 'field3': 'c'}
    {'field1': 'd', 'field2': 'e', 'field3': 'f'}
    {'field1': 'g', 'field2': 'h', 'field3': 'i'}

To write a file in this format, you could use a header helper function:

    def header(delimiter, fields):
        # Rebuild the '#h -F<delim> name1 name2 ...' header line.
        return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))

    with open('data.csv', newline='') as f:
        with open('output.csv', 'w', newline='') as g:
            firstline = next(f).split()
            delimiter = firstline[1][-1]
            fields = firstline[2:]
            writer = csv.DictWriter(g, delimiter=delimiter, fieldnames=fields)
            g.write(header(delimiter, fields))
            for row in csv.DictReader(skip_comments(f),
                                      delimiter=delimiter, fieldnames=fields):
                writer.writerow(row)
                g.write('# comment\n')

Note that you can write to output.csv using g.write (for header or comment lines) or writer.writerow (for csv).
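
To try the read/write round trip without touching the disk, here is a small sketch that uses `io.StringIO` in place of real files. The sample text is the data from the question; the variable names `source` and `target` and the choice of `lineterminator='\n'` are my own assumptions, not something the format requires.

```python
import csv
import io


def skip_comments(lines):
    # Yield only the lines that are not comments.
    for line in lines:
        if not line.startswith('#'):
            yield line


def header(delimiter, fields):
    return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))


# In-memory stand-in for the input file.
source = io.StringIO(
    '#h -F, field1 field2 field3\n'
    'a,b,c\n'
    'd,e,f\n'
    '# some comments\n'
    'g,h,i\n'
)
firstline = next(source).split()
delimiter = firstline[1][-1]
fields = firstline[2:]
rows = list(csv.DictReader(skip_comments(source),
                           delimiter=delimiter, fieldnames=fields))

# In-memory stand-in for the output file.
target = io.StringIO()
target.write(header(delimiter, fields))
# csv writers end rows with '\r\n' by default; '\n' is chosen here so the
# rows match the hand-written header and comment lines.
writer = csv.DictWriter(target, delimiter=delimiter, fieldnames=fields,
                        lineterminator='\n')
writer.writerows(rows)
print(target.getvalue())
```

Without the `lineterminator` argument the data rows would end in `\r\n` while lines written with `g.write` end in `\n`, so it may be worth picking one convention for the whole file.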

Nice. Now suppose I want to write to a file using this quasi-CSV format (i.e. using the four peculiarities mentioned in the question). How would I use csv.DictWriter to do that?

Fred Foo · Accepted Answer · 2012-02-07 14:56:06Z

Assume the input file is opened as input. First, read in the header:

header = input.readline()

Parse out the delimiter and field names and use them to construct a DictReader. Now, instead of input, feed the reader the expression

(ln for ln in input if ln[0] != '#')

to skip the comments.
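
Put together, those steps might look like the sketch below. It is only an illustration: the file is replaced by an `io.StringIO` holding sample data, the name `infile` is used instead of `input` to avoid shadowing the built-in, and the header parsing assumes the tag looks like `-F,` with the delimiter as its last character.

```python
import csv
import io

# Stand-in for the opened input file (hypothetical sample data).
infile = io.StringIO(
    '#h -F, field1 field2 field3\n'
    'a,b,c\n'
    '# some comments\n'
    'g,h,i\n'
)

# First, read in the header.
header = infile.readline()

# Parse out the delimiter and the field names.
tokens = header.split()
delimiter = tokens[1][-1]   # assumes a tag of the form '-F,'
fieldnames = tokens[2:]

# Feed the reader a generator that drops comment lines.
reader = csv.DictReader(
    (ln for ln in infile if not ln.startswith('#')),
    fieldnames=fieldnames,
    delimiter=delimiter,
)
records = [dict(row) for row in reader]
print(records)
```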
