
I would like to use Python to read and write files of the following format:

#h -F, field1 field2 field3
a,b,c
d,e,f
# some comments
g,h,i

This file closely resembles a typical CSV, except for the following:

  1. The header line starts with #h
  2. The second element of the header line is a tag to denote the delimiter
  3. The remaining elements of the header are field names (always separated by a single space)
  4. Comment lines always start with # and can be scattered throughout the file

Is there any way I can use csv.DictReader() and csv.DictWriter() to read and write these files?

2 Answers

You can parse the first line separately to find the delimiter and fieldnames:

    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
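
To make the intermediate values visible, here is a small sketch that applies the same three steps to the header from the question. It is written for Python 3, and `parse_header` is only an illustrative helper name, not something the csv module provides. It also assumes the delimiter is not whitespace, since `split()` would swallow a tab or space that follows `-F`.

```python
def parse_header(line):
    """Split a '#h -F<delim> name1 name2 ...' header into (delimiter, fields)."""
    parts = line.split()          # ['#h', '-F,', 'field1', 'field2', 'field3']
    delimiter = parts[1][-1]      # last character of the '-F,' tag
    fields = parts[2:]            # everything after the tag is a field name
    return delimiter, fields

delimiter, fields = parse_header('#h -F, field1 field2 field3\n')
print(delimiter)   # ,
print(fields)      # ['field1', 'field2', 'field3']
```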

Note that csv.DictReader can take any iterable as its first argument. So to skip the comments, you can wrap f in an iterator (skip_comments) which yields only non-comment lines:

import csv

def skip_comments(iterable):
    """Yield only the lines that do not start with '#'."""
    for line in iterable:
        if not line.startswith('#'):
            yield line

# Python 3: open in text mode with newline='', as the csv docs recommend
with open('data.csv', newline='') as f:
    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
    for line in csv.DictReader(skip_comments(f),
                               delimiter=delimiter, fieldnames=fields):
        print(line)

On the data you posted this yields (on Python 3.8 or later, where each row is a plain dict)

{'field1': 'a', 'field2': 'b', 'field3': 'c'}
{'field1': 'd', 'field2': 'e', 'field3': 'f'}
{'field1': 'g', 'field2': 'h', 'field3': 'i'}

To write a file in this format, you could use a header helper function:

def header(delimiter, fields):
    """Build the '#h -F<delim> name1 name2 ...' header line."""
    return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))

# Python 3: text mode with newline='' for both the input and the output file
with open('data.csv', newline='') as f:
    with open('output.csv', 'w', newline='') as g:
        firstline = next(f).split()
        delimiter = firstline[1][-1]
        fields = firstline[2:]
        writer = csv.DictWriter(g, delimiter=delimiter, fieldnames=fields)
        g.write(header(delimiter, fields))
        for row in csv.DictReader(skip_comments(f),
                                   delimiter=delimiter, fieldnames=fields):
            writer.writerow(row)
            g.write('# comment\n')  # demo only: a comment line after each row

Note that you can write to output.csv using g.write (for header or comment lines) or writer.writerow (for csv).
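
If the rows come from memory rather than from another file, the same idea can be sketched as below. This is a Python 3 sketch under a few assumptions: `write_quasi_csv` and its `comments` parameter are hypothetical names chosen for illustration, comments are written right after the header only, and `lineterminator='\n'` is passed so the csv rows end the same way as the hand-written lines.

```python
import csv
import io

def write_quasi_csv(out, delimiter, fields, rows, comments=()):
    """Write a '#h' header, optional '#' comment lines, then the rows."""
    out.write('#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields)))
    for text in comments:
        out.write('# {}\n'.format(text))
    writer = csv.DictWriter(out, fieldnames=fields, delimiter=delimiter,
                            lineterminator='\n')
    writer.writerows(rows)

rows = [
    {'field1': 'a', 'field2': 'b', 'field3': 'c'},
    {'field1': 'd', 'field2': 'e', 'field3': 'f'},
]
buf = io.StringIO()  # stands in for open(path, 'w', newline='')
write_quasi_csv(buf, ',', ['field1', 'field2', 'field3'], rows,
                comments=['some comments'])
print(buf.getvalue())
```

One caveat with this format: a data row whose first value begins with `#` would be treated as a comment when the file is read back.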

1 Comment

Nice. Now suppose I want to write to a file using this quasi-CSV format (i.e. using the four peculiarities mentioned in the question). How would I use csv.DictWriter to do that?

Assume the input file is opened as input. First, read in the header:

header = input.readline()

Parse out the delimiter and field names and use that to construct a DictReader. Now, instead of input, feed the reader the expression

(ln for ln in input if not ln.startswith('#'))

to skip the comments.
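
Put together, the steps above might look like the following Python 3 sketch. The name `infile` is used rather than `input` so the builtin is not shadowed, and an `io.StringIO` holding the sample from the question stands in for a real opened file. Both are assumptions made for illustration.

```python
import csv
import io

# Stand-in for an opened file; the contents are the sample from the question.
infile = io.StringIO(
    '#h -F, field1 field2 field3\n'
    'a,b,c\n'
    'd,e,f\n'
    '# some comments\n'
    'g,h,i\n'
)

# Read the header line and pull out the delimiter and the field names.
header = infile.readline().split()
delimiter = header[1][-1]
fields = header[2:]

# Feed the reader a generator that drops comment lines.
data_lines = (ln for ln in infile if not ln.startswith('#'))
reader = csv.DictReader(data_lines, fieldnames=fields, delimiter=delimiter)
records = [dict(row) for row in reader]
print(records)
```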
