Writing multiple header lines in pandas.DataFrame.to_csv

Question

I am putting my data into NASA's ICARTT format for archival. This is a comma-separated file with multiple header lines, and has commas in the header lines. Something like:

46, 1001
lastname, firstname
location
instrument
field mission
1, 1
2011, 06, 21, 2012, 02, 29
0
Start_UTC, seconds, number_of_seconds_from_0000_UTC
14
1, 1
-999, -999
measurement name, units
measurement name, units
column1 label, column2 label, column3 label, column4 label, etc.

I have to make a separate file for each day that data were collected, so I will end up creating around thirty files in all. When I create a csv file via pandas.DataFrame.to_csv I cannot (as far as I know) simply write the header lines to the file before writing the data, so I have had to trick it into doing what I want via

# assuming <df> is a pandas dataframe
df.to_csv('dst.ict',na_rep='-999',header=True,index=True,index_label=header_lines)

where "header_lines" is the header string.

What this gives me is exactly what I want, except that "header_lines" is wrapped in double quotes. Is there any way to write text to the head of a csv file using to_csv, or to remove the double quotes? I have already tried setting quotechar='' and doublequote=False in to_csv(), but the double quotes still appear.

What I am doing now (and it works for now, but I would like to move to something better) is simply opening a file via open('dst.ict','w') and printing to that line by line, which is quite slow.

ndt · Accepted Answer · 2014-12-24 09:23:56Z

You can, indeed, just write the header lines before the data. pandas.DataFrame.to_csv takes a path_or_buf as its first argument, not just a pathname:

pandas.DataFrame.to_csv(path_or_buf, *args, **kwargs)

path_or_buf : string or file handle, default None

File path or object, if None is provided the result is returned as a string.
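As a minimal sketch of what that means in practice (the small frame and the names text, buf and result are illustrative, not from the question): with no destination the CSV comes back as a string, and with an already-open file-like object pandas writes at the current position, so anything written earlier stays in front of the data.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# No destination given: the CSV text is returned as a string.
text = df.to_csv()

# File-like destination: pandas writes from the buffer's current
# position, so the lines written beforehand are preserved.
buf = io.StringIO()
buf.write("header line 1\nheader line 2\n")
df.to_csv(buf)
result = buf.getvalue()

print(result)
```

An io.StringIO stands in for a real file here only so the sketch runs without touching the disk; a handle returned by open() behaves the same way.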

Here's an example:

#!/usr/bin/env python3

import sys

import numpy as np
import pandas as pd

# Make an example data frame.
df = pd.DataFrame(np.random.randint(100, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])

# Join the header lines with newlines. The last line deliberately has
# no trailing newline: pandas starts its own header row with an empty
# index label, so ",a,b,c,d,e" is appended directly to it.
header = '\n'.join([
    '1001',
    'Daedalus, Stephen',
    'Dublin, Ireland',
    'Keys to the tower',
    'MINOS',
    '1, 1',
    '1904, 06, 16, 1922, 02, 02',
    'time_since_8am',  # Ends up being the header name for the index.
])

# I like to make sure the header lines are at least utf8-encoded,
# so the encoding is set explicitly when opening the file.
with open(sys.argv[1], 'w', encoding='utf8') as ict:
    # Write the header lines, including the index variable for
    # the last one if you're letting Pandas produce that for you
    # (see above).
    ict.write(header)

    # Just write the data frame to the file object instead of
    # to a filename. Pandas will do the right thing and realize
    # it's already been opened.
    df.to_csv(ict)

The result is just what you wanted: the header lines are written first, and then .to_csv() writes the data right after them:

$ python example.py test && cat test
1001
Daedalus, Stephen
Dublin, Ireland
Keys to the tower
MINOS
1, 1
1904, 06, 16, 1922, 02, 02
time_since_8am,a,b,c,d,e
0,67,85,66,18,32
1,47,4,41,82,84
2,24,50,39,53,13
3,49,24,17,12,61
4,91,5,69,2,18
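Since the question involves one file per day, the same idea can be wrapped in a small function. This is only a sketch under assumptions: write_with_header, the two header lines, the "ozone" column and the file name are all hypothetical, and it names the index column through to_csv's index_label argument instead of leaving the last header line unterminated.

```python
import os
import tempfile

import pandas as pd


def write_with_header(df, path, header_lines, index_label):
    """Write free-form header lines, then the frame as CSV (hypothetical helper)."""
    # newline="" disables newline translation on the handle, which is what
    # the pandas docs recommend when passing an open text file to to_csv.
    with open(path, "w", newline="") as fh:
        for line in header_lines:
            # pandas ends rows with os.linesep by default, so the header
            # lines use the same terminator to keep the file consistent.
            fh.write(line + os.linesep)
        # na_rep="-999" mirrors the missing-value flag used in the question.
        df.to_csv(fh, na_rep="-999", index_label=index_label)


# Illustrative data: one measurement column with a missing value.
df = pd.DataFrame({"ozone": [41.2, None]}, index=[0, 60])
path = os.path.join(tempfile.mkdtemp(), "example.ict")
write_with_header(df, path, ["lastname, firstname", "location"], "Start_UTC")

with open(path) as fh:
    lines = fh.read().splitlines()
print(lines)
```

Calling the helper once per day's frame would produce the thirty or so files with a single open/close each.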

Sorry if this is too late to be useful. I work in archiving these files (and use Python), so feel free to drop me a line if you have future questions.

Not too late at all, since I can update my code now! Why I did not realize I could pass the buffer value (I must have read that 100 times) I don't know. Thank you for pointing that out to me!

Felix · Answer · 2021-02-23 20:15:00Z


Even though it's been some years and ndt's answer is quite nice, another possibility would be to write the header first and then use to_csv() with mode='a' (append):

# write the header
header = '46, 1001\nlastname, firstname\n,...'
with open('test.csv', 'w') as fp:
    fp.write(header)

# write the rest
df.to_csv('test.csv', header=True, mode='a')

It's maybe less efficient, since the file is opened and written twice, though...
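For a version that can be run end to end, here is a minimal sketch of the append approach. The one-column frame, the temporary path and the shortened two-line header are illustrative assumptions, not the full ICARTT header. Note that the header text has to end with a newline, otherwise the first CSV row is appended to the last header line.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# First pass: create the file and write the header text.
# The trailing newline keeps the CSV rows on their own lines.
header = "46, 1001\nlastname, firstname\n"
with open(path, "w") as fp:
    fp.write(header)

# Second pass: pandas reopens the same path in append mode
# and writes its column header row plus the data.
df.to_csv(path, header=True, mode="a")

with open(path) as fp:
    lines = fp.read().splitlines()
print(lines)
```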


1 Comment

tnknepp Over a year ago

I expect the performance difference to be minimal. The biggest difference is you open/close the file twice, while ndt's solution only did this once. We're not dealing with huge files here, so this shouldn't matter. However, it doesn't make sense to close the file just so I can append to it. Your solution works, of course, it just involves unnecessary steps. Good suggestion though!
