Adding column headers to csv file using python, reading json

Question

So I have a program that reads JSON, flattens it, and dumps CSV:

import json
import unicodecsv as csv
import sys
import glob
import os
from flatten_json import flatten_json

def createcolumnheadings(cols):
    #create column headings
    columns = cols.keys()
    columns = list( set( columns ) )
    return columns

doOnce=True

path=os.chdir(sys.argv[1])

for f in glob.glob("smallR.txt"):
    fName=os.path.splitext(f)[0]
    out_file= open( 'csv/' + fName+'.csv', 'wb' )
    csv_w = csv.writer( out_file, delimiter="\t", encoding='utf-8'  )

    with open(f, 'r') as handle:
        for line in handle:   
            data = json.loads(line)        
            flatdata =flatten_json(data)             
            if doOnce:
                columns=createcolumnheadings(flatdata) 
                columns.insert(0,'racism')
                csv_w.writerow( columns)                
                doOnce=False
            flatdata['racism']= 0
            csv_w.writerow(flatdata.get(x, u'') for x in columns)

This works OK, with one problem: my program takes the column headings only from the first line of smallR.txt (plus the 'racism' column it adds).

Some of the later lines in smallR.txt have different columns, so the output (small.csv) is not quite right.

Is there an easy way to adapt my program to handle new column headings found on later lines?

Javier · Accepted Answer · 2016-10-28 14:06:25Z

In that case you need to scan the whole file first, in order to get all the possible columns:

with open(f, 'r') as handle:
    data = [json.loads(line) for line in handle]

# Union of the keys across all entries (set iteration order is arbitrary)
columns = ['racism'] + list({k for entry in data for k in entry.keys()})

csv_w.writerow(columns)
for entry in data:
    csv_w.writerow(entry.get(c, '') for c in columns)

This loads all the data into memory. If that is not acceptable, you can read the file twice: once to collect the columns, and again to read and write the rows:

# First pass: collect every key that appears on any line
with open(f, 'r') as handle:
    columns = ['racism'] + list({k for line in handle for k in json.loads(line).keys()})
csv_w.writerow(columns)

# Second pass: write each row, leaving missing columns empty
with open(f, 'r') as handle:
    for line in handle:
        entry = json.loads(line)
        csv_w.writerow(entry.get(c, '') for c in columns)
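For reference, here is one way the two-pass idea could be wrapped into a standalone function. This is only a sketch under a few assumptions: Python 3 with the standard csv module (rather than unicodecsv), one JSON object per line, and the names collect_columns, jsonl_to_csv, flatten and extra are all made up for illustration. It keeps columns in first-seen order instead of using a set, so the header is the same on every run.

```python
import csv
import json


def collect_columns(path, flatten=lambda d: d):
    """First pass: gather every key in the file, in first-seen order."""
    columns = []
    seen = set()
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            for key in flatten(json.loads(line)):
                if key not in seen:
                    seen.add(key)
                    columns.append(key)
    return columns


def jsonl_to_csv(src, dst, flatten=lambda d: d, extra=None):
    """Second pass: write a tab-separated file with a complete header.

    `extra` (hypothetical) holds constant columns such as {'racism': 0};
    they are placed first and set on every row.
    """
    extra = extra or {}
    found = collect_columns(src, flatten)
    columns = list(extra) + [c for c in found if c not in extra]
    with open(src, "r", encoding="utf-8") as handle, \
            open(dst, "w", encoding="utf-8", newline="") as out:
        # restval fills in columns that a given row does not have
        writer = csv.DictWriter(out, fieldnames=columns,
                                delimiter="\t", restval="")
        writer.writeheader()
        for line in handle:
            if not line.strip():
                continue
            row = flatten(json.loads(line))
            row.update(extra)
            writer.writerow(row)
    return columns
```

It could be called as jsonl_to_csv('smallR.txt', 'csv/smallR.csv', flatten=flatten_json, extra={'racism': 0}), assuming flatten_json returns a flat dict. Only one line is held in memory at a time, at the cost of parsing the file twice.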

The flatten_json function definition is missing so I can only guess what it does.
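If it does what the name suggests, it probably turns nested objects into a single-level dict, in which case it has to be applied to each entry before collecting keys. A rough guess at such a function, not the asker's actual code (flatten_dict and the "_" separator are assumptions):

```python
def flatten_dict(obj, parent="", sep="_"):
    """Flatten nested dicts and lists into one dict with joined keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list positions become part of the key
    else:
        return {parent: obj}  # leaf value
    flat = {}
    for key, value in items:
        name = f"{parent}{sep}{key}" if parent else str(key)
        flat.update(flatten_dict(value, name, sep))
    return flat
```

With something like this, the key-collecting step would iterate over flatten_dict(json.loads(line)) rather than over the raw entry. Note that empty nested dicts or lists produce no column at all in this sketch.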

edited Oct 28, 2016 at 14:06

answered Oct 28, 2016 at 14:00


1 Comment

schoon Over a year ago

Thanks Javier, the files are huge so I will give your second method a go. Flatten_json is an import from here
