I have a program that reads JSON, flattens it, and writes CSV:

    import json
    import unicodecsv as csv
    import sys
    import glob
    import os
    from flatten_json import flatten_json

    def createcolumnheadings(cols):
        # create column headings
        columns = cols.keys()
        columns = list(set(columns))
        return columns

    doOnce = True
    path = os.chdir(sys.argv[1])
    for f in glob.glob("smallR.txt"):
        fName = os.path.splitext(f)[0]
        out_file = open('csv/' + fName + '.csv', 'wb')
        csv_w = csv.writer(out_file, delimiter="\t", encoding='utf-8')
        with open(f, 'r') as handle:
            for line in handle:
                data = json.loads(line)
                flatdata = flatten_json(data)
                if doOnce:
                    columns = createcolumnheadings(flatdata)
                    columns.insert(0, 'racism')
                    csv_w.writerow(columns)
                    doOnce = False
                flatdata['racism'] = 0
                csv_w.writerow(flatdata.get(x, u'') for x in columns)

This works, with one problem: the program takes the column headings only from the first line of smallR.txt (plus the 'racism' column it adds).
Some later lines in the data (smallR.txt) have keys that the first line does not. Values under those keys are never written, because the header row is fixed after the first line, so the output is not quite right (see small.csv).
Is there an easy way to adapt my program to handle new column headings found on later lines?
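
One direction that might work is a two-pass approach: first collect the union of keys across all lines, then write every row against that full header. The sketch below is only an illustration under assumptions, not a drop-in fix. It uses Python 3's standard `csv.DictWriter` instead of `unicodecsv`, and `flatten` is a hypothetical stand-in for `flatten_json` so that the example is self-contained. The name `json_lines_to_csv` and the sample input are also made up for the example.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Hypothetical stand-in for flatten_json: nested dicts/lists -> flat dict."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        name = "{}_{}".format(prefix, key) if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


def json_lines_to_csv(lines, out, extra_column="racism"):
    """Write one CSV row per JSON line, using the union of all keys as header."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]

    # Pass 1: gather column names in first-seen order, without duplicates.
    columns = [extra_column]
    seen = {extra_column}
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                columns.append(key)

    # Pass 2: DictWriter fills any column a row lacks with restval.
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        row[extra_column] = 0
        writer.writerow(row)
    return columns


# Made-up sample: the second line has a key ("d") the first one lacks.
sample = ['{"a": 1, "b": {"c": 2}}', '{"a": 3, "d": 4}']
buf = io.StringIO()
json_lines_to_csv(sample, buf)
print(buf.getvalue())
```

This version keeps all flattened rows in memory. For a large file, the same idea should work by reading the file twice (once to collect the keys, once to write the rows) instead of building the `rows` list.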