
I am calling an API and it returns a JSON string. I want to convert it to CSV format so I can save it to a database later. However, the JSON object keys cause problems because some keys are missing and some keys change between objects. I wrote this Python script, but because of the keys I cannot get it to work:

import json
import csv

with open('custom.json') as json_file:
    data = json.load(json_file)

custom_data = data['CustomJSON']
data_file = open('data_file.csv', 'w')
csv_writer = csv.writer(data_file)
count = 0

for i in custom_data:
    if count == 0:
        # Writing headers of CSV file
        header = i.keys()
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    csv_writer.writerow(i.values())
data_file.close()

How can I convert this type of JSON to CSV? Example JSON message:

{
    "CustomJSON" : [
    {
      "id" : "1",
      "name" : "Jack",
      "surname" : "Bauer"
    },
    {
      "id" : "2",
      "name" : "John",
      "surname" : "Smith",
      "age" : "31",
      "city" : "New York"
    },
    {
      "id" : "3",
      "name" : "Matt",
      "surname" : "Secret",
      "exception_1" : "Exception_1",
      "exception_2" : "Exception_2",
      "date" : "2022-02-08"
    }
  ]
}

Should I first loop over all key-value pairs somehow and then add the data? Can anyone provide an example?

    I tried loading your json, and it's not valid. If you provide actual, pasted code, we can help you much more quickly and easily. Commented Feb 8, 2022 at 8:52

2 Answers


As you are reading a single JSON string, you will have everything in memory. So IMHO the simplest way is to first build the list of field names, and then write everything to a CSV file.

# compute the fieldnamelist
# this uses a dict because it is easy to update it while maintaining key order
keys = dict()
for d in data['CustomJSON']:
    keys.update(d)

# write to the csv file
# this uses a DictWriter because the individual rows are already dicts
with open('data_file.csv', 'w', newline='') as data_file:
    csv_writer = csv.DictWriter(data_file, fieldnames=keys.keys())
    _ = csv_writer.writeheader()
    _ = csv_writer.writerows(data['CustomJSON'])

With your data it gives the expected output:

id,name,surname,age,city,exception_1,exception_2,date
1,Jack,Bauer,,,,,
2,John,Smith,31,New York,,,
3,Matt,Secret,,,Exception_1,Exception_2,2022-02-08
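Putting the two passes together, here is a self-contained sketch of the same approach (using an inline dict and an in-memory buffer instead of `custom.json` and `data_file.csv`, so it runs standalone):

```python
import csv
import io

# Sample records with inconsistent keys, mirroring the question's JSON
data = {
    "CustomJSON": [
        {"id": "1", "name": "Jack", "surname": "Bauer"},
        {"id": "2", "name": "John", "surname": "Smith", "age": "31", "city": "New York"},
    ]
}

# First pass: collect every key that appears, preserving insertion order
keys = dict()
for d in data["CustomJSON"]:
    keys.update(dict.fromkeys(d))

# Second pass: DictWriter fills fields absent from a row with restval (default '')
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=keys.keys())
writer.writeheader()
writer.writerows(data["CustomJSON"])
print(buf.getvalue())
```

Because `DictWriter` defaults `restval` to the empty string, rows that lack a key simply get an empty CSV field, which is exactly the behavior the question needs.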

3 Comments

If the OP wants a full header row in the output file, then it's not merely your opinion that the data will need to be preprocessed first to determine the keys — it's a certitude. The amount of memory needed to store the keys dictionary could be minimized by using keys.update(dict.fromkeys(d.keys())), which would strip off the values.
@martineau: If rows were expected to come from a large number of big files, I would have suggested using a database directly, to be able to add new columns after reading a new file. The sqlite module would make it easy...
Putting the data in a database first would also be a form of preprocessing, would it not? That's the point I was trying to make.
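For reference, the sqlite idea from the comments above could be sketched like this — the table name and schema here are made up for illustration, and new columns are added on the fly whenever a record introduces a key we haven't seen:

```python
import sqlite3

# Hypothetical rows with drifting keys, as in the question
rows = [
    {"id": "1", "name": "Jack"},
    {"id": "2", "name": "John", "city": "New York"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id TEXT)")
known = {"id"}

for row in rows:
    # Grow the schema whenever a record introduces a new key
    for key in row.keys() - known:
        conn.execute(f'ALTER TABLE people ADD COLUMN "{key}" TEXT')
        known.add(key)
    cols = ", ".join(f'"{k}"' for k in row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO people ({cols}) VALUES ({placeholders})",
                 list(row.values()))

# Rows inserted before a column existed read back as NULL for that column
for r in conn.execute("SELECT id, name, city FROM people ORDER BY id"):
    print(r)
```

Earlier rows simply have NULL in the later-added columns, which mirrors how the CSV approach leaves those fields empty.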

I am a pandas fan(atic) so I'd do something like

import pandas as pd

# df is a pandas DataFrame
df = pd.read_json('http://data.com/foo')
df.to_csv('foo.csv')

Pandas has options for the CSV dialect, if need be. You should be able to do what you describe with those two function calls, though.
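One caveat worth noting: with the question's nested structure (`{"CustomJSON": [...]}`), `read_json` would leave the records as a single column of dicts, so the list needs to be flattened first. `pd.json_normalize` does that and aligns the inconsistent keys into columns, filling absent values with NaN (which `to_csv` writes as empty fields). A sketch with inline data standing in for the API response:

```python
import pandas as pd

# Records with inconsistent keys, as in the question's example
records = [
    {"id": "1", "name": "Jack", "surname": "Bauer"},
    {"id": "2", "name": "John", "surname": "Smith", "age": "31"},
]

# json_normalize aligns all keys into columns; missing values become NaN
df = pd.json_normalize(records)
csv_text = df.to_csv(index=False)
print(csv_text)
```

With a real response you would pass `json.loads(response_text)["CustomJSON"]` to `json_normalize` instead of the inline list.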

1 Comment

Suggesting that someone download and install a large module, and then learn how to use it in order to solve this relatively trivial task is not a good suggestion IMO.
