Build CSV with headers plus data out of JSON object

Question

I have the following problem with Python.

Given the following JSON object, I would like to:

- read the JSON as a dict
- take several of the keys and use them as the header row of a CSV file, like so:

CSV headers

firstName,lastName,managersEmail,contractStartsDate

- put the corresponding values of those keys into the CSV as rows, like so:

CSV contents

firstName,lastName,managersEmail,contractStartsDate
nameOfPerson,lastNameofPerson,someEmail,2000-01-01
nameOfPerson2,lastNameofPerson2,someEmail2,2000-02-02

- not duplicate any keys inside the CSV, but put each value of each key from the JSON into the CSV under the corresponding header

My targetJSON.json:

{"details":[
{"firstName":"nameOfPerson","lastName":"lastNameofPerson","managersEmail":"someEmail","managersName":"someManager",
    "departmentName":"someDepartment",
    "position":"somePosition",
    "contractStartsDate":"2000-01-01",
    "contractEndDate":"2000-01-01",
    "company":"someCompany",
    "division":"someDivision",
    "preferredName":"Unknown"},
{"firstName":"nameOfPerson2","lastName":"lastNameofPerson2","managersEmail":"someEmail2","managersName":"someManager2",
    "departmentName":"someDepartment2",
    "position":"somePosition2",
    "contractStartsDate":"2000-02-02",
    "contractEndDate":"2000-02-02",
    "company":"someCompany",
    "division":"someDivision2",
    "preferredName":"Unknown"}
]}

My code looks like this:

import csv
import json

with open('targetJSON.json', 'r') as f:
    distros_dict = json.load(f)

for distro in distros_dict:
    print(distro['managersEmail'])


data_file = open("targetJSON.json", "r")
values = json.load(data_file)
data_file.close()

with open("usersData.csv", "wb") as f:
    wr = csv.writer(f)
    for data in values:
        value = data["managersEmail"]
        value = data["firstName"]
        for key, value in data.iteritems():
            #wr.writerow([key, value])
            wr.writerow([key.encode("utf-8"), value.encode("utf-8")])

But the result is complete gibberish; the CSV contains everything mixed up :-(

When you say you want to avoid duplicate keys, do you mean don't write a row if any of its keys have been seen before, or just don't write duplicate rows?
– Zhenhir, Commented May 17, 2019 at 13:19
"Just don't write duplicate rows" would be correct, I think. Basically, the JSON contains nested objects with all those keys and values. Each nested object has 1x firstName, 1x lastName (and so on) with their values. This JSON can get huge, with several thousand nested objects, each representing some real-world object.
– Oleg, Commented May 17, 2019 at 13:30

Martin Evans · Accepted Answer · 2019-05-20 12:57:44Z


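With Python 3.x you need to open the output file with newline="" when using a csv.writer(); the "wb" mode is only for Python 2.x. In Python 3, csv.writer() writes str objects, so a file opened in binary mode raises a TypeError. Likewise, dict.iteritems() no longer exists in Python 3; use items() instead.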
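Both points can be seen with in-memory streams from the standard io module (an illustration only; the code below writes to a real file). csv.writer() ends every row with its own "\r\n" terminator, so the text stream should be opened with newline="" to stop that terminator from being translated again, while a binary stream, like a file opened with "wb", rejects the str rows outright.

```python
import csv
import io

# Python 3: csv.writer() needs a text stream. newline="" disables newline
# translation, so the writer's own "\r\n" row terminator is kept unchanged.
text_buffer = io.StringIO(newline="")
csv.writer(text_buffer).writerow(["firstName", "lastName"])
print(repr(text_buffer.getvalue()))  # 'firstName,lastName\r\n'

# A binary stream (like a file opened with "wb") only accepts bytes,
# so writing the str rows produced by csv.writer() fails.
raised_type_error = False
try:
    csv.writer(io.BytesIO()).writerow(["firstName", "lastName"])
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)
```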

Using the sample JSON you've given, you just need to loop over each entry in details and build a row by looking up each header field in it. For example:

import json
import csv

data = """{"details":[{"firstName":"nameOfPerson","lastName":"lastNameofPerson","managersEmail":"someEmail",
    "managersName":"someManager", "departmentName":"someDepartment", "position":"somePosition", "contractStartsDate":"2000-01-01",
    "contractEndDate":"2000-01-01", "company":"someCompany", "division":"someDivision", "preferredName":"Unknown"},
{"firstName":"nameOfPerson2","lastName":"lastNameofPerson2","managersEmail":"someEmail2","managersName":"someManager2",
    "departmentName":"someDepartment2", "position":"somePosition2", "contractStartsDate":"2000-02-02",
    "contractEndDate":"2000-02-02", "company":"someCompany", "division":"someDivision2", "preferredName":"Unknown"}
]}"""

json_data = json.loads(data)
header = ["firstName", "lastName", "managersEmail", "contractStartsDate"]

with open("usersData.csv", "w", newline="", encoding="utf-8") as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)

    for entry in json_data["details"]:
        csv_output.writerow([entry[key] for key in header])

Giving you:

firstName,lastName,managersEmail,contractStartsDate
nameOfPerson,lastNameofPerson,someEmail,2000-01-01
nameOfPerson2,lastNameofPerson2,someEmail2,2000-02-02

If your JSON data contains duplicate entries, then you would have to first load all of the data and remove duplicates before starting to write the rows.
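A minimal sketch of that, assuming "duplicate" means two entries that would produce the same CSV row (the same values for the chosen header fields, as clarified in the comments on the question). The helper name unique_rows is hypothetical, and the small details list is made-up test data modelled on the question. dict.fromkeys() is used because dicts keep insertion order in Python 3.7+, so the first occurrence of each row is kept and the original order is preserved.

```python
import csv
import io


def unique_rows(entries, header):
    """Hypothetical helper: project each entry onto `header` and drop repeats.

    Rows are compared only on the header fields, so entries that differ solely
    in other keys (e.g. "position") count as duplicates.
    """
    rows = (tuple(entry[key] for key in header) for entry in entries)
    # dict.fromkeys keeps the first occurrence of each row, in order (3.7+).
    return list(dict.fromkeys(rows))


header = ["firstName", "lastName", "managersEmail", "contractStartsDate"]
details = [  # made-up sample data
    {"firstName": "nameOfPerson", "lastName": "lastNameofPerson",
     "managersEmail": "someEmail", "contractStartsDate": "2000-01-01", "position": "a"},
    {"firstName": "nameOfPerson", "lastName": "lastNameofPerson",
     "managersEmail": "someEmail", "contractStartsDate": "2000-01-01", "position": "b"},
    {"firstName": "nameOfPerson2", "lastName": "lastNameofPerson2",
     "managersEmail": "someEmail2", "contractStartsDate": "2000-02-02", "position": "c"},
]

out = io.StringIO(newline="")  # stands in for the real output file
csv_output = csv.writer(out)
csv_output.writerow(header)
csv_output.writerows(unique_rows(details, header))
print(out.getvalue())
```

The first two entries differ only in "position", which is not written to the CSV, so only one of them appears in the output.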

Alternatively, you could use a csv.DictWriter as follows:

import json
import csv

data = """{"details":[{"firstName":"nameOfPerson","lastName":"lastNameofPerson","managersEmail":"someEmail",
    "managersName":"someManager", "departmentName":"someDepartment", "position":"somePosition", "contractStartsDate":"2000-01-01",
    "contractEndDate":"2000-01-01", "company":"someCompany", "division":"someDivision", "preferredName":"Unknown"},
{"firstName":"nameOfPerson2","lastName":"lastNameofPerson2","managersEmail":"someEmail2","managersName":"someManager2",
    "departmentName":"someDepartment2", "position":"somePosition2", "contractStartsDate":"2000-02-02",
    "contractEndDate":"2000-02-02", "company":"someCompany", "division":"someDivision2", "preferredName":"Unknown"}
]}"""

json_data = json.loads(data)
fieldnames = ["firstName", "lastName", "managersEmail", "contractStartsDate"]

with open("usersData.csv", "w", newline="", encoding="utf-8") as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames, extrasaction="ignore")
    csv_output.writeheader()
    csv_output.writerows(json_data["details"])
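To see what extrasaction="ignore" and the restval parameter (default "") do, here is a small in-memory check with made-up entries: keys not listed in fieldnames are silently dropped, and a field missing from an entry is written as restval. With the default extrasaction="raise", DictWriter raises ValueError for any entry that has extra keys, which every entry in this JSON does.

```python
import csv
import io

fieldnames = ["firstName", "lastName", "managersEmail", "contractStartsDate"]
entries = [  # made-up sample data
    # Extra key "position" is not in fieldnames.
    {"firstName": "nameOfPerson", "lastName": "lastNameofPerson",
     "managersEmail": "someEmail", "contractStartsDate": "2000-01-01",
     "position": "somePosition"},
    # "managersEmail" is missing from this entry.
    {"firstName": "nameOfPerson2", "lastName": "lastNameofPerson2",
     "contractStartsDate": "2000-02-02"},
]

out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=fieldnames,
                        extrasaction="ignore",  # drop keys not in fieldnames
                        restval="")             # value written for missing fields
writer.writeheader()
writer.writerows(entries)
print(out.getvalue())

# With the default extrasaction="raise", the extra "position" key is an error.
strict_raised = False
strict = csv.DictWriter(io.StringIO(newline=""), fieldnames=fieldnames)
try:
    strict.writerow(entries[0])
except ValueError as exc:
    strict_raised = True
    print("ValueError:", exc)
```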

To read the data from an input JSON file, you can do the following:

import json
import csv

with open("targetJSON.json", encoding="utf-8") as f_input:
    json_data = json.load(f_input)

fieldnames = ["firstName", "lastName", "managersEmail", "contractStartsDate"]

with open("usersData.csv", "w", newline="", encoding="utf-8") as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames, extrasaction="ignore")
    csv_output.writeheader()
    csv_output.writerows(json_data["details"])
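Wrapped in a function, the same steps can be checked end to end. json_to_csv is a hypothetical name, and the check writes a trimmed copy of the question's two sample records into a temporary directory instead of using a real targetJSON.json. Note that json.load() parses the whole file into memory at once.

```python
import csv
import json
import os
import tempfile

FIELDNAMES = ["firstName", "lastName", "managersEmail", "contractStartsDate"]


def json_to_csv(json_path, csv_path, fieldnames=FIELDNAMES):
    """Hypothetical helper: copy the chosen fields of every "details" entry to CSV."""
    with open(json_path, encoding="utf-8") as f_input:
        json_data = json.load(f_input)  # parses the whole file into memory

    with open(csv_path, "w", newline="", encoding="utf-8") as f_output:
        csv_output = csv.DictWriter(f_output, fieldnames=fieldnames, extrasaction="ignore")
        csv_output.writeheader()
        csv_output.writerows(json_data["details"])
    return len(json_data["details"])


# Round trip in a temporary directory with trimmed sample records.
sample = {"details": [
    {"firstName": "nameOfPerson", "lastName": "lastNameofPerson",
     "managersEmail": "someEmail", "managersName": "someManager",
     "contractStartsDate": "2000-01-01"},
    {"firstName": "nameOfPerson2", "lastName": "lastNameofPerson2",
     "managersEmail": "someEmail2", "managersName": "someManager2",
     "contractStartsDate": "2000-02-02"},
]}

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "targetJSON.json")
    csv_path = os.path.join(tmp, "usersData.csv")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    written = json_to_csv(json_path, csv_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(written, rows)
```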

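If you need to remove identical rows, then replace the last line with the following. It keeps the first occurrence of each entry and drops later entries whose keys and values (in the same key order) are all identical:

csv_output.writerows(dict(t) for t in dict.fromkeys(tuple(entry.items()) for entry in json_data["details"]))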

answered May 17, 2019 at 12:40 by Martin Evans; edited May 20, 2019 at 12:57


Comments

Oleg Over a year ago

Whoa! Thank you for this. Going to try it out now and check my Python versions. I probably do not need 2.x, but I'm afraid to remove it; currently my script works on 2.x, I guess, and I have to learn more :D

Martin Evans Over a year ago

Your print statement implies you are using Python 3.x, so it should be fine.

Oleg Over a year ago

All working :) I am figuring this out right now: my JSON will be in a .json file, and my Python script needs to read the file, do its logic, then build the CSV (which will be read by another script later). So I need to read a huge JSON file, possibly 20 MB, and then make the CSV.

Oleg Over a year ago

getting tracebacks :(

Martin Evans Over a year ago

With the script as is, or have you made changes? Note that your JSON was missing a closing " after "nameOfPerson".
