
I have the following problem with Python.

Given the following JSON object, I would like to:

  • read this json as a dict
  • take several of the keys and put them as a header in a CSV file like so:

CSV headers

firstName,lastName,managersEmail,contractStartsDate
  • put the corresponding values of those keys inside the CSV as rows like so:

CSV contents

firstName,lastName,managersEmail,contractStartsDate
nameOfPerson,lastNameofPerson,someManager,2000-01-01
nameOfPerson2,lastNameofPerson2,someManager2,2000-02-02
  • do not duplicate any keys inside the CSV
  • but put each value of those keys from the JSON into the CSV under the corresponding header

my targetJSON.json

data = '{"details":[
{"firstName":"nameOfPerson,"lastName":"lastNameofPerson","managersEmail":"someEmail","managersName":"someManager",
    "departmentName":"someDepartment",
    "position":"somePosition",
    "contractStartsDate":"2000-01-01",
    "contractEndDate":"2000-01-01",
    "company":"someCompany",
    "division":"someDivision",
    "preferredName":"Unknown"},
{"firstName":"nameOfPerson2","lastName":"lastNameofPerson2","managersEmail":"someEmail2","managersName":"someManager2",
    "departmentName":"someDepartment2",
    "position":"somePosition2",
    "contractStartsDate":"2000-02-02",
    "contractEndDate":"2000-02-02",
    "company":"someCompany",
    "division":"someDivision2",
    "preferredName":"Unknown"}
]}'

My code looks like this


with open('targetJSON.json', 'r') as f:
    distros_dict = json.load(f)

for distro in distros_dict:
    print(distro['managersEmail'])


data_file = open("targetJSON.json", "r")
values = json.load(data_file)
data_file.close()

with open("usersData.csv", "wb") as f:
    wr = csv.writer(f)
    for data in values:
        value = data["managersEmail"]
        value = data["firstName"]
        for key, value in data.iteritems():
            #wr.writerow([key, value])
            wr.writerow([key.encode("utf-8"), value.encode("utf-8")])

But the result is complete gibberish: the CSV contains everything mixed up :-(

  • When you say you want to avoid duplicate keys, do you mean don't write a row if any of its keys have been seen before, or just don't write duplicate rows? Commented May 17, 2019 at 13:19
  • "just don't write duplicate rows" would be correct, I think. Basically the JSON contains nested objects with all those keys + values. Per nested object there is 1x firstname 2x lastname (and so on) with the values. This JSON can get huge, with several thousand nested objects, each representing some real-world object. Commented May 17, 2019 at 13:30

1 Answer


With Python 3.x, you need to open the file in text mode ("w") with newline="" when using csv.writer(). The binary mode "wb" is only used for Python 2.x versions.
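To see what newline="" changes, here is a minimal sketch (the scratch file demo.csv is made up for illustration). By default csv.writer ends each row with \r\n itself. Opening the file with newline="" stops Python's text layer from translating that \n a second time, which on Windows would otherwise produce \r\r\n and show up as blank lines between rows.

```python
import csv
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Text mode plus newline="" lets csv.writer control the line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f_output:
    csv.writer(f_output).writerow(["firstName", "lastName"])

# Read the raw bytes back to inspect exactly what ended up in the file.
with open(path, "rb") as f_input:
    raw = f_input.read()

print(raw)
```

The row should end in a single \r\n regardless of the operating system.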

Using the sample JSON you've given, you would just need to iterate over the header fields and create a row from each entry in details. For example:

import json
import csv

data = """{"details":[{"firstName":"nameOfPerson","lastName":"lastNameofPerson","managersEmail":"someEmail",
    "managersName":"someManager", "departmentName":"someDepartment", "position":"somePosition", "contractStartsDate":"2000-01-01",
    "contractEndDate":"2000-01-01", "company":"someCompany", "division":"someDivision", "preferredName":"Unknown"},
{"firstName":"nameOfPerson2","lastName":"lastNameofPerson2","managersEmail":"someEmail2","managersName":"someManager2",
    "departmentName":"someDepartment2", "position":"somePosition2", "contractStartsDate":"2000-02-02",
    "contractEndDate":"2000-02-02", "company":"someCompany", "division":"someDivision2", "preferredName":"Unknown"}
]}"""

json_data = json.loads(data)
header = ["firstName", "lastName", "managersEmail", "contractStartsDate"]

with open("usersData.csv", "w", newline="", encoding="utf-8") as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)

    for entry in json_data["details"]:
        csv_output.writerow([entry[key] for key in header])

Giving you:

firstName,lastName,managersEmail,contractStartsDate
nameOfPerson,lastNameofPerson,someEmail,2000-01-01
nameOfPerson2,lastNameofPerson2,someEmail2,2000-02-02

If your JSON data contains duplicate entries, then you would have to first load all of the data and remove duplicates before starting to write the rows.
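As a rough sketch of that idea, assuming two entries count as duplicates when their values in the selected header columns match (the shortened sample data, the seen set and the in-memory io.StringIO target are illustrative and not part of the original question):

```python
import csv
import io
import json

# Shortened, made-up sample in which the third entry repeats the first.
data = """{"details":[
{"firstName":"nameOfPerson","lastName":"lastNameofPerson","managersEmail":"someEmail","contractStartsDate":"2000-01-01"},
{"firstName":"nameOfPerson2","lastName":"lastNameofPerson2","managersEmail":"someEmail2","contractStartsDate":"2000-02-02"},
{"firstName":"nameOfPerson","lastName":"lastNameofPerson","managersEmail":"someEmail","contractStartsDate":"2000-01-01"}
]}"""

json_data = json.loads(data)
header = ["firstName", "lastName", "managersEmail", "contractStartsDate"]

seen = set()   # rows already collected
rows = []      # unique rows, in their original order
for entry in json_data["details"]:
    row = tuple(entry[key] for key in header)
    if row not in seen:
        seen.add(row)
        rows.append(row)

# In-memory stand-in for open("usersData.csv", "w", newline="", encoding="utf-8").
f_output = io.StringIO()
csv_output = csv.writer(f_output)
csv_output.writerow(header)
csv_output.writerows(rows)

print(f_output.getvalue())
```

Comparing only the selected columns means entries that differ in some other field, such as division, still collapse into one CSV row. Whether that is wanted depends on the data.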


Alternatively, you could use a csv.DictWriter as follows:

import json
import csv

data = """{"details":[{"firstName":"nameOfPerson","lastName":"lastNameofPerson","managersEmail":"someEmail",
    "managersName":"someManager", "departmentName":"someDepartment", "position":"somePosition", "contractStartsDate":"2000-01-01",
    "contractEndDate":"2000-01-01", "company":"someCompany", "division":"someDivision", "preferredName":"Unknown"},
{"firstName":"nameOfPerson2","lastName":"lastNameofPerson2","managersEmail":"someEmail2","managersName":"someManager2",
    "departmentName":"someDepartment2", "position":"somePosition2", "contractStartsDate":"2000-02-02",
    "contractEndDate":"2000-02-02", "company":"someCompany", "division":"someDivision2", "preferredName":"Unknown"}
]}"""

json_data = json.loads(data)
fieldnames = ["firstName", "lastName", "managersEmail", "contractStartsDate"]

with open("usersData.csv", "w", newline="", encoding="utf-8") as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames, extrasaction="ignore")
    csv_output.writeheader()
    csv_output.writerows(json_data["details"])

To read the data from an input JSON file, you can do the following:

import json
import csv

with open("sourceJSON.json", encoding="utf-8") as f_input:
    json_data = json.load(f_input)

fieldnames = ["firstName", "lastName", "managersEmail", "contractStartsDate"]

with open("usersData.csv", "w", newline="", encoding="utf-8") as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames, extrasaction="ignore")
    csv_output.writeheader()
    csv_output.writerows(json_data["details"])

If you need to remove identical rows, then replace the last line with:

csv_output.writerows(dict(t) for t in {tuple(entry.items()) : '' for entry in json_data["details"]})

13 Comments

Whoa! Thank you for this. Going to try it out now and check my Python versions. I probably do not need 2.x but I am afraid of removing it. Currently my script works on 2.x, I guess, and I have to learn more :D
Your print statement implies you are using Python 3.x, so it should be fine.
All working :) I am figuring this out right now: my JSON will be in a .json file, and my Python script needs to read the file, do its logic, then build the CSV (which is going to be read by some other script later). So I need to read a huge-ass JSON file, possibly 20 MB, and then make the CSV.
Getting tracebacks :(
With the script "as is", or have you made changes? Note, your JSON was missing a closing " after "nameOfPerson