I have a CSV file with a bunch of columns. A few of the columns share the same name, and I want to convert each row to a JSON object in which those columns are collected into a single array.

For example in the CSV:

firstname,lastname,pet,pet,pet
Joe, Dimaggio, dog, cat
Pete, Rose, turtle, cat
Jackie, Robinson, dog

I want the JSON to be

{ firstname: Joe,
  lastname: Dimaggio,
  pets: ["dog", "cat"]
},
{ firstname: Pete,
  lastname: Rose,
  pets: ["turtle", "cat"]
},
{ firstname: Jackie,
  lastname: Robinson,
  pets: ["dog"]
}

I'm trying to write a simple Python script to do this but I'm running into problems.

Here's what I've got so far:

import csv
import json

csvfile = open('userdata.csv', 'r')
jsonfile = open('userdata.json', 'w')

fieldnames = ("firstname", "lastname", "pet", "pet", "pet");
reader = csv.DictReader( csvfile, fieldnames)
record = {}
for row in reader:
    record['firstname'] = row['firstname']
    record['lastname'] = row['lastname']
    record['pets'] = json.JSONEncoder().encode({"pets": [row['pet'], row['pet'], row['pet'], row['pet'], row['pet']]});
    json.dump(record, jsonfile, indent=4)
    ##json.dump(json.loads(json.JSONEncoder(record)), jsonfile, indent=4)
print "something worked"

But this is acting funny since it's printing pets as an array inside an object called pets.

I can't figure out how to get the array pets outside the object pets. Also, it's adding backslashes to the array items:

{
    "firstname": "Joe",
    "lastname": "Dimaggio", 
    "pets": "{\"pets\": [\"dog\", \"cat\"]}"
}

1 Answer

This happens because you encode the pets list with json.JSONEncoder().encode(...) and then pass the result to json.dump, which serializes it a second time. Remove the json.JSONEncoder().encode(...) call, assign the list directly, and it should work correctly.

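import csv
import json

# Give each pet column a unique name: with duplicate "pet" keys,
# DictReader keeps only one value per row.
fieldnames = ("firstname", "lastname", "pet1", "pet2", "pet3")

records = []
with open('userdata.csv', 'r') as csvfile:
    # skipinitialspace drops the blank that follows each comma in the sample data
    reader = csv.DictReader(csvfile, fieldnames, skipinitialspace=True)
    next(reader)  # skip the header row; remove this line if the file has none
    for row in reader:
        # Short rows get None for missing columns; drop those and blank entries
        pets = [row[key] for key in fieldnames[2:] if row[key]]
        records.append({
            'firstname': row['firstname'],
            'lastname': row['lastname'],
            'pets': pets,
        })

with open('userdata.json', 'w') as jsonfile:
    # json.dump writes to a file object; json.dumps only returns a string
    json.dump(records, jsonfile, indent=4)
print("something worked")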

The backslashes you saw come from escaping the quotes inside the already-encoded JSON string, a result of serializing it twice.
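To see the double encoding in isolation, here is a minimal sketch that needs no CSV file. The variable names are just for illustration, and it assumes the default settings of the standard json module:

```python
import json

pets = ["dog", "cat"]

# Encoding the list first turns it into an ordinary Python string...
encoded_once = json.JSONEncoder().encode({"pets": pets})

# ...so serializing the record again has to escape the quotes inside it.
double_encoded = json.dumps({"firstname": "Joe", "pets": encoded_once})

# Passing the list itself lets json.dumps emit a real JSON array.
single_encoded = json.dumps({"firstname": "Joe", "pets": pets})

# With default arguments, json.dumps and JSONEncoder().encode agree,
# which is why only one of them is needed.
same = json.dumps(pets) == json.JSONEncoder().encode(pets)

print(double_encoded)
print(single_encoded)
print(same)
```

The first printed line should contain the backslash-escaped string from the question, while the second contains a plain array.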

4 Comments

Ah, awesome. So if json.dumps encodes it, what's the point of having another encode?
I believe json.dumps is simply a function that uses the JSONEncoder class behind the scenes. In this case, you only needed to use one to get the job done.
Also, since it's just a minor question, can you edit your answer to show how I can skip the pet columns that are blank? I don't want my array to be pets: ["dog", "", ""]. I just want it to contain "dog".
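Yep! record['pets'] = [x for x in record['pets'] if x] (the truthiness check drops empty strings as well as the None values that csv.DictReader fills in for short rows; comparing strings with "is not" is unreliable).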
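If you would rather not hard-code the column names, one possible approach is to read the header with csv.reader and collect every repeated column name into a list, skipping blanks along the way. This is only a sketch, written for Python 3: rows_to_records is a hypothetical helper, the inline sample stands in for userdata.csv, and pluralizing by appending "s" is a naive assumption that happens to fit "pet".

```python
import csv
import io
import json
from collections import Counter


def rows_to_records(lines):
    """Group columns whose header name repeats into one list per record."""
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)
    # Header names that occur more than once become list-valued keys.
    repeated = {name for name, count in Counter(header).items() if count > 1}
    records = []
    for row in reader:
        record = {}
        # zip stops at the shorter sequence, so missing trailing cells are skipped.
        for name, value in zip(header, row):
            if name in repeated:
                # Naive pluralization (assumption): "pet" -> "pets".
                values = record.setdefault(name + "s", [])
                if value:  # drop blank cells
                    values.append(value)
            else:
                record[name] = value
        records.append(record)
    return records


# Inline sample copied from the question, standing in for userdata.csv.
sample = io.StringIO(
    "firstname,lastname,pet,pet,pet\n"
    "Joe, Dimaggio, dog, cat\n"
    "Pete, Rose, turtle, cat\n"
    "Jackie, Robinson, dog\n"
)

records = rows_to_records(sample)
print(json.dumps(records, indent=4))
```

To use it on the real file, pass an open file object instead of the StringIO sample, for example rows_to_records(open('userdata.csv')).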
