Converting a large CSV file to multiple JSON files using Python

Question

I am currently using the following code to convert a large CSV file to a JSON file.

import csv
import json

def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []

    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)

        for row in csvReader:
            jsonArray.append(row)
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)

csvFilePath = r'test_data.csv'
jsonFilePath = r'test_data.json'
csv_to_json(csvFilePath, jsonFilePath)

This code works fine, and I can convert the CSV to JSON without any issues. However, the CSV file contains 600,000+ rows, so the JSON has just as many items, and the single JSON file has become very difficult to manage.

I would like to modify the code above so that every 5000 rows of the CSV are written to a new JSON file. In this case I would ideally end up with 120 (600,000 / 5000) JSON files.
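Because the row count is "600,000+", note that the number of files is a ceiling division: any rows that do not fill a whole batch still need one final, smaller file. A minimal sketch of that arithmetic; `expected_file_count` is a hypothetical helper, not part of any library:

```python
def expected_file_count(total_rows: int, rows_per_file: int) -> int:
    """Number of output files needed when each file holds at most rows_per_file rows."""
    # Integer ceiling division: a partially filled last batch still gets its own file
    return (total_rows + rows_per_file - 1) // rows_per_file


print(expected_file_count(600_000, 5000))  # 120
print(expected_file_count(600_001, 5000))  # 121
```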

How can I do this?

Icebreaker454 · Accepted Answer · 2021-02-16 11:34:46Z


Split up your read/write logic and add a simple threshold:

import csv
import json
import os

JSON_ENTRIES_THRESHOLD = 5000  # modify to whatever you see suitable

def write_json(json_array, filename):
    with open(filename, 'w', encoding='utf-8') as jsonf:
        json.dump(json_array, jsonf)  # note: .dump writes directly to the file object

def csv_to_json(csvFilePath, jsonFilePath):
    # Derive numbered output names from jsonFilePath, e.g. test_data.json -> test_data-0.json
    base, ext = os.path.splitext(jsonFilePath)
    jsonArray = []
    filename_index = 0

    with open(csvFilePath, encoding='utf-8', newline='') as csvf:
        csvReader = csv.DictReader(csvf)

        for row in csvReader:
            jsonArray.append(row)
            if len(jsonArray) >= JSON_ENTRIES_THRESHOLD:
                # if we reached the threshold, write out the current batch
                write_json(jsonArray, f"{base}-{filename_index}{ext}")
                filename_index += 1
                jsonArray = []

    # Finally, write out the remainder (skipped when the row count is an exact multiple)
    if jsonArray:
        write_json(jsonArray, f"{base}-{filename_index}{ext}")
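As an alternative sketch (a different technique from the answer's counter), `itertools.islice` can pull fixed-size batches straight from the `csv.DictReader` iterator. Only one batch of rows is held in memory at a time, and the loop stops as soon as a batch comes back empty, so no empty trailing file is written. The function name `csv_to_json_chunks`, the `rows_per_file` parameter, and the `-N` filename suffix are assumptions for illustration:

```python
import csv
import json
import os
from itertools import islice


def csv_to_json_chunks(csv_path, json_path, rows_per_file=5000):
    """Write csv_path as numbered JSON files of at most rows_per_file rows each.

    Hypothetical helper: output names are derived from json_path,
    e.g. test_data.json -> test_data-0.json, test_data-1.json, ...
    Returns the list of file paths written.
    """
    base, ext = os.path.splitext(json_path)
    written = []
    with open(csv_path, encoding='utf-8', newline='') as csvf:
        reader = csv.DictReader(csvf)
        index = 0
        while True:
            # islice consumes up to rows_per_file rows from the shared reader
            batch = list(islice(reader, rows_per_file))
            if not batch:
                break  # reader exhausted; never write an empty file
            out_name = f"{base}-{index}{ext}"
            with open(out_name, 'w', encoding='utf-8') as jsonf:
                json.dump(batch, jsonf)  # serialize the batch straight into the file
            written.append(out_name)
            index += 1
    return written
```

Usage with the question's paths would be `csv_to_json_chunks('test_data.csv', 'test_data.json')`, which writes `test_data-0.json`, `test_data-1.json`, and so on.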

edited Feb 16, 2021 at 11:34

answered Feb 16, 2021 at 11:28

Icebreaker454


3 Comments

gionni Over a year ago

Like the answer, but for the sake of readability I would put the write_json call inside the if condition instead of using continue.

Icebreaker454 Over a year ago

Yup, that would seem better

Icebreaker454 Over a year ago

@gionni, I also fixed the overwriting issue: all the files were being written with the same name.
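The overwriting problem comes down to how each batch's filename is built. With around 120 output files, a plain index also sorts badly by name (`-10` lands before `-2`). A small sketch, assuming you want a zero-padded index inserted before the extension of the path passed in as `jsonFilePath`; `chunk_filename` and its `width` parameter are hypothetical:

```python
import os


def chunk_filename(json_path: str, index: int, width: int = 3) -> str:
    """Build a unique name per batch by inserting a zero-padded index before the extension."""
    base, ext = os.path.splitext(json_path)
    # Zero-padding keeps name order equal to batch order (-002 sorts before -010)
    return f"{base}-{index:0{width}d}{ext}"


names = [chunk_filename("test_data.json", i) for i in (0, 2, 10, 119)]
print(names)
# ['test_data-000.json', 'test_data-002.json', 'test_data-010.json', 'test_data-119.json']
print(sorted(names) == names)  # True
```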
