I am currently working with JSONL files and intend to convert them to CSV so that I can run them through a program. I realized it would be easier to convert from JSON directly to CSV, so I wrote the code below to convert JSON to CSV. However, I am unsure how to convert my current JSONL files into the JSON format this code expects before I can run it. If anyone has a solution, do let me know! Thanks so much for reading; I appreciate all the help I can get.

(FYI, I tried to convert the JSONL file directly using the JSON-to-CSV converter below and got the following error message:)

Converting to CSV: XXX.jsonl
ERROR: Extra data

This is the conversion code, I hope it helps!

from json.decoder import JSONDecodeError
import pandas as pd
import sys
from flatten_json import flatten
import json

def main():
    # Exactly one argument (the input file) is required; sys.argv[0] is the script name.
    if len(sys.argv) != 2:
        sys.exit("Usage: python JsonCon.py FILENAME.json")

    filename = sys.argv[1]
    print(f"Converting to CSV: {filename}")
    convertFile(filename)

def convertFile(filename):
    try:
        with open(filename) as file:
            jsString = json.load(file)
            dic_flat = [flatten(d) for d in jsString]
            df = pd.DataFrame(dic_flat)
            df.to_csv(f'{filename[:-5]}.csv')
    except JSONDecodeError as e:
        print(f'ERROR: {e.msg}')

if __name__ == "__main__":
    main()
  • JSONL is basically a sequence of JSON documents, one per line. So just cycle over each line of the JSONL file and apply your script to it, then concatenate the results. to_csv with a None parameter will return a string. Commented Jun 3, 2021 at 7:48
  • Hi @Yuri, any idea on how to do that? I'm just starting off with python and I'm really hoping I could get a little guidance along the way. Thank you so much! Commented Jun 3, 2021 at 13:06

1 Answer

import json
import csv
import io

# get the JSON objects from JSONL
jsonl_data = """{"a": 1, "b": 123}\n{"a": 2, "b": 234}\n{"a": 3, "b": 345}\n"""
json_lines = tuple(json_line
                   for json_line in jsonl_data.splitlines()
                   if json_line.strip())
jsons_objs = tuple(json.loads(json_line)
                   for json_line in json_lines)

# write them into a CSV file
fake_file = io.StringIO()
writer = csv.writer(fake_file)
writer.writerow(["a", "b"])
writer.writerows((value for key, value in sorted(json_obj.items()))
                 for json_obj in jsons_objs)
print(fake_file.getvalue())
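
The same idea can be applied to a JSONL file on disk. The following is only a sketch under a few assumptions: the function name `jsonl_to_csv` and both path parameters are made up for illustration, every line is assumed to hold one flat JSON object, and the header is derived from the keys instead of being hard-coded. Nested values are not flattened here; they would be written as their Python string form, so a flattening step (such as the one in the question) would still be needed for nested data.

```python
import csv
import json


def jsonl_to_csv(jsonl_path, csv_path):
    """Convert a JSONL file (one JSON object per line) to a CSV file.

    Hypothetical helper for illustration; returns the number of records written.
    """
    # Parse each non-blank line as its own JSON document.
    with open(jsonl_path, encoding="utf-8") as src:
        records = [json.loads(line) for line in src if line.strip()]

    # Collect column names in first-seen order, so objects with
    # differing keys still fit into a single table.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # The csv module documentation recommends newline="" for files it writes to.
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        # restval fills in an empty cell when a record lacks a column.
        writer = csv.DictWriter(dst, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

It could be called as `jsonl_to_csv("XXX.jsonl", "XXX.csv")`. Parsing line by line is what avoids the "Extra data" error, which `json.load` raises when it finds a second JSON document after the first one in the same file.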
2 Comments

jsons_as_list is missing
@Yuri Indeed, I fixed my answer.