
I have a .csv file which I would like to convert into a .jsonl file.

I found the Pandas to_json method:

df = pd.read_csv('DIRECTORY/texts1.csv', sep=';')
df.to_json('DIRECTORY/texts1.json')

However, I am not aware of a function to turn it into a .jsonl format. How can I do this?

  • jsonlines.org Commented May 7, 2021 at 13:34
  • As I said, a lot of attempts to hijack a common practice. Just append the JSON strings at the end of the file you want. That's the whole point. You only need to read to the next newline to read a JSON document instead of reading the entire file. Commented May 7, 2021 at 13:35
  • In fact, ndjson.org appeared before jsonlines.org and contained the same text as the historical json.org site, without having any relation to either Douglas Crockford or ECMA Commented May 7, 2021 at 13:39
  • The whole point of storing a JSON document per line is that you don't have to read either the document or the data into memory. It's the same benefit CSV has. You can read the CSV file line by line, generate a JSON string from each line, and just append it to the target file. This way you could handle e.g. a 10 GB file without using any more memory than necessary to process and serialize a single line. Commented May 7, 2021 at 13:42
  • 1
    From this answer you can see that to_json can write each row in a separate row if you use orient='records', lines=True. From to_json docs: If ‘orient’ is ‘records’ write out line delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list like. Commented May 7, 2021 at 13:47

3 Answers


This is probably a bit late, but I wrote a silly module called csv-jsonl that may help with this sort of thing.

>>> from csv_jsonl import JSONLinesDictWriter
>>> l = [{"foo": "bar", "bat": 1}, {"foo": "bar", "bat": 2}]
>>> with open("foo.jsonl", "w", encoding="utf-8") as _fh:
...     writer = JSONLinesDictWriter(_fh)
...     writer.writerows(l)
...

It extends the native csv module, so it's mostly familiar. Hope it helps.




I'm not sure whether this result is fully compliant with "jsonl" syntax, but it's a hack that might get you toward a relevant outcome.

The primary trick is to export each row of the input file as a separate JSON file, then read each of those JSON files back from disk and treat its content as one distinct jsonl line.

I'm starting from a CSV that contains

hello, from, this, file
another, amazing, line, csv
last, line, of, file

The snippet below builds on another post.

import pandas
df = pandas.read_csv("myfile.csv", header=None)

file_to_write = ""
for index in df.index:
    df.loc[index].to_json("row{}.json".format(index))
    with open("row{}.json".format(index)) as file_handle:
        file_content = file_handle.read()
        file_to_write += file_content + "\n"
        
with open("result.jsonl","w") as file_handle:
    file_handle.write(file_to_write)

The resulting .jsonl file contains

{"0":"hello","1":" from","2":" this","3":" file"}
{"0":"another","1":" amazing","2":" line","3":" csv"}
{"0":"last","1":" line","2":" of","3":" file"}

If the row indices are not desired, those could be removed from the .to_json() line of the Python snippet above.



Thought this should be added. This version builds upon Ben's answer but avoids using temporary files and optimizes string handling, thus addressing the potential inefficiencies and issues in the original script.

import pandas as pd

# Reading the CSV file into a DataFrame without headers
df = pd.read_csv("myfile.csv", header=None)

# Prepare an empty list to collect JSON strings
json_lines = []

# Convert each row to a JSON string and append it to the list
for index, row in df.iterrows():
    # The default orient ('index') keeps the {"0": ..., "1": ...} keys,
    # matching Ben's output; orient='records' on a Series would emit a
    # bare value array instead
    json_str = row.to_json()
    json_lines.append(json_str)

# Join all JSON strings with newline characters and write to a single file
with open("result.jsonl", "w") as file_handle:
    file_handle.write("\n".join(json_lines))

