pandas read_json for multi line jsons returns a JSONReader and not a dataframe

Question

I have a file file1.json whose contents are like this (each dict in a separate line):

{"a":1,"b":2}
{"c":3,"d":4}
{"e":9,"f":6}
.
.
.
{"u":31,"v":23}
{"w":87,"x":46}
{"y":98,"z":68}

I want to load this file into a pandas DataFrame, so this is what I did:

df = pd.read_json('../Dataset/file1.json', orient='columns', lines=True, chunksize=10)

But instead of returning a DataFrame, this returns a JsonReader.

[IN]: df
[OUT]: <pandas.io.json.json.JsonReader at 0x7f873465bd30>

Is this normal, or am I doing something wrong? And if this is how read_json() is supposed to behave when a single JSON file contains multiple dictionaries, one per line and not comma-separated, how can I best load them into a DataFrame?

EDIT: If I remove the chunksize parameter from read_json(), this is what I get:

[IN]: df = pd.read_json('../Dataset/file1.json', orient='columns', lines=True)
[OUT]: ValueError: Expected object or value

That's what chunksize does. See the docs: pandas.pydata.org/pandas-docs/stable/io.html#io-jsonl
– njzk2, Commented May 17, 2018 at 5:33
The thing is, if I don't add the chunksize parameter, it raises ValueError: Expected object or value. It also doesn't recognize the file as a valid JSON object, because each dictionary is separated by a newline character.
– Aman Singh, Commented May 17, 2018 at 5:37
@AmanSingh It sounds like the problem with your other attempt is that you didn't use lines=True, so you were telling it that you had a single JSON text rather than a file full of line-delimited JSON texts, which isn't true, so it gives you an error. But if that's not it, create a new question.
– abarnert, Commented May 17, 2018 at 5:47
The problem does not happen with your sample input. If it happens with your real input, you have to figure out how to give us sample input that causes the same error, or we can't help you. But as I already told you, create a new question for a new problem; don't try to edit all of your problems into one question.
– abarnert, Commented May 17, 2018 at 6:00

Community · Accepted Answer · 2020-06-20 09:12:55Z

3

As the docs explain, this is exactly the point of the chunksize parameter:

chunksize: integer, default None

Return JsonReader object for iteration. See the line-delimted json docs for more information on chunksize. This can only be passed if lines=True. If this is None, the file will be read into memory all at once.

The linked docs say:

For line-delimited json files, pandas can also return an iterator which reads in chunksize lines at a time. This can be useful for large files or to read from a stream.

… and then give an example of how to use it.

If you don't want that, why are you passing chunksize? Just leave it out.
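To make the difference concrete, here is a minimal sketch of both modes. It assumes a reasonably recent pandas and uses an in-memory StringIO holding the first three lines of the question's sample as a stand-in for file1.json; the variable names are only for illustration.

```python
from io import StringIO

import pandas as pd

# Stand-in for file1.json: the first three lines of the question's sample.
text = '{"a":1,"b":2}\n{"c":3,"d":4}\n{"e":9,"f":6}\n'

# lines=True without chunksize: the whole input is parsed at once
# and a DataFrame comes back directly.
df = pd.read_json(StringIO(text), lines=True)

# lines=True with chunksize: a JsonReader comes back instead. Iterating it
# yields one DataFrame per chunk of up to `chunksize` lines.
reader = pd.read_json(StringIO(text), lines=True, chunksize=2)
chunks = list(reader)

# If a single DataFrame is wanted after all, the chunks can be concatenated.
combined = pd.concat(chunks, ignore_index=True)
```

Because every line in the sample uses different keys, the resulting frame has one column per key, with NaN wherever a line did not contain that key.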

edited Jun 20, 2020 at 9:12 by Community Bot

answered May 17, 2018 at 5:33 by abarnert


6 Comments

Aman Singh Over a year ago

The thing is, if I don't add the chunksize parameter, it raises ValueError: Expected object or value.

abarnert Over a year ago

@AmanSingh Then you have another error, and the chunksize was just masking it: nothing is actually read, so you don't see the other error, until you iterate with for chunk in reader: or similar.
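A rough sketch of that masking effect, assuming a recent pandas and a deliberately malformed in-memory input (the bad_lines string is made up for illustration and is not the asker's real file):

```python
from io import StringIO

import pandas as pd

# Hypothetical input: the second line is not valid JSON.
bad_lines = '{"a":1,"b":2}\nthis is not json\n'

# With chunksize, read_json only builds the reader; nothing is parsed yet,
# so the malformed line goes unnoticed at this point.
reader = pd.read_json(StringIO(bad_lines), lines=True, chunksize=10)
constructed_ok = True

# The parse error only surfaces once the reader is actually iterated.
try:
    chunks = list(reader)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The exact error text depends on the pandas version and on what is wrong with the input, so the sketch only records that a ValueError was raised.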

abarnert Over a year ago

@AmanSingh lines=True makes it read JSON lines instead of a single JSON text. chunksize=10 makes it also give you a reader object that reads chunks of 10 lines at a time instead of the whole file. Just throwing random arguments at it until it seems to work isn't going to get you anywhere; read the docs.

abarnert Over a year ago

@AmanSingh Meanwhile, if you need help debugging the other problem this one was masking, create a new question with a minimal reproducible example for that question—the code that uses lines but not chunksize, sample input (ideally something we can copy and paste without removing the ... lines in the middle), and the traceback—and you should get an answer to that one as well.

jezrael Over a year ago

My answer is OK: df = pd.concat(df) works nicely on my sample. Did you test it?
