
I have a file, file1.json, whose contents look like this (each dict on a separate line):

{"a":1,"b":2}
{"c":3,"d":4}
{"e":9,"f":6}
.
.
.
{"u":31,"v":23}
{"w":87,"x":46}
{"y":98,"z":68}

I want to load this file into a pandas DataFrame, so this is what I did:

import pandas as pd

df = pd.read_json('../Dataset/file1.json', orient='columns', lines=True, chunksize=10)

But instead of returning a DataFrame, this returns a JsonReader:

[IN]: df
[OUT]: <pandas.io.json.json.JsonReader at 0x7f873465bd30>

Is this normal, or am I doing something wrong? And if this is how read_json() is supposed to behave when a single JSON file contains multiple dictionaries (not comma-separated), each on its own line, how can I best load them into a DataFrame?

EDIT: If I remove the chunksize parameter from read_json(), this is what I get:

[IN]: df = pd.read_json('../Dataset/file1.json', orient='columns', lines=True)
[OUT]: ValueError: Expected object or value
  • That's what chunksize does; see the docs: pandas.pydata.org/pandas-docs/stable/io.html#io-jsonl Commented May 17, 2018 at 5:33
  • The thing is, if I don't add the chunksize parameter, it raises ValueError: Expected object or value. It also doesn't recognize the file as a valid JSON object, since each dictionary is separated by a newline character. Commented May 17, 2018 at 5:37
  • @AmanSingh It sounds like the problem with your other attempt is that you didn't use lines=True, so you were telling it that you had a single JSON text rather than a file full of line-delimited JSON texts, which isn't true, so it gives you an error. But if that's not it, create a new question. Commented May 17, 2018 at 5:47
  • @AmanSingh Is the data confidential? Commented May 17, 2018 at 5:53
  • The problem does not happen with your sample input. If it happens with your real input, you have to figure out how to give us sample input that causes the same error, or we can't help you. But as I already told you, create a new question for a new problem, don't try to edit all of your problems into one question. Commented May 17, 2018 at 6:00
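A minimal sketch of what these comments describe, assuming the six sample lines shown in the question are the whole file (the "." placeholder lines dropped). io.StringIO and the name SAMPLE are illustrative stand-ins for ../Dataset/file1.json. Without lines=True, pandas tries to parse the input as a single JSON text and raises ValueError; with lines=True, and no chunksize, it returns a DataFrame directly.

```python
import io

import pandas as pd

# Hypothetical stand-in for file1.json: the six sample lines from the question,
# one JSON object per line (the "." placeholder lines are omitted).
SAMPLE = (
    '{"a":1,"b":2}\n'
    '{"c":3,"d":4}\n'
    '{"e":9,"f":6}\n'
    '{"u":31,"v":23}\n'
    '{"w":87,"x":46}\n'
    '{"y":98,"z":68}\n'
)

# Without lines=True the whole input is treated as ONE JSON text,
# so the data after the first object makes parsing fail.
try:
    pd.read_json(io.StringIO(SAMPLE))
    single_text_failed = False
except ValueError as exc:
    single_text_failed = True
    print("without lines=True:", exc)

# With lines=True (and no chunksize) each line becomes one row.
df = pd.read_json(io.StringIO(SAMPLE), lines=True)
print(type(df).__name__, df.shape)  # DataFrame (6, 12)
```

Because every sample line uses different keys, the result has one column per distinct key, with NaN wherever a row lacks that key.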

1 Answer


As the docs explain, this is exactly the point of the chunksize parameter:

chunksize: integer, default None

Return JsonReader object for iteration. See the line-delimited json docs for more information on chunksize. This can only be passed if lines=True. If this is None, the file will be read into memory all at once.
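A minimal sketch of the behaviour this quote describes, using a hypothetical in-memory stand-in (SAMPLE) for the question's file. The with statement assumes pandas 1.2 or later, where JsonReader can be used as a context manager; on older versions, iterating over the reader directly works the same way.

```python
import io

import pandas as pd

# Hypothetical stand-in for file1.json (six line-delimited JSON objects).
SAMPLE = (
    '{"a":1,"b":2}\n{"c":3,"d":4}\n{"e":9,"f":6}\n'
    '{"u":31,"v":23}\n{"w":87,"x":46}\n{"y":98,"z":68}\n'
)

# Passing chunksize returns a JsonReader, not a DataFrame.
reader = pd.read_json(io.StringIO(SAMPLE), lines=True, chunksize=4)
print(type(reader).__name__)  # JsonReader

# Each iteration yields a DataFrame with at most `chunksize` rows.
chunk_sizes = []
with reader:
    for chunk in reader:
        assert isinstance(chunk, pd.DataFrame)
        chunk_sizes.append(len(chunk))
print(chunk_sizes)  # [4, 2]
```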

The linked docs say:

For line-delimited json files, pandas can also return an iterator which reads in chunksize lines at a time. This can be useful for large files or to read from a stream.

… and then give an example of how to use it.
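One way to use the reader on a large file is to process each chunk and keep only a running summary, so that only about chunksize rows are parsed into a DataFrame at a time. This is a sketch with hypothetical names (SAMPLE, row_count, column_totals); summing columns is just an example of per-chunk work, and the with statement assumes pandas 1.2 or later.

```python
import io

import pandas as pd

# Hypothetical stand-in for file1.json (six line-delimited JSON objects).
SAMPLE = (
    '{"a":1,"b":2}\n{"c":3,"d":4}\n{"e":9,"f":6}\n'
    '{"u":31,"v":23}\n{"w":87,"x":46}\n{"y":98,"z":68}\n'
)

# Running summaries that survive across chunks.
row_count = 0
column_totals = pd.Series(dtype="float64")

with pd.read_json(io.StringIO(SAMPLE), lines=True, chunksize=2) as reader:
    for chunk in reader:
        row_count += len(chunk)
        # chunk.sum() skips NaN; add() aligns on column names across chunks,
        # treating a column missing from one side as 0.
        column_totals = column_totals.add(chunk.sum(), fill_value=0)

print(row_count)           # 6
print(column_totals["u"])  # 31.0
```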

If you don't want that, why are you passing chunksize? Just leave it out.
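Passing chunksize also postpones parsing: read_json only builds the reader, and nothing is parsed until you iterate over it. That is how a chunked call can seem to succeed on input that the unchunked call rejects. A minimal sketch with a hypothetical malformed input (BAD); the exact ValueError message varies between pandas versions.

```python
import io

import pandas as pd

# Hypothetical malformed input: the second line is not valid JSON.
BAD = '{"a":1,"b":2}\nthis is not json\n'

# Creating the reader parses nothing yet, so this line does not raise.
reader = pd.read_json(io.StringIO(BAD), lines=True, chunksize=1)

chunks_read = 0
error = None
try:
    for chunk in reader:
        chunks_read += 1
except ValueError as exc:
    # The bad line is only detected when iteration reaches it.
    error = exc

print(chunks_read, type(error).__name__)
```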


Comments

The thing is, if I don't add the chunksize parameter, it raises ValueError: Expected object or value.
@AmanSingh Then you have another error, and the chunksize was just masking it: you don't actually read anything, and therefore don't see the other error, until you run for chunk in reader: or similar.
@AmanSingh lines=True makes it read JSON lines instead of a single JSON text. chunksize=10 makes it also give you a reader object that reads chunks of 10 lines at a time instead of the whole file. Just throwing random arguments at it until it seems to work isn't going to get you anywhere; read the docs.
@AmanSingh Meanwhile, if you need help debugging the other problem this one was masking, create a new question with a minimal reproducible example for that question—the code that uses lines but not chunksize, sample input (ideally something we can copy and paste without removing the ... lines in the middle), and the traceback—and you should get an answer to that one as well.
My answer is OK: df = pd.concat(df) works fine on my sample. Did you test it?
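The pd.concat approach from the last comment can be sketched like this: iterate over the JsonReader and concatenate the chunks into one DataFrame. SAMPLE is a hypothetical stand-in for the file, and ignore_index=True is added here so the result gets a fresh 0..n-1 index regardless of how the chunks are indexed. For a file that fits in memory, leaving out chunksize gives the same table more directly.

```python
import io

import pandas as pd

# Hypothetical stand-in for file1.json (six line-delimited JSON objects).
SAMPLE = (
    '{"a":1,"b":2}\n{"c":3,"d":4}\n{"e":9,"f":6}\n'
    '{"u":31,"v":23}\n{"w":87,"x":46}\n{"y":98,"z":68}\n'
)

# pd.concat also accepts an iterator of DataFrames, such as a JsonReader.
reader = pd.read_json(io.StringIO(SAMPLE), lines=True, chunksize=4)
df = pd.concat(reader, ignore_index=True)

# For comparison: the same data read in one go, without chunksize.
whole = pd.read_json(io.StringIO(SAMPLE), lines=True)
print(df.shape, whole.shape)  # (6, 12) (6, 12)
```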
