
I have a JSON file of size less than 1 GB. I am trying to read the file on a server that has 400 GB of RAM using the following simple command:

import pandas as pd

df = pd.read_json('filepath.json')

However, this code takes forever (several hours) to execute. I tried several suggestions, such as

df = pd.read_json('filepath.json', low_memory=False)

or

df = pd.read_json('filepath.json', lines=True)

but none of them worked. How can reading a 1 GB file on a server with 400 GB of RAM be so slow?

  • Did you try import json; d=json.load(open('filepath.json')); df=pd.DataFrame(d)? (A sketch of this approach appears after the comments.) Commented Feb 24, 2022 at 14:14
  • Is your json essentially a list of dictionaries? Is it one dictionary per line? Do you need all the attributes or just some of them? Commented Feb 24, 2022 at 14:57
  • Even though pandas.read_json is not fast, I don't think it will take several hours (It's just a wild guess). I suspect that your table has too many columns, or pandas.read_json is reading it that way. pandas is terrible at handling tables with too many columns. For example, pd.DataFrame([range(100000)]) will take more than one second to create. Please check how many rows and columns your table has. Commented Feb 24, 2022 at 16:30
  • Thanks, I think the problem was with reading directly using read_json, while @tomerar's suggestion worked in a few seconds! Commented Feb 24, 2022 at 19:23
  • @Youcef, what was the solution from @tomera? Commented Dec 15, 2022 at 23:21
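For reference, here is a minimal sketch of the approach from the first comment, which the asker reports worked in a few seconds. It assumes the file is a single JSON document (for example, a list of record dictionaries) and reuses the asker's path 'filepath.json':

import json
import pandas as pd

with open('filepath.json') as f:
    data = json.load(f)      # parse the whole file with the standard-library json module

df = pd.DataFrame(data)      # build the DataFrame from the parsed records in one step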

1 Answer


You can use chunking to shrink memory use. I also recommend the Dask library, which can load the data in parallel.
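A rough sketch of both suggestions, assuming the file is in JSON Lines format (one record per line), since pandas only supports chunksize together with lines=True, and assuming dask.dataframe.read_json is the intended Dask entry point:

import pandas as pd

# Read the file in chunks and concatenate them, keeping peak memory per chunk small
chunks = pd.read_json('filepath.json', lines=True, chunksize=100_000)
df = pd.concat(chunks, ignore_index=True)

# Alternatively, let Dask read and parse the blocks in parallel
import dask.dataframe as dd

ddf = dd.read_json('filepath.json', lines=True, blocksize=2**27)  # ~128 MB blocks
df = ddf.compute()  # materialize the result as a regular pandas DataFrame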

