I have a directory full of JSON files that I need to extract information from and convert into a Pandas dataframe. My current solution works, but I have a feeling that there is a more elegant way of doing this:

import io
import json
import os

import pandas as pd

output = []
for entry in os.scandir(directory):
    if entry.path.endswith(".json"):
        with open(entry.path) as f:
            data = json.load(f)
            ...
            # Build one CSV row per file from the extracted fields.
            newline = field1 + ',' + field2 + ',' + ... + ',' + fieldn
            output.append(newline)
...
df = pd.read_csv(io.StringIO('\n'.join(output)))
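For comparison, a common way to drop the CSV round trip entirely is to collect each file's fields into a dict and build the dataframe directly, since pd.DataFrame infers the columns from a list of dicts. A minimal sketch, assuming each file holds a single flat JSON object (directory is the same variable as above):

import json
import os

import pandas as pd

records = []
for entry in os.scandir(directory):
    if entry.path.endswith(".json"):
        with open(entry.path) as f:
            # Assumes a flat object; nested fields would need flattening.
            records.append(json.load(f))

df = pd.DataFrame(records)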
  • Where is data being used? Commented Jun 16, 2021 at 5:41
  • Why not read each JSON file into a dataframe and combine all of those dataframes into one big dataframe? Commented Jun 16, 2021 at 5:43
  • Yeah, I'd check out pd.read_json and pd.concat. Commented Jun 16, 2021 at 5:49
  • data is the source for all the values (field1, field2, ...) that I need to store in a df. I basically convert data into a comma-separated value string to be used later. Commented Jun 16, 2021 at 5:51

1 Answer


Yes, this can be done better.

import os
from glob import glob

import pandas as pd

# path is the directory containing the .json files.
all_files = glob(os.path.join(path, "*.json"))

# Read each file into its own dataframe, then stack them into one.
ind_df = (pd.read_json(f) for f in all_files)
df = pd.concat(ind_df, ignore_index=True)

Using a generator expression instead of a list means you never hold a separate list of per-file dataframes before the concatenation.
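One caveat as an aside: pd.read_json expects each file to already be table-shaped (a list of records or a dict of columns). If the files instead hold single nested objects, a variation is to flatten each one with pandas.json_normalize; this is a sketch under that assumption, with "data" as a placeholder directory:

import json
import os
from glob import glob

import pandas as pd

def load_one(path):
    # json_normalize flattens nested keys into dotted column names,
    # e.g. {"a": {"b": 1}} becomes a one-row frame with column "a.b".
    with open(path) as f:
        return pd.json_normalize(json.load(f))

all_files = glob(os.path.join("data", "*.json"))  # placeholder directory
df = pd.concat((load_one(p) for p in all_files), ignore_index=True)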
