Construct DataFrame from multiple JSON files

Question

I'm using pandas to convert multiple json files into a dataframe. I only want some entries that match some criteria from those files, but I'm appending the whole converted files, then filtering it.

Suppose I have 2 json files that look like this:

File 1500.json

[
  {
    "CodStore": 1500,
    "CodItem": 10,
    "NameItem": "Burger",
    "Price": 10.0
  },
  {
    "CodStore": 1500,
    "CodItem": 20,
    "NameItem": "Fries",
    "Price": 3.0
  },
  {
    "CodStore": 1500,
    "CodItem": 30,
    "NameItem": "Ice Cream",
    "Price": 1.0
  }
]

File 1805.json

[
  {
    "CodStore": 1805,
    "CodItem": 10,
    "NameItem": "Burger",
    "Price": 9.0
  },
  {
    "CodStore": 1805,
    "CodItem": 20,
    "NameItem": "Fries",
    "Price": 2.0
  },
  {
    "CodStore": 1805,
    "CodItem": 30,
    "NameItem": "Ice Cream",
    "Price": 0.5
  }
]

I only want entries with CodItem 10 and 30 on my dataframe, so my python code looks like this:

from pandas import DataFrame, read_json

df = DataFrame()

stores = [1500, 1805]

for store in stores:
    filename = '%s.json' % store
    df = df.append(read_json(filename))

df = df[(df.CodItem == 10) | (df.CodItem == 30)]

This is just an example, the problem is that I have more than 600+ json files so reading takes a lot of time, the dataframe becomes very long and memory consumption is very high.

Is there a way to read only the matching criteria to the dataframe?

cs95 · Accepted Answer · 2019-06-06 15:21:12Z

2

One option would be to append your JSON data to a list, then convert once at the end and filter.

coditems = [10, 30]

data = []
for filename in json_files:
    data.extend(read_json(filename))

df = pd.DataFrame(data).query('CodItem in @coditems')

This should be a lot faster because append is a quadratic operation. You have to read all the data in anyway, so you may as well use pandas to speed it up.

Another option would be to initialise your DataFrames inside a loop and then call pd.concat after you're done.

df_list = []
for file in json_files:
    df_list.append(pd.DataFrame.from_records(read_json(filename)))

df = pd.concat(df_list, ignore_index=True).query('CodItem in @coditems')

answered Jun 6, 2019 at 15:21

cs95

406k106 gold badges744 silver badges797 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

foglerit · Accepted Answer · 2019-06-06 15:19:42Z

0

You can can create a temporary data frame within your loop and filter it before appending:

from pandas import DataFrame, read_json

df = DataFrame()

stores = [1500, 1805]

for store in stores:
    filename = '%s.json' % store
    temp_df = read_json(filename)
    df = df.append(temp_df[(temp_df.CodItem == 10) | (temp_df.CodItem == 30)])

answered Jun 6, 2019 at 15:19

foglerit

8,4369 gold badges52 silver badges70 bronze badges

Collectives™ on Stack Overflow

Construct DataFrame from multiple JSON files

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related