In pandas combine outer json with nested json and create new dataframe

Question

json_str = '''[
{
"name": "t1",
"props": [
  {
    "abc": 10012,
    "def": "OBJECT"
  },
  {
    "abc": 999123,
    "def": "SUBJECT"
  }
],
"id": 1,
"title": "king"
},
{
"name": "t2",
"props": [
  {
    "abc": 789456,
    "def": "PRODUCT"
  }
],
"id": 2,
"title": "queen"
}
]'''

Using the JSON above, I want to create a single dataframe that expands the props list and joins it to the top-level JSON columns.

In the end I want these columns in df:

id,title,name,abc,def

With rows:

1,king,t1,10012,OBJECT

1,king,t1,999123,SUBJECT

2,queen,t2,789456,PRODUCT

When I try this:

jdata = json.loads(json_str)
pd.concat([pd.DataFrame(jdata), pd.DataFrame(list(jdata['props']))], axis=1).drop('props', 1)

I get this error:

list indices must be integers or slices, not str

Also tried this:

jdata=json.loads(json_str)
pd.concat([pd.DataFrame(jdata), pd.DataFrame([pd.json_normalize(jdata, "props", errors="ignore", record_prefix="")])], axis=1).drop('props', 1)

throws this error:

Must pass 2-d input. shape={values.shape}

Also tried this:

result = pd.json_normalize(jdata, 'props', errors="ignore", record_prefix="props.")
result2 = pd.json_normalize(jdata, errors="ignore", record_prefix="tmpl.")
df = pd.concat([result, result2], axis=1).drop('props', 1)

No error is thrown here, but the concat doesn't line up the two DataFrames: the rows are out of sync.

Thanks for any help.
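
For reference, the out-of-sync rows in the last attempt can be reproduced in a small sketch. The two frames have different row counts (one row per props entry versus one row per top-level record), and `pd.concat(..., axis=1)` pairs rows by index label rather than by parent record. The names `per_prop`, `per_record` and `side_by_side` are illustrative only, and the JSON is the sample from the question written compactly.

```python
import json
import pandas as pd

json_str = """[
  {"name": "t1", "props": [{"abc": 10012, "def": "OBJECT"},
                           {"abc": 999123, "def": "SUBJECT"}],
   "id": 1, "title": "king"},
  {"name": "t2", "props": [{"abc": 789456, "def": "PRODUCT"}],
   "id": 2, "title": "queen"}
]"""
jdata = json.loads(json_str)

per_prop = pd.json_normalize(jdata, "props")  # one row per props entry
per_record = pd.json_normalize(jdata)         # one row per top-level record

print(len(per_prop), len(per_record))

# Rows are matched on index labels 0, 1, 2, so the second props entry of
# "t1" ends up next to the record for "t2", and the third row has no
# matching record at all.
side_by_side = pd.concat([per_prop, per_record], axis=1)
print(side_by_side[["abc", "def", "name"]])
```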

is it a string or json data?

– sammywemmy, Commented Apr 21, 2021 at 1:02
it's json data after json.loads(json_str)

– A.G., Commented Apr 21, 2021 at 1:03

sammywemmy · Accepted Answer · 2021-04-21 01:22:18Z

3

You could use pd.json_normalize to simplify the extraction: record_path names the nested list to expand into rows, and meta lists the top-level fields to repeat on each of those rows:

pd.json_normalize(data=jdata,
                  record_path='props',
                  meta=['name', 'id', 'title'])

      abc      def name id  title
0   10012   OBJECT   t1  1   king
1  999123  SUBJECT   t1  1   king
2  789456  PRODUCT   t2  2  queen
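
A self-contained version of the call above, as a minimal sketch. It assumes the sample JSON from the question (written compactly here) and adds one step that is not in the original snippet: reordering the columns to the `id,title,name,abc,def` layout the question asks for.

```python
import json
import pandas as pd

json_str = """[
  {"name": "t1", "props": [{"abc": 10012, "def": "OBJECT"},
                           {"abc": 999123, "def": "SUBJECT"}],
   "id": 1, "title": "king"},
  {"name": "t2", "props": [{"abc": 789456, "def": "PRODUCT"}],
   "id": 2, "title": "queen"}
]"""
jdata = json.loads(json_str)

# One output row per element of each record's "props" list; the meta
# fields are repeated on every row that came from the same record.
df = pd.json_normalize(jdata, record_path="props", meta=["name", "id", "title"])

# Reorder to the column layout requested in the question.
df = df[["id", "title", "name", "abc", "def"]]
print(df)
```

The meta columns generally come back with object dtype, so a cast such as `df["id"].astype(int)` may be needed before doing numeric work on them.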

answered Apr 21, 2021 at 1:22

sammywemmy


3 Comments

sacuL Over a year ago

Much better than my (now deleted) answer! +1

A.G. Over a year ago

I like your deleted answer and it works perfectly!! The thing with this one is that, again, it refers to property names. If I could upvote deleted answers, I would :)

sammywemmy Over a year ago

yes, @sacuL, I think you should undelete it, since that is what the OP prefers
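
On the point raised in these comments about not naming the properties: one possible way to keep `json_normalize` while hard-coding only `props` is to derive the `meta` list from the records themselves. This is a sketch and not part of the answer above. The names `nested_key` and `meta` are illustrative, and it assumes that no key inside `props` has the same name as a top-level key.

```python
import json
import pandas as pd

json_str = """[
  {"name": "t1", "props": [{"abc": 10012, "def": "OBJECT"},
                           {"abc": 999123, "def": "SUBJECT"}],
   "id": 1, "title": "king"},
  {"name": "t2", "props": [{"abc": 789456, "def": "PRODUCT"}],
   "id": 2, "title": "queen"}
]"""
jdata = json.loads(json_str)

nested_key = "props"  # the only field name spelled out by hand

# Every top-level key except the nested one, in first-seen order.
# dict.fromkeys de-duplicates while keeping insertion order.
meta = list(dict.fromkeys(k for record in jdata for k in record if k != nested_key))

# errors="ignore" fills NaN when a record lacks one of the meta keys,
# instead of raising.
df = pd.json_normalize(jdata, record_path=nested_key, meta=meta, errors="ignore")
print(df)
```

If a nested key did clash with a top-level key, passing `record_prefix` (for example `record_prefix="props."`) would be one way to keep the two apart.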

sacuL · Accepted Answer · 2021-04-21 01:19:03Z

1

I think that pd.json_normalize is probably the way to go, with a couple of minor tweaks: first explode the props column to get one row per value in the array, and then use apply(pd.Series) to turn the dictionaries into their own columns:

# I think you already did this, but start by turning the str into proper json
>>> jdata = json.loads(json_str)
>>> result = pd.json_normalize(jdata).explode("props")   
>>> result[["abc", "def"]] = result.props.apply(pd.Series) 
>>> df = result[["id", "title", "name", "abc", "def"]]

>>> df

   id  title name     abc      def
0   1   king   t1   10012   OBJECT
0   1   king   t1  999123  SUBJECT
1   2  queen   t2  789456  PRODUCT

Edit: As per your comment, you can change things around a bit to make it work without having to explicitly refer to the columns, except for props:

>>> jdata = json.loads(json_str)
>>> result = pd.json_normalize(jdata).explode("props")   
>>> result2 = result.pop("props").apply(pd.Series)
>>> df = pd.concat([result, result2], axis=1)

  name  id  title     abc      def
0   t1   1   king   10012   OBJECT
0   t1   1   king  999123  SUBJECT
1   t2   2  queen  789456  PRODUCT
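
`apply(pd.Series)` builds one Series per row, which can get slow on larger frames. A possible variation, offered as a sketch rather than as this answer's own method, builds the props columns in one go from the list of dicts. It resets the index after `explode` so that the two pieces line up on a unique 0..n-1 index, and it assumes every `props` list is non-empty (an empty list would explode to NaN and break the second step). The names `exploded` and `props` are illustrative.

```python
import json
import pandas as pd

json_str = """[
  {"name": "t1", "props": [{"abc": 10012, "def": "OBJECT"},
                           {"abc": 999123, "def": "SUBJECT"}],
   "id": 1, "title": "king"},
  {"name": "t2", "props": [{"abc": 789456, "def": "PRODUCT"}],
   "id": 2, "title": "queen"}
]"""
jdata = json.loads(json_str)

# One row per props entry; reset_index replaces the repeated labels
# (0, 0, 1) left behind by explode with a unique 0..n-1 index.
exploded = pd.DataFrame(jdata).explode("props").reset_index(drop=True)

# pop removes the column of dicts from `exploded` and returns it;
# a list of dicts turns directly into a frame with one column per key.
props = pd.DataFrame(exploded.pop("props").tolist())

df = pd.concat([exploded, props], axis=1)
print(df)
```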

edited Apr 21, 2021 at 1:19

answered Apr 21, 2021 at 1:10

sacuL

2 Comments

A.G. Over a year ago

Thanks for your reply, much appreciated! This is good, but is there a way to do this without referring to attribute names, except maybe the "props" attribute?

sacuL Over a year ago

@A.G. I tried to do that in an edit, is that along the lines of what you were thinking?
