json_str = '''[
{
"name": "t1",
"props": [
  {
    "abc": 10012,
    "def": "OBJECT"
  },
  {
    "abc": 999123,
    "def": "SUBJECT"
  }
],
"id": 1,
"title": "king"
},
{
"name": "t2",
"props": [
  {
    "abc": 789456,
    "def": "PRODUCT"
  }
],
"id": 2,
"title": "queen"
}
]'''

Using the JSON above, I want to create a single dataframe that expands the props list and joins each entry to the top-level JSON columns.

In the end, I want these columns in df:

id,title,name,abc,def

With rows:

1,king,t1,10012,OBJECT

1,king,t1,999123,SUBJECT

2,queen,t2,789456,PRODUCT

When I try this:

jdata = json.loads(json_str)
pd.concat([pd.DataFrame(jdata), pd.DataFrame(list(jdata['props']))], axis=1).drop('props', 1)

I get this error:

list indices must be integers or slices, not str

Also tried this:

jdata=json.loads(json_str)
pd.concat([pd.DataFrame(jdata), pd.DataFrame([pd.json_normalize(jdata, "props", errors="ignore", record_prefix="")])], axis=1).drop('props', 1)

throws this error:

Must pass 2-d input. shape={values.shape}

Also tried this:

result = pd.json_normalize(jdata, 'props', errors="ignore", record_prefix="props.")
result2 = pd.json_normalize(jdata, errors="ignore", record_prefix="tmpl.")
df = pd.concat([result, result2], axis=1).drop('props', 1)

No error is thrown here, but the concat doesn't line up the two dataframes; the rows are out of sync.

Thanks for any help.

  • is it a string or json data? Commented Apr 21, 2021 at 1:02
  • it's json data after json.loads(json_str) Commented Apr 21, 2021 at 1:03

2 Answers

You could use pd.json_normalize to simplify the extraction: record_path names the nested list to expand into rows, and meta lists the top-level fields to repeat on each of those rows:

pd.json_normalize(data=jdata,
                  record_path='props',
                  meta=['name', 'id', 'title'])
 
      abc      def name id  title
0   10012   OBJECT   t1  1   king
1  999123  SUBJECT   t1  1   king
2  789456  PRODUCT   t2  2  queen
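
If naming the meta fields explicitly is a concern, one possible variation is to derive them from the keys of the first record. This is only a sketch: the name `meta_keys` is introduced here for illustration, and it assumes every record carries the same top-level keys as the first one. The final reindexing step just moves the parent fields in front of the expanded props columns.

```python
import json
import pandas as pd

json_str = '''[
  {"name": "t1",
   "props": [{"abc": 10012, "def": "OBJECT"}, {"abc": 999123, "def": "SUBJECT"}],
   "id": 1, "title": "king"},
  {"name": "t2",
   "props": [{"abc": 789456, "def": "PRODUCT"}],
   "id": 2, "title": "queen"}
]'''
jdata = json.loads(json_str)

# Assumption: all records share the top-level keys of the first record.
# Only "props" is referred to by name.
meta_keys = [key for key in jdata[0] if key != "props"]

# One row per entry in "props", with the parent fields repeated alongside
df = pd.json_normalize(jdata, record_path="props", meta=meta_keys)

# Put the parent fields first, followed by the expanded props fields
df = df[meta_keys + [col for col in df.columns if col not in meta_keys]]
print(df)
```

Note that columns brought in through meta typically come back with object dtype, so a cast (for example on id) may be needed afterwards.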

3 Comments

Much better than my (now deleted) answer! +1
I like your deleted answer and it works perfectly!! The thing with this one is that, again, it refers to property names. If I could upvote deleted answers, I would :)
yes, @sacuL, I think you should undelete it, since that is what the OP prefers

I think that pd.json_normalize is probably the way to go, with a couple of minor tweaks: first explode the props column to get one row per value in the array, and then use apply(pd.Series) to turn the dictionaries into their own columns:

# I think you already did this, but start by parsing the str into Python objects
>>> jdata = json.loads(json_str)
>>> result = pd.json_normalize(jdata).explode("props")   
>>> result[["abc", "def"]] = result.props.apply(pd.Series) 
>>> df = result[["id", "title", "name", "abc", "def"]]

>>> df

   id  title name     abc      def
0   1   king   t1   10012   OBJECT
0   1   king   t1  999123  SUBJECT
1   2  queen   t2  789456  PRODUCT
                                         

Edit: As per your comment, you can change things around a bit to make it work without having to explicitly refer to the columns, except for props:

>>> jdata = json.loads(json_str)
>>> result = pd.json_normalize(jdata).explode("props")   
>>> result2 = result.pop("props").apply(pd.Series)
>>> df = pd.concat([result, result2], axis=1)

  name  id  title     abc      def
0   t1   1   king   10012   OBJECT
0   t1   1   king  999123  SUBJECT
1   t2   2  queen  789456  PRODUCT
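
The result above keeps the duplicated index (0, 0, 1) left behind by explode. A possible variation, sketched below, resets the index first and builds the props columns from a plain list of dicts rather than apply(pd.Series), which can be slow on larger frames. The names `exploded` and `props` are introduced here for illustration, and the sketch assumes every record has a non-empty props list of dicts.

```python
import json
import pandas as pd

json_str = '''[
  {"name": "t1",
   "props": [{"abc": 10012, "def": "OBJECT"}, {"abc": 999123, "def": "SUBJECT"}],
   "id": 1, "title": "king"},
  {"name": "t2",
   "props": [{"abc": 789456, "def": "PRODUCT"}],
   "id": 2, "title": "queen"}
]'''
jdata = json.loads(json_str)

# One row per element of "props"; reset the index so it is unique (0, 1, 2)
exploded = pd.DataFrame(jdata).explode("props").reset_index(drop=True)

# Remove the column of dicts and expand it into its own columns.
# Assumption: no record has an empty or missing "props" list.
props = pd.DataFrame(exploded.pop("props").tolist(), index=exploded.index)

# Both frames now share the same unique index, so the rows line up
df = pd.concat([exploded, props], axis=1)
print(df)
```

As in the edit above, only "props" is referred to by name; the remaining columns are whatever the records happen to contain.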

2 Comments

Thanks for your reply, much appreciated! This is good, but is there a way to do this without referring to attribute names, except maybe the "props" attribute?
@A.G. I tried to do that in an edit, is that along the lines of what you were thinking?
