How can I split a JSON column in a pandas DataFrame:

pd.DataFrame({
    'col1':[1,2], 
    'col2':["{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
            "{'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}"]})

   col1                                        col2
0     1  {'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}
1     2  {'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}

into real columns in a simple, Pythonic way?

edit

Desired output:

pd.DataFrame({'col1':[1,2], 'foo':[1,3], 'bar':[2,5], 
              'baz_foo':[2,2], 'baz_x':[1,1]})

   col1  foo  bar  baz_foo  baz_x
0     1    1    2        2      1
1     2    3    5        2      1
  • Is the inconsistent quoting in your JSON-like col2 actually what you are looking to parse? Instantiating the DataFrame you provide works, but taking the next step using ast.literal_eval doesn't work because that's not a valid dictionary. Haven't tried the json library actually... Commented Dec 3, 2018 at 21:07
  • no. updated the data. Commented Dec 3, 2018 at 21:08
  • Can you include your desired output as well, just to be clear? Commented Dec 3, 2018 at 21:10
  • added it to the question. Commented Dec 3, 2018 at 21:12

2 Answers

json_normalize is the right way to tackle nested JSON data.

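import ast
import pandas as pd

# df is the DataFrame from the question.
# pd.json_normalize is the top-level name since pandas 1.0;
# on older versions use: from pandas.io.json import json_normalize
# Column order in the result can vary with the pandas version.
v = pd.json_normalize([ast.literal_eval(j) for j in df.pop('col2')], sep='_')
pd.concat([df, v], axis=1)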

   col1  bar  baz_foo  baz_x  foo
0     1    2        2      1    1
1     2    5        2      1    3

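Note that you still have to convert each string to a dictionary first. The sample strings use single quotes, so they are Python literals rather than strict JSON, which is why ast.literal_eval is used here instead of json.loads.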
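
A minimal sketch of that conversion step, assuming the cells may hold either strict JSON or Python-literal strings. `parse_cell` is a hypothetical helper name, not part of pandas:

```python
import ast
import json


def parse_cell(text):
    """Parse one string cell into a dict (hypothetical helper)."""
    try:
        # Strict JSON requires double-quoted keys and strings.
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to Python literal syntax, e.g. single-quoted keys.
        return ast.literal_eval(text)


strict = parse_cell('{"foo": 1, "baz": {"x": 1}}')
literal = parse_cell("{'foo': 1, 'baz': {'x': 1}}")
```

Both calls should produce the same nested dictionary, which can then be passed to json_normalize. Note that ast.literal_eval only accepts literals, so it will still raise on anything that is neither valid JSON nor a valid Python literal.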


If you want to handle NaNs in "col2", try using join at the end:

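import ast
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1':[1,2,3], 
    'col2':["{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
            "{'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}", 
            np.nan]})

# Normalize only the rows that actually hold a value.
v = pd.json_normalize([
    ast.literal_eval(j) for j in df['col2'].dropna()], sep='_'
)
# Realign v with the index labels of the non-missing rows, then drop col2.
v.index = df.index[df.pop('col2').notna()]

df.join(v, how='left')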
   col1  bar  baz_foo  baz_x  foo
0     1  2.0      2.0    1.0  1.0
1     2  5.0      2.0    1.0  3.0
2     3  NaN      NaN    NaN  NaN
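
If the column can also contain None, one possible variation is to map every missing cell to an empty dict before normalizing, so no re-indexing of the non-missing rows is needed. This is only a sketch: `to_dict` is a hypothetical helper, the sample data is extended for illustration, and `pd.json_normalize` assumes pandas 1.0 or newer.

```python
import ast

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': ["{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
             "{'foo':3, 'bar':5, 'baz':{'foo':None, 'x':1}}",
             np.nan,
             None]})


def to_dict(cell):
    # Hypothetical helper: parse string cells, treat NaN/None as "no keys".
    return ast.literal_eval(cell) if isinstance(cell, str) else {}


# One record per row, so the flattened frame lines up with df row by row.
records = [to_dict(cell) for cell in df['col2']]
flat = pd.json_normalize(records, sep='_')
flat.index = df.index

result = df.drop(columns='col2').join(flat)
```

Rows whose cell was missing end up with NaN in every flattened column, and a None inside a parsed dictionary also shows up as a missing value.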

6 Comments

I get a 'ValueError: malformed node or string: <_ast.Name object at 0x1031d1278> ' for your code.
@GeorgHeiler That's because your JSON is malformed. What do you want to do about it?
I mentioned above that this bug was not intentional. Valid JSON should be assumed; I have fixed the dummy data.
@GeorgHeiler Made a slight edit. Try running my code with the initialisation provided in my post?
Indeed. This is great. However, this does not yet handle None. Would you mind adding it? I know it was lacking in the minimal sample, but would be great for a more complete sample.

json_normalize flattens nested JSON-like dictionaries into a table. The nesting path of each value is used to build its column name.

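import pandas as pd

data = {'col1':[1,2,3], 
        'col2':[{'foo': 1, 'bar': 2, 'baz': {'foo': 2, 'x': 1}},
                {'foo': 3, 'bar': 5, 'baz': {'foo': None, 'x': 1}}]}

# pd.json_normalize is the top-level name since pandas 1.0.
# The default separator is '.', so nested keys become 'baz.foo' and 'baz.x';
# pass sep='_' to get 'baz_foo' and 'baz_x' instead.
# col1 has three rows but col2 only two, so the last row is NaN after the join.
pd.DataFrame(data={"col1": data["col1"]})\
  .join(pd.json_normalize(data["col2"]))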
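
To illustrate how the nesting path turns into column names, here is a small sketch with made-up records (the `deep` key is an assumption added for illustration). It assumes pandas 1.0 or newer for `pd.json_normalize`:

```python
import pandas as pd

records = [{'foo': 1, 'baz': {'foo': 2, 'deep': {'x': 1}}}]

# Default separator '.': columns foo, baz.foo, baz.deep.x
default = pd.json_normalize(records)

# Custom separator: columns foo, baz_foo, baz_deep_x
underscored = pd.json_normalize(records, sep='_')

# max_level=1 stops after one level of nesting, so 'baz_deep'
# keeps its inner dictionary as a single cell value.
shallow = pd.json_normalize(records, sep='_', max_level=1)
```

The `sep` argument only changes how path segments are joined, while `max_level` limits how deep the flattening goes.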
