1

I have a dictionary that contains a list of nested dictionaries:

data = {"A":"a","B":"b","ID":[{"ii":"ABC","jj":"BCD"},{"ii":"AAC","jj":"FFD"}],"Finish":"yes"}

I used:

res = pd.DataFrame.from_dict(data , orient='index')

But ID is still returned as a list of dictionaries:

A  B      ID                                              Finish
a  b  [{"ii":"ABC","jj":"BCD"},{"ii":"AAC","jj":"FFD"}]    yes

I want everything converted into DataFrame columns, with one row per entry in ID. I am not sure how to do it. Kindly help.

Expected Output:

A  B  ID.ii  ID.jj   Finish
a  b   ABC    BCD      yes
a  b   AAC    FFD      yes
2
  • Are you sure it shouldn't be: data = {"A":"a","B":"b","ID":[{"ii":"ABC","jj":"BCD"},{"ii":"AAC","jj":"FFD"}],"Finish":"yes"} ? Commented May 5, 2021 at 8:02
  • Made the change, thanks. Commented May 5, 2021 at 8:04

2 Answers

1

You can achieve this using pandas' json_normalize:

import pandas as pd

df = pd.json_normalize(data, meta=['A', 'B'], record_path=['ID'], record_prefix="ID.")

Output

  ID.ii ID.jj  A  B
0   ABC   BCD  a  b
1   AAC   FFD  a  b

  • record_path: the key whose list of records is flattened into rows.
  • record_prefix: a prefix added to the column names that come from those records.
  • meta: the top-level keys to preserve as columns without flattening.

Refer to the documentation for examples.
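
The call above leaves out the Finish column that the question's expected output includes. A minimal sketch of one way to get that layout, assuming Finish should be treated as another meta key and that the column order matters, could look like this (the final reordering step is an addition, not part of the original answer):

```python
import pandas as pd

data = {
    "A": "a",
    "B": "b",
    "ID": [{"ii": "ABC", "jj": "BCD"}, {"ii": "AAC", "jj": "FFD"}],
    "Finish": "yes",
}

# Flatten the records under "ID" into rows and carry the scalar keys along.
# Assumption: "Finish" is listed in meta so it is repeated on every row.
df = pd.json_normalize(
    data,
    record_path=["ID"],
    meta=["A", "B", "Finish"],
    record_prefix="ID.",
)

# json_normalize puts the record columns first; reorder to match the question.
df = df[["A", "B", "ID.ii", "ID.jj", "Finish"]]
print(df)
```

Any top-level key not listed in meta is dropped from the result, so every scalar column that should survive has to be named there.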

0

To achieve this without using json_normalize, you can pre-process the input like this:

data = {"A":"a","B":"b","ID":[{"ii":"ABC","jj":"BCD"},{"ii":"AAC","jj":"FFD"}],"Finish":"yes"}
op = {}

for i in data:
    if isinstance(data[i], list):
        for j in data[i]:
            for k in j:
                tmp = str(i)+"."+str(k)
                if tmp not in op:
                    op[tmp] = [j[k]]
                else:
                    op[tmp].append(j[k])
    else:
        op[i] = data[i]

        
>>> data
{'A': 'a', 'B': 'b', 'ID': [{'ii': 'ABC', 'jj': 'BCD'}, {'ii': 'AAC', 'jj': 'FFD'}], 'Finish': 'yes'}
>>> op
{'A': 'a', 'B': 'b', 'ID.ii': ['ABC', 'AAC'], 'ID.jj': ['BCD', 'FFD'], 'Finish': 'yes'}

After this, you can pass op directly to the DataFrame constructor:

>>> pd.DataFrame(op)

   A  B ID.ii ID.jj Finish
0  a  b   ABC   BCD    yes
1  a  b   AAC   FFD    yes
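
If the nested key is known in advance, the same result can probably be reached without the explicit loop. The following is only a sketch under that assumption: it hard-codes "ID" as the nested key, and it swaps the loop for DataFrame.add_prefix and DataFrame.assign, which is a different technique from the pre-processing shown above.

```python
import pandas as pd

data = {
    "A": "a",
    "B": "b",
    "ID": [{"ii": "ABC", "jj": "BCD"}, {"ii": "AAC", "jj": "FFD"}],
    "Finish": "yes",
}

# One row per nested record, with "ID." prepended to each column name.
records = pd.DataFrame(data["ID"]).add_prefix("ID.")

# Remaining top-level scalars; assign() broadcasts each value to every row.
scalars = {key: value for key, value in data.items() if key != "ID"}
df = records.assign(**scalars)

# Reorder the columns to match the layout requested in the question.
df = df[["A", "B", "ID.ii", "ID.jj", "Finish"]]
print(df)
```

Because the scalar keys are passed as keyword arguments, this variant only works when they are valid Python identifiers; the loop-based version has no such restriction.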

2 Comments

Where do I define op in the script?
Oh, sorry, I forgot to add it (I was using the terminal) ;). Edited now.
