I have a data frame with multiple columns that I would like to convert to a .json file. The file should be structured as follows: one column serves as an 'identifier' column, and its values become the keys of a dictionary. All values in this column are unique. Every other column should appear as a key-value mapping under each identifier value, in order of appearance. I am also looking for a function that reproduces the data frame from this .json file. Here is example code that produces a dummy data frame:

import numpy as np
import pandas as pd

data_dictionary = {'col_1':[np.nan,np.nan,np.nan,np.nan],
                   'col_2':[np.nan,1,np.nan,1],
                   'col_3':['a','b','c','d'],
                   'col_4':['description of a','description of b','description of c','description of d']}

df = pd.DataFrame(data_dictionary)

which gives:

   col_1  col_2 col_3             col_4
0    NaN    NaN     a  description of a
1    NaN    1.0     b  description of b
2    NaN    NaN     c  description of c
3    NaN    1.0     d  description of d

And this is how the .json file should look (using col_3 as the identifier column):

{
  "col_3": {
    "a": {
      "col_1": null,
      "col_2": null,
      "col_4": "description of a"
    },
    "b": {
      "col_1": null,
      "col_2": 1,
      "col_4": "description of b"
    },
    "c": {
      "col_1": null,
      "col_2": null,
      "col_4": "description of c"
    },
    "d": {
      "col_1": null,
      "col_2": 1,
      "col_4": "description of d"
    }
  }
}
df.set_index('col_3').to_json(orient='index') almost solves your problem. Commented Nov 18, 2020 at 15:24

1 Answer

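One approach: set col_3 as the index, serialize with orient='index', then wrap the parsed result in a dict under the 'col_3' key: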

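import json

# to_json returns a JSON string; NaN values are serialized as null
dict_result = df.set_index('col_3').to_json(orient='index')
# parse the string and nest it under the identifier column's name
final = {'col_3': json.loads(dict_result)}
print(final)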

Output (line breaks added for readability):

{'col_3':
    {'a': {'col_1': None,
           'col_2': None,
           'col_4': 'description of a'},
     'b': {'col_1': None,
           'col_2': 1.0,
           'col_4': 'description of b'},
     'c': {'col_1': None,
           'col_2': None,
           'col_4': 'description of c'},
     'd': {'col_1': None,
           'col_2': 1.0,
           'col_4': 'description of d'}}}

4 Comments

That's pretty close. The only thing I am missing when I save the .json file to disk is the 'col_3' key. df.set_index('col_3').to_json('test.json',orient='index') gives me a single dict with 'a','b','c','d' as keys.
You skipped this step: final = {'col_3':json.loads(dict_result)} ...
Ah, the only thing that was missing to also save the .json file was with open('data.json', 'w') as fp: json.dump(final, fp)
Ok, happy you found a way!
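
The thread does not cover the second half of the question, rebuilding the data frame from the saved .json. Below is a minimal sketch of the full round trip, not a definitive implementation. The helper names frame_to_nested and nested_to_frame are hypothetical, and an in-memory io.StringIO buffer stands in for the file on disk; a file opened with open('data.json', 'w') as in the comment above should behave the same way.

```python
import io
import json

import numpy as np
import pandas as pd


def frame_to_nested(df, id_col):
    """Hypothetical helper: return {id_col: {id_value: {column: value}}}."""
    # to_json turns NaN into null, which json.loads reads back as None
    records = json.loads(df.set_index(id_col).to_json(orient="index"))
    return {id_col: records}


def nested_to_frame(nested):
    """Hypothetical helper: rebuild a DataFrame from frame_to_nested output."""
    # assumes exactly one top-level key, the identifier column's name
    id_col, records = next(iter(nested.items()))
    out = pd.DataFrame.from_dict(records, orient="index")
    out.index.name = id_col
    # move the identifier values from the index back into a regular column
    return out.reset_index()


df = pd.DataFrame({
    "col_1": [np.nan, np.nan, np.nan, np.nan],
    "col_2": [np.nan, 1, np.nan, 1],
    "col_3": ["a", "b", "c", "d"],
    "col_4": ["description of a", "description of b",
              "description of c", "description of d"],
})

nested = frame_to_nested(df, "col_3")

# write the nested structure, then read it back and rebuild the frame
buffer = io.StringIO()
json.dump(nested, buffer, indent=2)
buffer.seek(0)
restored = nested_to_frame(json.load(buffer))
print(restored)
```

Two caveats: the identifier column comes back as the first column rather than in its original position, so reorder with restored[list(df.columns)] if the order matters, and col_2 is written as 1.0 rather than 1 because the column is stored as float.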
