I have a large, deeply nested JSON file saved in .txt format. I need to access some specific keys and create a data frame or another transformed JSON object for further use. Here is a small sample with 2 records.

[
  {
    "ko_id": [819752],
    "concepts": [
      {
        "id": ["11A71731B880:http://ontology.intranet.com/Taxonomy/116@en"],
        "uri": ["http://ontology.intranet.com/Taxonomy/116"],
        "language": ["en"],
        "prefLabel": ["Client coverage & relationship management"]
      }
    ]
  },
  {
    "ko_id": [819753],
    "concepts": [
      {
        "id": ["11A71731B880:http://ontology.intranet.com/Taxonomy/116@en"],
        "uri": ["http://ontology.intranet.com/Taxonomy/116"],
        "language": ["en"],
        "prefLabel": ["Client coverage & relationship management"]
      }
    ]
  }
]

The following code loads the data as a list, but I probably need to access it as a dictionary. I need the "ko_id", "uri" and "prefLabel" from each record, collected into a pandas data frame or a dictionary for further analysis.

import json as js

with open('sample_data.txt') as data_file:
    json_sample = js.load(data_file)

The following code gives me the exact values of the first element, but I do not know how to put it together and build the final algorithm to create the data frame.

print(json_sample[0]["ko_id"][0])
print(json_sample[0]["concepts"][0]["prefLabel"][0])
print(json_sample[0]["concepts"][0]["uri"][0])

2 Answers

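import pandas as pd

frames = []  # one frame per record; DataFrame.append was removed in pandas 2.0
for record in json_sample:
    df = pd.DataFrame(record['concepts'])
    df['ko_id'] = record['ko_id'][0]  # scalar is broadcast to every concept row
    frames.append(df)
final_df = pd.concat(frames, ignore_index=True)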
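
Note that building a frame directly from record['concepts'] keeps every cell as a one-element list (for example ['en']), because each value in the sample is wrapped in a list. A minimal sketch of one way around that, assuming every concept has non-empty "prefLabel" and "uri" lists as in the sample, is to unwrap the values while collecting plain rows. The sample data is inlined here only so the snippet runs on its own; the name rows is just illustrative.

```python
import pandas as pd

# Inline copy of the two sample records from the question,
# reduced to the keys that are actually used below.
json_sample = [
    {
        "ko_id": [819752],
        "concepts": [
            {
                "uri": ["http://ontology.intranet.com/Taxonomy/116"],
                "prefLabel": ["Client coverage & relationship management"],
            }
        ],
    },
    {
        "ko_id": [819753],
        "concepts": [
            {
                "uri": ["http://ontology.intranet.com/Taxonomy/116"],
                "prefLabel": ["Client coverage & relationship management"],
            }
        ],
    },
]

rows = []
for record in json_sample:
    # One output row per concept, so records with several concepts are kept.
    for concept in record["concepts"]:
        rows.append(
            {
                "ko_id": record["ko_id"][0],          # unwrap [819752] -> 819752
                "prefLabel": concept["prefLabel"][0],  # unwrap the one-element list
                "uri": concept["uri"][0],
            }
        )

final_df = pd.DataFrame(rows, columns=["ko_id", "prefLabel", "uri"])
print(final_df)
```

This assumes no field is empty; indexing with [0] raises IndexError on an empty list, so the real file may need extra checks.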

You can pass the data to pandas.DataFrame using a generator:

import pandas as pd
import json as js

with open('sample_data.txt') as data_file:    
   json_sample = js.load(data_file)

df = pd.DataFrame(data = ((key["ko_id"][0],
                           key["concepts"][0]["prefLabel"][0],
                           key["concepts"][0]["uri"][0]) for key in json_sample),  
                  columns = ("ko_id", "prefLabel", "uri"))

Output:

>>> df

    ko_id                                  prefLabel                                        uri
0  819752  Client coverage & relationship management  http://ontology.intranet.com/Taxonomy/116   
1  819753  Client coverage & relationship management  http://ontology.intranet.com/Taxonomy/116 

5 Comments

@FJ Maybe there is some problem with the "uri". When I run the following code on the main data it gives me an error: for key in data_dict: print(key["concepts"][0]["uri"][0]) It shows "list index out of range". I mean there is probably some missing/empty field in the main data.
@DataPsycho data_dict is the json (json_sample in my code)? What is the exception?
@FJ Oh sorry, data_dict is the full version of json_sample. So it has the same structure but with the full data.
I load the big file and then run your code; it gives me the following error. with open('output_json_20171031.json') as data_file: data_dict = js.load(data_file)
@DataPsycho contents or uri are empty in some fields, so you can try list slicing: df = pd.DataFrame(data = ((*key["ko_id"][0:1], *key["concepts"][0]["prefLabel"][0:1], *key["concepts"][0]["uri"][0:1]) for key in json_sample), columns = ("ko_id", "prefLabel", "uri")) Would it be possible for you to share the entire file using Google Drive, DropBox, etc.? It would be much easier to give you a possible solution.
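
One caveat with the slicing idea in the last comment: an empty list unpacks to nothing, so that row's tuple comes out shorter than the three columns and the remaining values may no longer line up with their headers. A hedged alternative sketch is to substitute None for anything missing, so every row keeps three entries. The helpers first_or_none and extract_row are hypothetical names introduced here, and the second and third records are made up purely to illustrate an empty "uri" and an empty "concepts" list; the real file was never shared.

```python
import pandas as pd


def first_or_none(values):
    """Return the first element of a list, or None if it is empty or missing."""
    return values[0] if values else None


def extract_row(record):
    """Build a (ko_id, prefLabel, uri) tuple that always has three entries."""
    # Fall back to a single empty concept when "concepts" is missing or empty.
    concepts = record.get("concepts") or [{}]
    concept = concepts[0]
    return (
        first_or_none(record.get("ko_id")),
        first_or_none(concept.get("prefLabel")),
        first_or_none(concept.get("uri")),
    )


# Illustrative data: the first record follows the question's sample,
# the other two are invented to show the missing-value cases.
json_sample = [
    {
        "ko_id": [819752],
        "concepts": [
            {
                "prefLabel": ["Client coverage & relationship management"],
                "uri": ["http://ontology.intranet.com/Taxonomy/116"],
            }
        ],
    },
    {
        "ko_id": [819753],
        "concepts": [
            {
                "prefLabel": ["Client coverage & relationship management"],
                "uri": [],  # empty field
            }
        ],
    },
    {"ko_id": [819754], "concepts": []},  # no concepts at all
]

df = pd.DataFrame(
    [extract_row(record) for record in json_sample],
    columns=["ko_id", "prefLabel", "uri"],
)
print(df)
```

Rows with gaps can then be inspected or dropped afterwards, for example with df[df["uri"].isna()] or df.dropna().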
