0

I loaded a .csv file into a df, and each row of one column contains a list of dictionaries like the one below.

data = [{"character": "Jake Sully", "gender": 2,}, {"character": "Neytiri", "gender": 1},
        {"character": "Dr. Grace Augustine","gender": 1},
        {"character": "Col. Quaritch", "gender": 2}]

But of course, after loading, it is read as a string. So I parsed each row in the column as JSON, which makes it easy to extract values by key name. I then need to create a separate df like so.

df = {'character': ['Jake Sully', 'Neytiri', 'Dr. Grace Augustine', 'Col. Quaritch'],
    'gender': [2, 1, 1, 2]}

This is my code, but I can't quite get the desired df output right.

df = pd.DataFrame() #create new df
keys = ['character','gender'] #keys to extract values from json
lst=[]
for val in data: #to iterate over data series
    for object in json.loads(val):
        for key in keys:
            lst.append(object[key])
    df = pd.concat([df,pd.DataFrame(lst,columns=[key])], axis=1)

Can someone tell me what I am doing wrong?

1
  • Related? Commented Jun 6, 2018 at 13:31

3 Answers

2

pd.DataFrame accepts a list of dictionaries directly:

data = [{"character": "Jake Sully", "gender": 2,},
        {"character": "Neytiri", "gender": 1},
        {"character": "Dr. Grace Augustine","gender": 1},
        {"character": "Col. Quaritch", "gender": 2}]

df = pd.DataFrame(data)  # or pd.DataFrame.from_dict(data)

print(df)

             character  gender
0           Jake Sully       2
1              Neytiri       1
2  Dr. Grace Augustine       1
3        Col. Quaritch       2

Therefore, you only need to extract a list of dictionaries from your json file. One way you can do this is via json.loads.
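
As a minimal sketch of the json.loads route, assuming the CSV has already been read with pd.read_csv and the column of interest holds one JSON string per row: each string can be parsed and the resulting dictionaries flattened into one list before it is handed to pd.DataFrame. The column name cast and the two sample rows are made up for illustration and are not from the question.

```python
import json
import pandas as pd

# Hypothetical stand-in for the result of pd.read_csv('filename.csv'):
# each cell of the "cast" column is a JSON string, not a list.
raw = pd.DataFrame({
    "cast": [
        '[{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1}]',
        '[{"character": "Dr. Grace Augustine", "gender": 1}, {"character": "Col. Quaritch", "gender": 2}]',
    ]
})

# Parse every cell and flatten the per-row lists into one list of dicts.
records = [record for cell in raw["cast"] for record in json.loads(cell)]

# pd.DataFrame accepts the flattened list of dictionaries directly.
characters = pd.DataFrame(records, columns=["character", "gender"])
print(characters)
```

Note that json.loads only accepts valid JSON (double-quoted keys and strings). If the cells were written as Python literals with single quotes, ast.literal_eval may be a closer fit than json.loads.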

A better idea is to read your data directly into a dataframe via pd.read_json.


4 Comments

Thanks, but as I mentioned, the file is a .csv file, which contains one column that is a list of dictionaries. And after pd.read_csv('filename.csv'), I did use json.loads in the code above.
I did, but I get a ValueError: Expected object or value. That's why I loaded it with pd.read_csv('filename.csv'), after which I iterate over each row in the column and then use json.loads(row_in_col_of_interest).
Sorry, typo! V is actually the loop variable val in "for val in data:". I have updated it.
@Zoozoo, You can try including pd.read_csv('filename.csv').to_dict() in your question.
0

I may not understand your question completely, but I am able to get the df just fine.

data = [{"character": "Jake Sully", "gender": 2,}, 
         {"character": "Neytiri", "gender": 1},
         {"character": "Dr. Grace Augustine","gender": 1},
         {"character": "Col. Quaritch", "gender": 2}]

pd.DataFrame(data)

Out:

             character  gender
0           Jake Sully       2
1              Neytiri       1
2  Dr. Grace Augustine       1
3        Col. Quaritch       2

1 Comment

data represents only ONE row of the column of interest. It's to show that after I loaded the .csv file using pd.read_csv('filename.csv'), this column is loaded as a string.
0

Figured it out.

df = pd.DataFrame()  # create new df
keys = ['character', 'gender']  # keys to extract values from json
for key in keys:
    values = []  # all values for this key, across every row
    for row in data:  # iterate over the rows in the column of interest
        for record in json.loads(row):
            values.append(record[key])
    df = pd.concat([df, pd.DataFrame(values, columns=[key])], axis=1)
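
The loop above does not record which CSV row each character came from. If that link matters, one possible variation is sketched below, assuming pandas 0.25 or newer (where Series.explode is available). The two sample strings are invented for illustration; in practice rows would be the column read by pd.read_csv.

```python
import json
import pandas as pd

# Hypothetical sample: one JSON string per CSV row.
rows = pd.Series([
    '[{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1}]',
    '[{"character": "Dr. Grace Augustine", "gender": 1}, {"character": "Col. Quaritch", "gender": 2}]',
])

# Parse each string into a list of dicts, then give every dict its own
# entry; explode repeats the original index label for each element.
exploded = rows.apply(json.loads).explode()

# Build the frame from the dicts while keeping the source-row index.
result = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(result)
```

Here the index of result repeats the label of the originating row, so the characters can later be joined back to the other columns of the CSV.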
