I loaded a .csv file into a df, and in one of the columns each row contains a list of dictionaries like the one below.
data = [{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1},
        {"character": "Dr. Grace Augustine", "gender": 1},
        {"character": "Col. Quaritch", "gender": 2}]
But of course, after loading, it's read as a string. So I parsed each row in the column as JSON, which makes it easy to extract values by key name. I then need to create a separate df like so.
df = {'character': ['Jake Sully', 'Neytiri', 'Dr. Grace Augustine', 'Col. Quaritch'],
      'gender': [2, 1, 1, 2]}
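To make the parsing step concrete, here is a minimal sketch of what I mean by converting each cell to JSON. The `cells` list is a made-up stand-in for the strings in my column (not my real data), and the variable names are only for illustration.

```python
import json

# Hypothetical cell values, standing in for the strings read from the CSV column.
cells = [
    '[{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1}]',
    '[{"character": "Dr. Grace Augustine", "gender": 1}, {"character": "Col. Quaritch", "gender": 2}]',
]

# json.loads turns each string into a list of dicts.
parsed = [json.loads(cell) for cell in cells]

# Once parsed, values can be looked up by key name.
names = [entry["character"] for row in parsed for entry in row]
print(names)
```

Note that this only works when the cell text is valid JSON (double quotes, no trailing commas).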
This is my code, but I can't quite get the desired df output right.
import json
import pandas as pd

df = pd.DataFrame()  # create new df
keys = ['character', 'gender']  # keys to extract values from json
lst = []
for val in data:  # to iterate over data series
    for object in json.loads(val):
        for key in keys:
            lst.append(object[key])
            df = pd.concat([df, pd.DataFrame(lst, columns=[key])], axis=1)
Can someone tell me what I am doing wrong?