0

I have a JSON file which resulted from YouTube's iframe API and I want to put this JSON data into a pandas dataframe, where each JSON key will be a column, and each record should be a new row.

Normally I would use a loop and iterate over the rows of the JSON but this particular JSON looks like this :

[  
   "{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
   "{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"

]

In this JSON not every key is written as a new line. How can I extract the keys in this case, and express them as columns?

3 Answers 3

1

A Pythonic Solution would be to use the keys and values API of the Python Dictionary.

it should be something like this:

ls = [
   "{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
   "{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"

]
ls = [json.loads(j) for j in ls]

keys = [j.keys() for j in ls] # this will get you all the keys
vals = [j.values() for j in ls] # this will get the values and then you can do something with it 

print(keys)
print(values)
Sign up to request clarification or add additional context in comments.

Comments

1

easiest way is to leverage json_normalize from pandas.

import json
from pandas.io.json import json_normalize

input_dict = [
   "{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
   "{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"

]
input_json = [json.loads(j) for j in input_dict]

df = json_normalize(input_json)

Comments

0

I think you are asking to break down your key and values and want keys as a column,and values as a row:
This is my approach and plz always provide how your expected output should like

ChainMap flats your dict in key and values and pretty much is self explanatory.

data = ["{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}","{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"]

import json
from collections import ChainMap

data = [json.loads(i) for i in data]
data = dict(ChainMap(*data))

keys = []
vals = []

for k,v in data.items():
    keys.append(k)
    vals.append(v)

data = pd.DataFrame(zip(keys,vals)).T
new_header = data.iloc[0]
data = data[1:]
data.columns = new_header


#startSecond    playbackRates   playbackRate    qual    totalTimeFormatted  timemillis  playerStateNumeric  playerStateVerbose  playerErrorNumeric  date    time    stopSecond  bufferLevelPercent  playerErrorVerbose  qualLevels  videoId curTimeFormatted    playoutLevelPercent
  #0             [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]   1   large   9:46    1563467467703   1   Playing     18.7.2019   18:31:07,703    90  1.4     [hd720, large, medium, small, tiny, auto]   0HJx2JhQKQk 0:02    0.3

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.