Using Python 3.x, I'm trying to extract a list of values from what looks like a JSON object assigned to a JavaScript variable in a page's HTML.
Here's some of the HTML:
<script type="text/javascript">
var BandData = {
id: 171185318,
name: "MASS",
fan_email: null,
account_id: 365569831,
has_discounts: null,
image_id: 39000212
};
var EmbedData = {
tralbum_param: { name: "album", value: 28473799 },
show_campaign: null,
embed_info: {"exclusive_embeddable":null,"public_embeddable":"01 Dec 2011 06:09:19 GMT","no_track_preorder":false,"item_public":true}
};
var FanData = {
logged_in: false,
name: null,
image_id: null,
ip_country_code: null
};
var TralbumData = {
current: {"require_email_0":1,"new_date":"18 Jan 2017 22:59:06 GMT"},
is_preorder: null,
album_is_preorder: null,
album_release_date: "01 Dec 2017 00:00:00 GMT",
preorder_count: null,
hasAudio: true,
art_id: 3862222,
trackinfo: [{"video_featured":null,"has_lyrics":false,"file":{"mp3-128":"https://t4.bcbits.com/stream/064bc3d8bb5/mp3-128/35322674"},"is_capped":null,"sizeof_lyrics":0,"duration":143.244,"encodings_id":830008708},{"video_featured":null,"has_lyrics":false,"license_type":0}],
playing_from: "album page",
featured_track_id: 8612194,
};
Specifically, I want the URLs stored under mp3-128 in each entry of trackinfo, inside TralbumData.
This is tricky for me. The data looks like JSON, but I can't get it to parse as JSON.
So far I can at least isolate trackinfo in the TralbumData variable with a really kludgy function, but I can't get any further. Here's what I have to find trackinfo and then try to pull out the URLs:
import re
import urllib.request

def get_HTML(url):
    # Download the page and decode it to text.
    response = urllib.request.urlopen(url)
    page_source = response.read()
    site_html = page_source.decode('utf8')
    response.close()
    # Capture the body of the TralbumData object literal.
    JSON = re.compile('TralbumData = ({.*?});', re.DOTALL)
    matches = JSON.search(site_html)
    info = matches.group(1)
    # print(info)
    # Return the captured text as a list of lines.
    data = info.split("\n")
    return data

def get_trackinfo(data):
    # print(data[11])
    for entry in data:
        tmp = entry.split(":")
        if tmp[0].strip() == "trackinfo":
            for ent in tmp:
                tmp = ent.split("mp3-128")
                print(tmp)
This doesn't work because splitting on : also cuts at the colon in https://, so each URL ends up broken into separate pieces.
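To make the problem concrete, here is a small sketch comparing split(":") with str.partition(":") on one line. The sample line is a shortened, hypothetical version of the trackinfo entry above, not the full page data.

```python
# Hypothetical sample line, shortened from the trackinfo entry in the question.
line = 'trackinfo: [{"file":{"mp3-128":"https://t4.bcbits.com/stream/064bc3d8bb5/mp3-128/35322674"}}],'

# split(":") cuts at every colon, including the one inside "https://".
pieces = line.split(":")

# partition(":") cuts only at the first colon, so the value stays in one piece.
key, sep, value = line.partition(":")

print(len(pieces))    # number of fragments produced by split
print(key)            # the part before the first colon
print(value.strip())  # everything after it, URL intact
```

With partition, the part after the key is still a single string, which is what a JSON parser would need.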
I'd think there's a way to do something like url = my_html['TralbumData']['trackinfo']['mp3-128']. I've looked around, and the answers to similar questions here on SO get close, but not quite there.
trackinfo:) as JSON and extract the thing you want from the Python list?
trackinfo in it, split on : with str.partition(), strip, decode.
data into lines, with data.splitlines(), I can't because the type is incorrect. My data is a list. I've edited my OP to show you how I'm getting the HTML currently (get_HTML). I've also found that in get_trackinfo(data), if I do print(data[11]), I correctly get data starting trackinfo: [{"video_featured":null, ...) but still am struggling with how to parse that result... Thanks for your continued help though
info.split("\n").
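The approach hinted at in the comments (find the trackinfo line, split once on : with str.partition(), strip, decode as JSON) might look roughly like the sketch below. It assumes the whole trackinfo value sits on a single line and is valid JSON, as it appears to be in the HTML above. The function name extract_mp3_urls and the shortened sample text are made up for illustration.

```python
import json


def extract_mp3_urls(info):
    """Return the mp3-128 URLs found in the text of the TralbumData literal.

    `info` is assumed to be the string captured by the regex in get_HTML,
    before it is split into lines.
    """
    for line in info.splitlines():
        # Split only at the first colon so the URLs stay intact.
        key, sep, value = line.strip().partition(":")
        if key.strip() != "trackinfo":
            continue
        # Drop the trailing comma so the remainder is a plain JSON array.
        tracks = json.loads(value.strip().rstrip(","))
        # Some entries have no "file" key, so skip those.
        return [t["file"]["mp3-128"] for t in tracks if t.get("file")]
    return []


# Shortened sample based on the HTML in the question.
sample = '''{
    hasAudio: true,
    trackinfo: [{"has_lyrics":false,"file":{"mp3-128":"https://t4.bcbits.com/stream/064bc3d8bb5/mp3-128/35322674"},"duration":143.244},{"has_lyrics":false,"license_type":0}],
    playing_from: "album page",
}'''

urls = extract_mp3_urls(sample)
print(urls)
```

Note that trackinfo decodes to a list of dicts, so the URL lives at tracks[i]['file']['mp3-128'] rather than directly under trackinfo. If the data is already a list of lines, as returned by get_HTML, passing "\n".join(data) should work the same way.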