
In Python I'm reading the content of an HTML page that contains a lot of stuff. To do this, I read the web page as a string like this:

import urllib.request as req

url = 'https://myurl.com/'
reqq = req.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
reddit_file = req.urlopen(reqq)
reddit_data = reddit_file.read().decode('utf-8')  # the whole page as one str

If I print reddit_data, I can see the whole HTML content correctly. Inside it there's a JSON-like structure that I would like to parse and extract some fields from.

Here is the structure:

"dealDetails" : {
      "f240141a" : {
         "egressUrl" : "https://ccc.com",
         "title" : "ZZZ",
         "type" : "ghi",
      },
      "5f9ab246" : {
         "egressUrl" : "https://www.bbb.com/",
         "title" : "YYY",
         "type" : "def",
      },
      "2bf6723b" : {
         "egressUrl" : "https://www.aaa.com//",
         "title" : "XXX",
         "type" : "abc",
      },
}

What I want to do is find the dealDetails field and then, for each of f240141a, 5f9ab246 and 2bf6723b, get the egressUrl, title and type values.
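The steps described above could be sketched as follows. This is a minimal sketch, assuming the dealDetails object appears in the page text in the form shown (quoted keys and string values, possibly with trailing commas). The helper names extract_object and parse_deal_details are hypothetical, and the trailing-comma regex assumes no string value contains a comma followed by a closing brace or bracket.

```python
import json
import re

def extract_object(text, key):
    """Return the {...} text that follows "key" : in text, or None if absent.

    Hypothetical helper: assumes the value is a brace-delimited object.
    Braces inside quoted strings are skipped so they don't unbalance the count.
    """
    match = re.search(r'"%s"\s*:\s*\{' % re.escape(key), text)
    if match is None:
        return None
    start = match.end() - 1  # index of the opening brace
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # braces never balanced

def parse_deal_details(html):
    """Hypothetical helper: extract and parse the dealDetails object."""
    raw = extract_object(html, 'dealDetails')
    if raw is None:
        return {}
    # JSON does not allow trailing commas before } or ], so drop them.
    cleaned = re.sub(r',\s*([}\]])', r'\1', raw)
    return json.loads(cleaned)

# Made-up page text shaped like the snippet in the question.
sample = '''<script>var data = {"dealDetails" : {
      "f240141a" : {
         "egressUrl" : "https://ccc.com",
         "title" : "ZZZ",
         "type" : "ghi",
      },
      "5f9ab246" : {
         "egressUrl" : "https://www.bbb.com/",
         "title" : "YYY",
         "type" : "def",
      },
}};</script>'''

for deal_id, deal in parse_deal_details(sample).items():
    print(deal_id, deal['egressUrl'], deal['title'], deal['type'])
```

With the code from the question, parse_deal_details(reddit_data) would return a dict keyed by the deal IDs (f240141a, 5f9ab246, ...), provided the page really contains the object in this form.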

Thanks

  • Can you post the full script tag? Commented Oct 15, 2019 at 7:41

2 Answers


Try this:

[nested_dict['egressUrl'] for nested_dict in reddit_data['dealDetails'].values()]

To access JSON values, you can treat the parsed data as a dictionary and use the same indexing syntax.
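To illustrate, here is a small sketch using a hypothetical, already-parsed reddit_data dict built from the values in the question. It shows why the comprehension needs .values() rather than .keys(): iterating .keys() yields the ID strings, while .values() yields the nested dicts.

```python
# Hypothetical stand-in for the parsed page data, using the values from the question.
reddit_data = {
    "dealDetails": {
        "f240141a": {"egressUrl": "https://ccc.com", "title": "ZZZ", "type": "ghi"},
        "5f9ab246": {"egressUrl": "https://www.bbb.com/", "title": "YYY", "type": "def"},
        "2bf6723b": {"egressUrl": "https://www.aaa.com//", "title": "XXX", "type": "abc"},
    }
}

deals = reddit_data["dealDetails"]

# .keys() yields the ID strings, so indexing them with 'egressUrl' would fail.
ids = list(deals.keys())

# .values() yields the nested dicts, so field lookups work.
urls = [d["egressUrl"] for d in deals.values()]

# .items() gives both the ID and its nested dict.
rows = [(deal_id, d["title"], d["type"]) for deal_id, d in deals.items()]

print(ids)   # ['f240141a', '5f9ab246', '2bf6723b']
print(urls)  # ['https://ccc.com', 'https://www.bbb.com/', 'https://www.aaa.com//']
print(rows)  # [('f240141a', 'ZZZ', 'ghi'), ('5f9ab246', 'YYY', 'def'), ('2bf6723b', 'XXX', 'abc')]
```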

Edit-1:

Make sure reddit_data is a dictionary.

If type(reddit_data) is str, you need to parse it first (this only works on the JSON text itself, not on the whole HTML page):

import ast
reddit_data = ast.literal_eval(reddit_data)

OR

import json
reddit_data = json.loads(reddit_data)
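A quick comparison of the two parsers, on made-up sample strings (assumptions for illustration only): json.loads accepts JSON literals such as true and null but rejects trailing commas, while ast.literal_eval accepts trailing commas but rejects true/null. json.loads also cannot parse a whole HTML page, so the JSON part has to be cut out of reddit_data first.

```python
import ast
import json

# JSON literals: json.loads understands true/null, ast.literal_eval does not.
json_text = '{"title": "ZZZ", "active": true, "extra": null}'
from_json = json.loads(json_text)
try:
    ast.literal_eval(json_text)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False

# Trailing commas (as in the question's snippet): ast.literal_eval accepts them,
# json.loads rejects them.
trailing = '{"title": "ZZZ", "type": "ghi",}'
from_literal = ast.literal_eval(trailing)
try:
    json.loads(trailing)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# A whole HTML page is not JSON.
page = '<html><script>var x = {"dealDetails": {}};</script></html>'
try:
    json.loads(page)
    page_ok = True
except json.JSONDecodeError:
    page_ok = False

print(from_json)        # {'title': 'ZZZ', 'active': True, 'extra': None}
print(literal_eval_ok)  # False
print(from_literal)     # {'title': 'ZZZ', 'type': 'ghi'}
print(json_ok)          # False
print(page_ok)          # False
```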

2 Comments

I tried your suggestion but I get this error: [nested_dict['egressUrl'] for nested_dict in reddit_data['dealDetails'].keys()] TypeError: string indices must be integers
@xXJohnRamboXx Parse your JSON data with json.loads(your_json_data) or ast.literal_eval(your_json_data).
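That error means reddit_data is still a str when it is indexed with 'dealDetails'. Below is a minimal sketch of a type check before indexing, using a hypothetical helper get_deal_details and a made-up JSON string; it assumes the str holds only the JSON text, not the whole page.

```python
import json

def get_deal_details(data):
    """Return data['dealDetails'], parsing data first if it is still a str.

    Hypothetical helper: assumes a str argument holds JSON text only,
    not a whole HTML page.
    """
    if isinstance(data, str):
        data = json.loads(data)
    return data["dealDetails"]

raw = '{"dealDetails": {"f240141a": {"egressUrl": "https://ccc.com", "title": "ZZZ", "type": "ghi"}}}'

# Indexing a str with a str is what raises "string indices must be integers".
try:
    raw["dealDetails"]
except TypeError as exc:
    print("TypeError:", exc)

details = get_deal_details(raw)
print([d["egressUrl"] for d in details.values()])  # ['https://ccc.com']
```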
  • If you just want to know how to access egressUrl, title and type, the code below does that (I modified shaik moeed's answer a tiny bit to also return title and type). Be careful, however: it won't work unless you have first converted reddit_data from HTML into something like a dictionary:
[(i['egressUrl'], i['title'], i['type']) for i in reddit_data['dealDetails'].values()]
  • However, if I got it right, the part you're missing is the conversion from HTML to something JSON-friendly. What I personally use, even though it's quite unpopular, is the eval function:
dictionary = eval(reddit_data)

This converts text that looks like a Python dictionary literal into a dictionary, so only use it on the part of the page that 'looks' like a dictionary, not on the whole HTML file! (One reason eval is unpopular is that it executes arbitrary code, which is risky on text downloaded from a web page; it also fails on JSON literals like true/false/null, raising a NameError instead of converting them to Python's True/False/None, so be careful with that :) )

Hope that helped!
