
I have a JSON file whose html field holds an HTML fragment. From that fragment I would like to extract the data-estimated-earnings attribute of the a element. The attribute contains a JSON object, and from it I would like to extract the value of the open_eligible key.

Here is the starting JSON:

{"html":"<div class='car_model_estimation_result__container'>\n<div class='car_model_estimation_result cobalt-mb-tight'>\n<div class='car_model_estimation_result__item'>\n<span class=\"car_model_estimation_result_amount\">720€</span>\n<p class='cobalt-text-sectionHeader'>\n<span>maximum estimés par mois</span>\n<span class='cobalt-mb-unit cobalt-Icon cobalt-Icon--size16 cobalt-Icon--colorGraphiteLight'>\n<a class=\"js_popup_trigger\" href=\"#estimate_about_with_open\"><svg viewBox=\"0 0 24 24\" xmlns=\"http://www.w3.org/2000/svg\">\n  <path d=\"M11 9h2V7h-2v2zm1 11c-4.41 0-8-3.59-8-8s3.59-8 8-8 8 3.59 8 8-3.59 8-8 8zm0-18C6.477 2 2 6.477 2 12A10 10 0 1 0 12 2zm-1 15h2v-6h-2v6z\" />\n</svg>\n\n</a></span>\n</p>\n\n</div>\n<div class='owner_homepage_hero_estimation_cta__container'>\n<a class=\"owner_homepage_hero_estimation_cta--fullWidth cobalt-Button cobalt-Button--primary cobalt-Button--large js_rent_my_car js_rent_my_car_top js_estimation_result\" rel=\"nofollow\" data-tracking-params=\"{&quot;model_id&quot;:&quot;1519&quot;,&quot;brand_id&quot;:&quot;67&quot;,&quot;mileage&quot;:4,&quot;city&quot;:&quot;Anvers&quot;,&quot;release_year&quot;:2016,&quot;open_eligible&quot;:true,&quot;currency&quot;:&quot;EUR&quot;,&quot;earnings&quot;:720,&quot;earnings_period&quot;:&quot;month&quot;}\" data-click-location=\"top\" data-estimated-earnings=\"{&quot;model_id&quot;:&quot;1519&quot;,&quot;release_year&quot;:2016,&quot;mileage&quot;:4,&quot;within_eligible_area&quot;:true,&quot;open_eligible&quot;:true}\" href=\"/choose_open_or_standard?mileage=4&amp;model_id=1519&amp;open_eligible=true&amp;release_year=2016&amp;within_eligible_area=true\">Inscrire ma voiture</a>\n</div>\n</div>\n</div>\n"}

Here is my Python code for extracting what I need:

import json
from parsel import Selector

with open('C:/Users/coppe/Documents/py trials/estimated_earnings.json') as json_file:  
    earnings = json.load(json_file)
selector = Selector(earnings['html'])
eligibleObj = json.loads(json.dumps(selector.css('a::attr(data-estimated-earnings)').get()))
print(eligibleObj['open_eligible'])

The issue is that I get this error:

print(eligibleObj['open_eligible'])

TypeError: string indices must be integers

Does anyone know how to convert the value of the data-estimated-earnings attribute into an object and then extract what I need?

  • Why is there a json.dumps in there?! json.loads(json.dumps(a)) is the same as just a in the first place. Commented Jul 1, 2019 at 12:08
  • What is your expected value for eligibleObj['open_eligible']? Commented Jul 1, 2019 at 12:24

2 Answers


selector.css('a::attr(data-estimated-earnings)').get() returns a string that already contains JSON (a serialized dictionary), so you must not call json.dumps on it; json.loads alone is enough:

>>> import json
>>> from parsel import Selector
>>>
>>> earnings = {"html":"<div class='car_model_estimation_result__container'>\n<div class='car_model_estimation_result cobalt-mb-tight'>\n<div class='car_model_estimation_result__item'>\n<span class=\"car_model_estimation_result_amount\">720€</span>\n<p class='cobalt-text-sectionHeader'>\n<span>maximum estimés par mois</span>\n<span class='cobalt-mb-unit cobalt-Icon cobalt-Icon--size16 cobalt-Icon--colorGraphiteLight'>\n<a class=\"js_popup_trigger\" href=\"#estimate_about_with_open\"><svg viewBox=\"0 0 24 24\" xmlns=\"http://www.w3.org/2000/svg\">\n  <path d=\"M11 9h2V7h-2v2zm1 11c-4.41 0-8-3.59-8-8s3.59-8 8-8 8 3.59 8 8-3.59 8-8 8zm0-18C6.477 2 2 6.477 2 12A10 10 0 1 0 12 2zm-1 15h2v-6h-2v6z\" />\n</svg>\n\n</a></span>\n</p>\n\n</div>\n<div class='owner_homepage_hero_estimation_cta__container'>\n<a class=\"owner_homepage_hero_estimation_cta--fullWidth cobalt-Button cobalt-Button--primary cobalt-Button--large js_rent_my_car js_rent_my_car_top js_estimation_result\" rel=\"nofollow\" data-tracking-params=\"{&quot;model_id&quot;:&quot;1519&quot;,&quot;brand_id&quot;:&quot;67&quot;,&quot;mileage&quot;:4,&quot;city&quot;:&quot;Anvers&quot;,&quot;release_year&quot;:2016,&quot;open_eligible&quot;:true,&quot;currency&quot;:&quot;EUR&quot;,&quot;earnings&quot;:720,&quot;earnings_period&quot;:&quot;month&quot;}\" data-click-location=\"top\" data-estimated-earnings=\"{&quot;model_id&quot;:&quot;1519&quot;,&quot;release_year&quot;:2016,&quot;mileage&quot;:4,&quot;within_eligible_area&quot;:true,&quot;open_eligible&quot;:true}\" href=\"/choose_open_or_standard?mileage=4&amp;model_id=1519&amp;open_eligible=true&amp;release_year=2016&amp;within_eligible_area=true\">Inscrire ma voiture</a>\n</div>\n</div>\n</div>\n"}
>>>
>>> selector = Selector(earnings['html'])
>>> selector
<Selector xpath=None data='<html><body><div class="car_model_estima'>
>>>
>>> css = selector.css('a::attr(data-estimated-earnings)').get()
>>> type(css), css
(<class 'str'>, '{"model_id":"1519","release_year":2016,"mileage":4,"within_eligible_area":true,"open_eligible":true}')
>>>
>>> eligible_obj = json.loads(css)
>>> eligible_obj
{'model_id': '1519', 'release_year': 2016, 'mileage': 4, 'within_eligible_area': True, 'open_eligible': True}
>>> eligible_obj['open_eligible']
True

Translated to your code, it should be:

eligibleObj = json.loads(selector.css('a::attr(data-estimated-earnings)').get())

That said, I'd avoid doing too many operations on one line, as things can get confusing :)
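To see why the extra json.dumps causes the TypeError, here is a minimal sketch using only the standard library. The raw string is copied from the interpreter output above and stands in for what .get() returns; the variable names are just for illustration.

```python
import json

# The attribute value as parsel hands it back: a plain str containing JSON text.
raw = '{"model_id":"1519","release_year":2016,"mileage":4,"within_eligible_area":true,"open_eligible":true}'

# json.dumps on a str wraps it in a JSON string literal; json.loads undoes that,
# so the round trip gives back the original str, not a dict.
round_trip = json.loads(json.dumps(raw))
print(type(round_trip).__name__)  # str
print(round_trip == raw)          # True

# A single json.loads parses the JSON text into a dict.
parsed = json.loads(raw)
print(type(parsed).__name__)      # dict
print(parsed["open_eligible"])    # True
```

Indexing round_trip with 'open_eligible' is indexing a str with a str, which is exactly the error from the question.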


Your eligibleObj is a string that looks like this:

'{"model_id":"1519","release_year":2016,"mileage":4,"within_eligible_area":true,"open_eligible":true}'

You have to parse it with json.loads before indexing it:

>>> print(json.loads(eligibleObj)['open_eligible'])
True
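If the a element or the attribute might be missing, .get() returns None, and json.loads(None) raises a TypeError of its own. A defensive sketch under that assumption follows; json_attr_value is a hypothetical helper name, not part of parsel or of the code in the question.

```python
import json
from typing import Any, Optional


def json_attr_value(raw: Optional[str], key: str, default: Any = None) -> Any:
    """Hypothetical helper: parse a JSON attribute string and return one key."""
    if raw is None:
        # The selector matched nothing, so there is no text to parse.
        return default
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # The attribute was present but did not contain valid JSON.
        return default
    if not isinstance(data, dict):
        # Valid JSON, but not an object (for example a list or a number).
        return default
    return data.get(key, default)


print(json_attr_value('{"open_eligible": true}', "open_eligible"))  # True
print(json_attr_value(None, "open_eligible"))                       # None
```

In the question's code this would be called as json_attr_value(selector.css('a::attr(data-estimated-earnings)').get(), 'open_eligible').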

2 Comments

"TypeError: string indices must be integers". Where did you get the fact that it's a list from?
Indeed, that sounds like a str[1,5] or str["test"] error. Now fixed, thanks for the remark.
