
I am having a hard time extracting the data. First, I need to extract the post's title and its posted date. Here's the URL.

URL: https://cheddar.com/media/safety-concerns-over-teslas-autopilot-from-consumer-reports-as-wall-street-turns-bearish

Inside view-source there is a script containing JSON with the data that I need.

It looks something like this (I cropped the rest of the text to save space):

<script>
      window.__RELAY_STORE__ = {"public_at":"2019-05-22T11:02:43-04:00","updated_at":"2019-05-22T15:25:20-04:00","thumbnail_attribution":null,"body":null,"title":"Safety Concerns Over Tesla's Autopilot from Consumer Reports as Wall Street Turns Bearish"
</script>

I only need the "public_at" and "title" values.

Here is what I have tried:

# Locate the script
data = response.xpath("//script[contains(., 'window.__RELAY_STORE__')]/text()")

# Extract the script text
datatxt = data.extract_first()

# Find the start and end of the data
start = datatxt.find('client:') - 2
end = datatxt.find('window.__REDUX_STATE__')

json_string = datatxt[start:end]

But when I load it to convert it to a Python dictionary,

 data = json.loads(json_string)

I get an error like this:

Extra data: line 1 column 27284 (char 27283)

Any idea how I can get this data, please?

  • The json string is not valid json data. Maybe post the output of print(json_string) to see what's wrong with it. Commented May 24, 2019 at 14:16
  • Yes, that's why I am trying to make it a valid JSON string, but I get this error: "Extra data: line 1 column 27284 (char 27283)" Commented May 24, 2019 at 14:20
  • Look at the string and you'll see why it's not valid. Commented May 24, 2019 at 14:22

1 Answer

Try extracting the data this way:

txt = response.xpath("//script[contains(., 'window.__RELAY_STORE__')]/text()").re_first('window.__RELAY_STORE__ = (.*);')

This strips the JavaScript variable name and the trailing ;, leaving only the object literal. When I then call json.loads(txt), it parses as valid JSON.
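
The pattern above is greedy, and . does not match a newline, so it captures up to the last ; on the assignment's line. If other statements share that line, it could capture too much and bring back the "Extra data" error. As an alternative sketch (not tested against the live page), json.JSONDecoder().raw_decode parses a single JSON value and ignores anything after it. The script_text below is a hypothetical, shortened stand-in for the real script, and extract_js_object is a helper name made up for this example:

```python
import json
import re

# Hypothetical, shortened stand-in for the <script> text. The real page is
# much larger and its store may be nested differently.
script_text = """
      window.__RELAY_STORE__ = {"public_at":"2019-05-22T11:02:43-04:00","title":"Safety Concerns Over Tesla's Autopilot"};
      window.__REDUX_STATE__ = {"other":"data"};
"""


def extract_js_object(text, var_name):
    """Return the JSON value assigned to var_name, ignoring what follows it."""
    match = re.search(re.escape(var_name) + r"\s*=\s*", text)
    if match is None:
        raise ValueError(f"{var_name} not found")
    # raw_decode parses one JSON value starting at the given index and also
    # returns the index where that value ended, so trailing JavaScript
    # (the ';' and later assignments) never reaches the parser.
    obj, _end = json.JSONDecoder().raw_decode(text, match.end())
    return obj


store = extract_js_object(script_text, "window.__RELAY_STORE__")
print(store["public_at"])
print(store["title"])
```

In a Scrapy callback you would pass the string returned by extract_first() instead of script_text. On the real page the two fields may sit deeper inside the store, so you may need to walk the resulting dictionary to reach them.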
