from lxml import html
import requests

url = 'https://www.bloomberg.com/quote/SPX:IND'
page = requests.get(url)
tree = html.fromstring(page.content)
# XPath copied from the browser's developer tools
num = tree.xpath('//*[@id="root"]/div/div/section[2]/div[1]/div/section[1]/section/section[2]/section/div[1]/span[1]/text()')
print(num)

This is the code I have written. I'm trying to get the string 2758.82 from this page, but what I get is:

[]

I copied the xpath for that section from the website. I have seen similar questions here, but they didn't help. Is something wrong with my code?

  • If you still don't get the number you wish to parse, there is one more thing you need to do besides what @Arount has already suggested: send a browser-like header, e.g. requests.get(url, headers={"User-Agent": "Mozilla/5.0"}), so the request looks like it comes from a browser. Commented Jul 8, 2018 at 13:07
  • Thanks!! It's working now. Commented Jul 8, 2018 at 13:20
  • One more thing: how do I access something like <div style="display:inline" data-dobid="dfn"><span>some text.</span></div>, and what if the <span> has attributes too? Commented Jul 8, 2018 at 13:33
  • If you want to work with the tags as they are rendered, try Selenium, which lets you grab the items you want in their visible form. Commented Jul 8, 2018 at 13:39
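
For the markup asked about in the comments above, plain XPath attribute predicates may already be enough when the element is present in the raw HTML. The following is only a sketch on a static snippet: the class="term" attribute on the <span> is a made-up example, not something taken from the real page.

```python
from lxml import html

# Sample markup from the comment; the class on <span> is a hypothetical
# attribute added only to show attribute matching.
SNIPPET = """
<html><body>
<div style="display:inline" data-dobid="dfn"><span class="term">some text.</span></div>
</body></html>
"""

tree = html.fromstring(SNIPPET)

# Pick the <span> through an attribute of its parent <div>.
via_div = tree.xpath('//div[@data-dobid="dfn"]/span/text()')

# Pick the <span> through its own (hypothetical) attribute.
via_span = tree.xpath('//span[@class="term"]/text()')

# Read an attribute value instead of the element text.
style = tree.xpath('//div[@data-dobid="dfn"]/@style')

print(via_div, via_span, style)
```

If the element is only created by JavaScript in the browser, these queries return empty lists and a rendering tool such as Selenium is needed instead.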

1 Answer


It's not about the xpath. It's about how the page is generated.

If you check page.content, you will see there is no <div id="root" [..]> in the webpage's source. That is because the HTML content is mainly generated via JavaScript, which requests does not execute.
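
One way to confirm this is to count how many nodes an XPath selects in the raw, unrendered HTML. The sketch below uses a small static stand-in for page.content (the real Bloomberg markup may differ), and count_matches is a hypothetical helper name, not part of lxml.

```python
from lxml import html

# Stand-in for page.content of a JavaScript-rendered page: the server sends
# a script, and the browser would build <div id="root"> afterwards.
RAW = b"""
<html><head><meta itemprop="price" content="2759.82" /></head>
<body><script>/* builds the page in the browser */</script></body></html>
"""

def count_matches(content, xpath):
    """Return the number of nodes the XPath selects in the raw HTML."""
    return len(html.fromstring(content).xpath(xpath))

# Element that only exists after JavaScript has run.
print(count_matches(RAW, '//*[@id="root"]'))
# Element that is present in the HTML sent by the server.
print(count_matches(RAW, '//*[@itemprop="price"]'))
```

A count of zero for an XPath copied from the browser is a strong hint that the element is built client-side.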

But this should not stop you. If you open the raw HTML source (from page.content) and look for the value you want (2759.81), you will find a tag <meta itemprop="price" content="2759.82" /> and another, <div class="price">2759.81</div>. You can use either of them:

print(tree.xpath('//*[@itemprop="price"]')[0].get('content'))

Gives

2759.82
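
Note that xpath() returns an empty list when nothing matches, so indexing with [0] raises IndexError if the server sends a different page (for example, when the request is blocked). A more defensive variant might look like the sketch below; SAMPLE is a static stand-in for page.content and extract_price is a hypothetical helper name, so treat it as an illustration rather than a tested scraper for the live site.

```python
from lxml import html

# Stand-in for page.content; the real Bloomberg markup may differ.
SAMPLE = b"""
<html><head><meta itemprop="price" content="2759.82" /></head>
<body><div class="price">2759.81</div></body></html>
"""

def extract_price(content):
    """Return the price string from the raw HTML, or None if it is absent."""
    tree = html.fromstring(content)
    # xpath() always returns a list; it is empty when nothing matches.
    meta = tree.xpath('//meta[@itemprop="price"]/@content')
    if meta:
        return meta[0]
    # Fall back to the text of <div class="price">.
    div = tree.xpath('//div[@class="price"]/text()')
    return div[0].strip() if div else None

print(extract_price(SAMPLE))
print(extract_price(b"<html><body><p>blocked</p></body></html>"))
```

With live data, the same function would be called as extract_price(page.content), and a None result signals that the fetched HTML is worth inspecting.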

1 Comment

Thanks! What do you mean by "from page.content"? Should I look for <meta itemprop="price" content="2759.82" /> in the actual page source? When I print page.content I get unformatted HTML text, and I can't find <meta itemprop="price" content="2759.82" /> in it. Also, when I run the code you suggested, I get IndexError: list index out of range.
