Using Python + lxml (xpath) to scrape/extract text from a website and print it

Question

I am new to Python; I have been learning it for about three weeks.

I am trying to automate some daily tasks with Python. Here I am trying to scrape the website "https://www.germaneveryday.com/", which publishes a new German word every day along with an example sentence. My plan is to automate fetching the word instead of visiting the site every day.

I followed this online tutorial: http://docs.python-guide.org/en/latest/scenarios/scrape/

And this is the code:

from lxml import html
import requests

page = requests.get('https://www.germaneveryday.com/')
tree = html.fromstring(page.content)

Word = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a')


print (Word)

I inspected the daily word on the website and used right click > Copy XPath to get the XPath expression for the specific HTML element I want to extract. That expression is what I pass to tree.xpath in my simple lxml + Python code.

However, every time the output is either an empty list ([]) or a meaningless block of HTML-related output, as shown here: https://i.sstatic.net/dAjB6.png

My question is: what is wrong here? Is it the XPath expression, or does the website have some kind of layer over the HTML?

(Excuse my imprecise use of terms such as "layer" or "address of the XPath".)

My System Info:

Windows 7 (x86)
Python 3.6.5
Chrome 66.0.3359.181

Davide Fiocco · Accepted Answer · 2018-06-02 14:21:43Z

3

tree.xpath() returns a list of matching elements, so use a list index to access the required element and its .text attribute to print its text.

Example:

from lxml import html
import requests

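# Download the page and parse the raw bytes into an element tree
page = requests.get('https://www.germaneveryday.com/')
tree = html.fromstring(page.content)
# xpath() returns a list; take the first match and read its text
Word = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a')[0].text
print(Word)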

Output:

heimlich
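
To illustrate why the index is needed, here is a minimal sketch that runs the same XPath against a small inline HTML string instead of the live site. The SAMPLE markup is a hypothetical stand-in whose nesting was chosen only so that the XPath from the question matches; it is not a copy of the real page.

```python
from lxml import html

# Hypothetical markup (an assumption), shaped so the question's XPath matches it.
SAMPLE = """
<html><body>
<div id="main">
  <div>
    <div>sidebar</div>
    <div><div><h1><a href="/word">heimlich</a></h1></div></div>
  </div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# xpath() always returns a list for an element query, even with one match.
matches = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a')
print(matches)       # a list of element objects, e.g. [<Element a at 0x...>]
print(len(matches))

# Index into the list to get the element, then read its text.
first = matches[0]
print(first.tag, first.text)
```

Printing the list itself shows element objects rather than their text, which is why the original code did not display the word.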

edited Jun 2, 2018 at 14:21

Davide Fiocco


answered Jun 2, 2018 at 14:19

Rakesh


Devratna · Accepted Answer · 2018-06-02 14:58:40Z

0

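Try this code; it should work. Ending the XPath with /text() selects the link's text directly, so the result is a list of strings rather than a list of elements.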

from lxml import html
import requests

page = requests.get('http://www.germaneveryday.com/')
tree = html.fromstring(page.content)

word = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a/text()')


print (word)
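
As a rough sketch of how the /text() form behaves, the example below uses a hypothetical inline SAMPLE string and a hypothetical helper, first_text, neither of which comes from the answers above. The helper returns a single string and falls back to a default when the XPath matches nothing, which covers the empty [] case mentioned in the question.

```python
from lxml import html

# Hypothetical markup (an assumption), simplified from the real page.
SAMPLE = '<html><body><div id="main"><h1><a href="#">heimlich</a></h1></div></body></html>'


def first_text(tree, xpath_expr, default=None):
    """Return the first non-empty, stripped string matched by xpath_expr.

    Hypothetical helper: returns `default` when nothing matches, so the
    caller never has to index into an empty list.
    """
    for value in tree.xpath(xpath_expr):
        text = str(value).strip()
        if text:
            return text
    return default


tree = html.fromstring(SAMPLE)

# With /text(), the result is a list of strings, not elements.
words = tree.xpath('//*[@id="main"]/h1/a/text()')
print(words)

# A single string when there is a match ...
print(first_text(tree, '//*[@id="main"]/h1/a/text()'))
# ... and the default when there is none.
print(first_text(tree, '//*[@id="missing"]/text()', default='(no match)'))
```

Note that the result of a /text() query is still a list, so printing it directly shows brackets and quotes around the word.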

answered Jun 2, 2018 at 14:58

Devratna


MN93 · Accepted Answer · 2018-06-02 15:22:38Z

0

The problem was exactly what Rakesh, Davide Fiocco and Devratna described above:

"Use list index to access the required element and .text to print its text"

With this code:

from lxml import html
import requests

page = requests.get('https://www.germaneveryday.com/')
tree = html.fromstring(page.content)
Word = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a')[0].text
print (Word)

It is working now!

answered Jun 2, 2018 at 15:22

MN93
