1

I am a new python learner; almost 3 weeks old.

I am trying to automate some daily tasks by using python. In here, I was trying to scrape a website which is "https://www.germaneveryday.com/", It does generate a new German word every day along with a sentence example. So my plan was to automate this instead of visiting the site everyday.

I followed an online tutorial from here : http://docs.python-guide.org/en/latest/scenarios/scrape/

And this is the code:

from lxml import html
import requests

page = requests.get('https://www.germaneveryday.com/')
tree = html.fromstring(page.content)

Word = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a')


print (Word)

I did inspect the daily word on the website, and using right click, copy xpath to extract the "tree.xpath" address for the specific html data I am willing to get out and print in my simple code using lxml + python.

Except that every time the output is either an empty parenthesis such as : [] or it is some html block that is meaningless As shown here : https://i.sstatic.net/dAjB6.png

My question is that, what is wrong here is it the xpath address or maybe the website has some kind of a layer over the html ?

(Excuse my ignorance using some descriptions such as : layer or address of xpath )

My System Info:

  • Windows 7 (x86)
  • Python Version is (v3.6.5)
  • Web Browser is Chrome 66.0.3359.181

3 Answers 3

3

Use list index to access the required element and .text to print its text.

Ex:

from lxml import html
import requests

page = requests.get('https://www.germaneveryday.com/')
tree = html.fromstring(page.content)
Word = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a')[0].text
print (Word)

Output:

heimlich
Sign up to request clarification or add additional context in comments.

Comments

0

Try this code, It should work

from lxml import html
import requests

page = requests.get('http://www.germaneveryday.com/')
tree = html.fromstring(page.content)

word = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a/text()')


print (word)

Comments

0

The problem was as said above by: Rakesh, Davide Fiocco and Devratna

"Use list index to access the required element and .text to print its text"

using the code

from lxml import html
import requests

page = requests.get('https://www.germaneveryday.com/')
tree = html.fromstring(page.content)
Word = tree.xpath('//*[@id="main"]/div[1]/div[2]/div/h1/a')[0].text
print (Word)

It is working now !

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.