I'm trying to build a scraper that retrieves translations from Wiktionary. The function below should produce a list of all the translations of the word passed as its argument, but I get an empty list. The same command, response.css('ol').re(r'(?<=>)\w+(?=<)'), works in the Scrapy shell, though. The word I'm using as a test is "Hallo".

def scrape_translation(word):
    url = "https://en.wiktionary.org/wiki/" + word
    response = HtmlResponse(url=url)
    translation_list = response.css('ol').re(r'(?<=>)\w+(?=<)')
    print(translation_list)

I'm using Python 3.6.4

  • Which Python version are you using (Python 2 or 3)? Commented Mar 23, 2018 at 21:52
  • I'm using Python 3.6.4 Commented Mar 23, 2018 at 22:16

1 Answer

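HtmlResponse does not download anything: it only wraps HTML that you already have in an HtmlResponse object. Constructed with just a URL, the response has an empty body, so the selector finds nothing. You need to fetch the page yourself and pass the HTML as the body argument: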

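import requests
from scrapy.http import HtmlResponse

def scrape_translation(word):
    url = "https://en.wiktionary.org/wiki/" + word
    # Download the page; HtmlResponse does not do this itself.
    r = requests.get(url)
    # Wrap the downloaded bytes so Scrapy selectors can be used on them.
    response = HtmlResponse(url=url, body=r.content)
    translation_list = response.css('ol').re(r'(?<=>)\w+(?=<)')
    print(translation_list)

scrape_translation('Hallo')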

I used the requests library, but any other Python module that can download the HTML for a URL will work just as well.
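If you want to check what the regular expression itself matches without downloading anything, you can run it with the standard re module on a small HTML string. This is only a rough sketch: the fragment below is made up for illustration and is not Wiktionary's actual markup, and extract_words is a hypothetical helper name, not part of Scrapy.

```python
import re

# Hypothetical HTML fragment, made up for illustration only.
SAMPLE_HTML = (
    '<ol>'
    '<li><a href="/wiki/hello">hello</a></li>'
    '<li><a href="/wiki/hi">hi</a></li>'
    '</ol>'
)

# Same pattern as in the question: a run of word characters that sits
# directly between a '>' and a '<'.
PATTERN = r'(?<=>)\w+(?=<)'


def extract_words(html):
    """Return every run of word characters enclosed directly by '>' and '<'."""
    return re.findall(PATTERN, html)


print(extract_words(SAMPLE_HTML))  # ['hello', 'hi']
print(extract_words(''))           # [] (no HTML, so nothing to match)
```

The second call shows the same symptom as in the question: with no HTML to search, the result is an empty list. Note also that the pattern only matches text made up entirely of word characters, so an entry containing a space or punctuation between the tags is skipped.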
