Scrapy CSS selector returning empty list

Question

I'm trying to build a scraper to retrieve translations from Wiktionary. I'm calling the function below, which should return a list of all the translations of the word passed as its argument, but it returns an empty list. The command response.css('ol').re(r'(?<=>)\w+(?=<)') works in the Scrapy shell, though. The word I'm using as a test is "Hallo".

def scrape_translation(word):
    url = "https://en.wiktionary.org/wiki/" + word
    response = HtmlResponse(url=url)
    translation_list = response.css('ol').re(r'(?<=>)\w+(?=<)')
    print(translation_list)

I'm using Python 3.6.4

Which Python version do you use (Python 2 or 3)?

– ands, Commented Mar 23, 2018 at 21:52
I'm using Python 3.6.4

– user5395461, Commented Mar 23, 2018 at 22:16

ands · Accepted Answer · 2018-03-23 22:18:22Z

1

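HtmlResponse does not download anything; it only wraps an HTML string in an HtmlResponse object. Without a body the response is empty, so the selector has nothing to match. You need to fetch the HTML yourself and pass it as the body argument: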

import requests
from scrapy.http import HtmlResponse

def scrape_translation(word):
    url = "https://en.wiktionary.org/wiki/" + word
    # Download the page first; HtmlResponse will not fetch it for you.
    r = requests.get(url)
    response = HtmlResponse(url=url, body=r.content)
    translation_list = response.css('ol').re(r'(?<=>)\w+(?=<)')
    print(translation_list)

scrape_translation('Hallo')

I used the requests library here, but any other Python module that can download the HTML for a URL would work too.
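
To see why an empty response gives an empty list, here is a minimal sketch that uses only the standard library's re module as a stand-in for Scrapy's .re() method. The helper name extract_words and the HTML snippet are made up for illustration, and it skips the css('ol') step, so treat it as a rough picture of the regex behaviour rather than what Scrapy does internally.

```python
import re

# Same pattern as in the question: a run of word characters that sits
# directly between a '>' and a '<'.
WORD_BETWEEN_TAGS = re.compile(r'(?<=>)\w+(?=<)')


def extract_words(html):
    """Hypothetical helper: return every match of the pattern in an HTML string."""
    return WORD_BETWEEN_TAGS.findall(html)


# Made-up snippet standing in for a downloaded page.
sample_html = '<ol><li><a href="#">hello</a></li><li><a href="#">hi</a></li></ol>'

print(extract_words(sample_html))  # ['hello', 'hi']
print(extract_words(''))           # [] because there is no HTML to search
```

One thing this also shows: because \w does not match spaces, an entry such as ">guten Tag<" produces no match at all with this pattern, so multi-word translations would be skipped.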
