
In order to learn Scrapy, I am crawling all the elements of this website: http://quotes.toscrape.com/random

However, I do not understand how to extract the URL of the author's bio page. I tried to use a CSS selector:

>>> response.css('a::attr(href)').extract()
['/', '/login', '/author/Ralph-Waldo-Emerson', '/tag/life/page/1/', '/tag/regrets/page/1/', 'https://www.goodreads.com/quotes', 'https://scrapinghub.com']

Then:

>>> response.css('small.quote>span>a::attr(href)').extract()

However, this does not return the author's bio URL. How can I get that URL with a CSS selector?

UPDATE

I already know that I can do:

response.css('a::attr(href)').extract()[2]

However, I suspect this is not robust. Is there a better way to get the bio link?
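If you want to keep working from the `a::attr(href)` list, a less position-dependent option is to filter it by the `/author/` path prefix instead of taking index `[2]`. This is a minimal sketch, assuming author pages always live under `/author/` (true for the links shown above, but not something the site guarantees):

```python
# The href list returned by response.css('a::attr(href)').extract() above.
hrefs = [
    "/",
    "/login",
    "/author/Ralph-Waldo-Emerson",
    "/tag/life/page/1/",
    "/tag/regrets/page/1/",
    "https://www.goodreads.com/quotes",
    "https://scrapinghub.com",
]

# Keep only links pointing at an author page, wherever they appear in the page.
author_links = [href for href in hrefs if href.startswith("/author/")]

# /random shows a single quote, so take the first match (or None if absent).
bio_href = author_links[0] if author_links else None
print(bio_href)
```

The same filter can be expressed directly as a CSS attribute-prefix selector: `response.css('a[href^="/author/"]::attr(href)').get()`.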

Answer

This might work:

>>> import os
>>> os.path.dirname(response.url)
'http://quotes.toscrape.com'

>>> response.css('a::attr(href)').extract()[2]
u'/author/Bob-Marley'

>>> os.path.dirname(response.url) + response.css('a::attr(href)').extract()[2]
u'http://quotes.toscrape.com/author/Bob-Marley'
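`os.path.dirname` works here only because `/random` sits one level below the site root; it treats the URL as a file path, so on a deeper page such as `/page/2/` the concatenation produces a wrong link. A sketch using the standard library's `urllib.parse.urljoin`, which resolves a relative href the way a browser does (the `/page/2/` URL is just an illustrative example):

```python
import os
from urllib.parse import urljoin

page_url = "http://quotes.toscrape.com/random"
bio_href = "/author/Bob-Marley"

# urljoin resolves the root-relative href against the page URL,
# independent of how deep the current page is.
bio_url = urljoin(page_url, bio_href)
print(bio_url)

# On a deeper page, dirname + concatenation keeps part of the old path...
deep_page = "http://quotes.toscrape.com/page/2/"
naive_url = os.path.dirname(deep_page) + bio_href
print(naive_url)

# ...while urljoin still yields the correct author URL.
deep_url = urljoin(deep_page, bio_href)
print(deep_url)
```

Inside a spider, Scrapy's `response.urljoin()` does this resolution for you, and `response.follow()` accepts the relative href directly.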