I'm using BeautifulSoup to scrape a webpage. I want to save each URL in a list, but the + operator is not working properly. This is the code:

for a in soup.find_all('a', class_="hotel_name_link url"):
    hotel_url = "https://www.booking.com" + a['href']
    hotels_url_list.append(hotel_url)

I have to do it this way because the a['href'] attribute only contains the file location on the server, not the whole URL (for example:

/hotel/es/aqua-aquamarina.es.html?label=gen173nr-1BCAEoggJCAlhYSDNYBGigAYgBAZgBCrgBB8gBDNgBAegBAZICAXmoAgM;sid=aa0d6c563b3d74f5432fb5d5b250eee4;ucfs=1;srpvid=2d5d1564170400e8;srepoch=1514343753;room1=A%2CA;hpos=15;hapos=15;dest_type=country;dest_id=197;srfid=198499756e07f93263596e1640823813c2ee4fe1X15;from=searchresults
;highlight_room=#hotelTmpl)

But when I print the results it displays the following:

[screenshot of the printed list: the concatenated URLs do not come out as expected]

What can I do to concatenate the URLs so that they come out correctly?

  • What is wrong with those URLs? They look OK. Did you try loading pages with them? Maybe it is only your IDE displaying them differently. Commented Dec 27, 2017 at 3:21
  • If I use strip() - "https://www.booking.com" + a['href'].strip() - then I get a URL which requests can read with status 200 (a sketch of this follows the comments). Commented Dec 27, 2017 at 3:44
  • Almost worked, but the last part of the URL is still out of place @furas Commented Dec 27, 2017 at 4:15
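
For reference, a minimal sketch of the strip() approach suggested in the comments, assuming soup and hotels_url_list are already defined as in the question:

for a in soup.find_all('a', class_="hotel_name_link url"):
    # strip() removes the stray whitespace/newlines embedded in the scraped href
    hotel_url = "https://www.booking.com" + a['href'].strip()
    hotels_url_list.append(hotel_url)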

1 Answer


You can use urljoin:

from urllib.parse import urljoin  # Python 3; on Python 2 use: from urlparse import urljoin

hotel_url = urljoin("https://www.booking.com", a['href'])
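
As a minimal sketch, here is how this could fit into the loop from the question (assuming soup is built as in the question); the added .strip() follows the suggestion in the comments, in case the scraped href contains stray whitespace:

from urllib.parse import urljoin

hotels_url_list = []
for a in soup.find_all('a', class_="hotel_name_link url"):
    # resolve the relative href against the site root; strip() guards
    # against stray whitespace/newlines in the scraped attribute value
    hotel_url = urljoin("https://www.booking.com", a['href'].strip())
    hotels_url_list.append(hotel_url)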