Extracting a Link from a String using Python

Question

What I am trying to do is ask the user for a search term. The program then searches Yahoo and prints the link of the first result. Here's the code I have so far:

from urllib import urlopen

import re, time
from BeautifulSoup import BeautifulSoup

print "What Would You Like to Search For?"
user_input = raw_input('')  # Gets search term from user

search = "http://search.yahoo.com/search;_ylt=A2KLtaJX_1BQfT4AwX2bvZx4?p=baker&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-701"
new_search = search.replace('baker', user_input)
content = urlopen(new_search).read()

soupcontent = BeautifulSoup(content)

link1 = soupcontent.find(id="link-1")
print link1

Everything works fine: it takes the user input and searches Yahoo. The problem I'm having is this. Let's say I searched for 'dog'.

The program would then print something like this:

<a id="link-1" class="yschttl spt" href="http://www.dog.com/" data-bk="5101.1"><b>Dog</b> Supplies | <b>Dog</b> Food, <b>Dog</b> Beds, <b>Dog</b> <wbr></wbr>Flea Control &amp; More ...</a>

That is indeed the first link on the page. However, I would like it to print only "http://www.dog.com/". Can anyone help me with this?

Thanks.

I tried using that. However, I get this error

– moretimetocry, Commented Sep 13, 2012 at 0:54
Did you try regular expressions?

– transilvlad, Commented Oct 6, 2012 at 13:06

DSM · Accepted Answer · 2012-09-13 01:01:01Z

1

BeautifulSoup actually makes this very easy:

>>> from bs4 import BeautifulSoup
>>> from urllib2 import urlopen
>>> 
>>> url = 'http://search.yahoo.com/search?p=dog'
>>> content = urlopen(url).read()
>>> soup = BeautifulSoup(content)
>>> 
>>> soup.find(id="link-1")
<a class="yschttl spt" data-bk="5097.1" href="http://www.dog.com/" id="link-1"><b>Dog</b> Supplies | <b>Dog</b> Food, <b>Dog</b> Beds, <b>Dog</b> <wbr></wbr>Flea Control &amp; More ...</a>
>>> soup.find(id="link-1").get("href")
'http://www.dog.com/'

With your request for UTF-8 you'll probably see

 u'http://www.dog.com/'

instead, the Unicode version, which is fine too.

Standard warning: be sure to check that Yahoo!'s end-user license permits whatever you want to do, because many licenses rule out certain automated uses.
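For readers on Python 3 who do not have BeautifulSoup installed, here is a rough sketch of the same idea using only the standard library's html.parser. This swaps in a different technique from the one above. The class name HrefById, the helper first_href, and the sample markup are all made up for illustration, and the sample is modeled on the output shown earlier rather than fetched live.

```python
from html.parser import HTMLParser


class HrefById(HTMLParser):
    """Remember the href of the first <a> tag whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag == "a" and self.href is None:
            attr_map = dict(attrs)
            if attr_map.get("id") == self.target_id:
                self.href = attr_map.get("href")


def first_href(html, target_id="link-1"):
    """Return the matching link's href, or None if no such tag exists."""
    parser = HrefById(target_id)
    parser.feed(html)
    return parser.href


# Hypothetical sample markup, modeled on the result shown above
sample = ('<a class="yschttl spt" data-bk="5097.1" href="http://www.dog.com/" '
          'id="link-1"><b>Dog</b> Supplies</a>')
print(first_href(sample))
print(first_href("<p>no links here</p>"))
```

Returning None when nothing matches mirrors what soup.find does when no element has the requested id, so the caller should check for None before using the result.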


1 Comment

moretimetocry Over a year ago

Thank you, DSM. I'd been trying to do this with Soup for hours. I tried many variations and none of them worked; however, .get("href") did. Thank you again.

Community · Accepted Answer · 2012-09-13 01:05:22Z

1

Try using a regular expression. See: http://docs.python.org/library/re.html.

match = re.search(r'href="(http://.*?)"', str(link1))
print match.group(1)
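As a self-contained sketch of the regular expression approach (written for Python 3, with a made-up helper name href_from_markup and a hard-coded sample tag rather than a live Yahoo result): re.search only accepts strings, so the value is passed through str() first, and the match is checked before calling group() because re.search returns None when nothing matches.

```python
import re

# Non-greedy capture of whatever sits between href=" and the next quote
HREF_RE = re.compile(r'href="(.*?)"')


def href_from_markup(markup):
    """Return the first href value found in markup, or None."""
    # str() turns tag-like objects (for example a BeautifulSoup Tag) into text
    match = HREF_RE.search(str(markup))
    return match.group(1) if match else None


# Hypothetical sample, modeled on the tag printed in the question
sample_tag = '<a id="link-1" class="yschttl spt" href="http://www.dog.com/">Dog Supplies</a>'
print(href_from_markup(sample_tag))
print(href_from_markup(None))
```

This pattern assumes the attribute is written with double quotes, which holds for the output shown in the question but not for HTML in general.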

answered Sep 13, 2012 at 0:49 by Hans Then; edited Sep 13, 2012 at 1:05 by CommunityBot

4 Comments

Borgleader Over a year ago

He wants the http to be printed, though, so shouldn't it be r'href="(.*?)"' instead?

moretimetocry Over a year ago

No, I don't have much experience with programming at all. I tried using that but I get this error:

Traceback (most recent call last):
  File "scraper.py", line 25, in <module>
    match = re.search(r'"http://(.*?)"', link1)
  File "/usr/lib/python2.6/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or buffer

Hans Then Over a year ago

@moretimetocry pna's answer will also work and is maybe simpler. Using regular expressions can be somewhat tricky.

Hans Then Over a year ago

I like DSM's solution better than mine, so please follow his suggestion.

pna · Accepted Answer · 2012-09-13 00:50:13Z

0

link = your_full_link_string.split('href="')[1].split('"')[0]
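One caveat with the one-liner: if the string contains no href=" at all, the [1] index raises IndexError. A slightly more defensive sketch (Python 3, with a hypothetical helper name href_by_split and made-up sample strings) could use str.partition instead:

```python
def href_by_split(markup):
    """Return the text between href=" and the next double quote, or None."""
    # partition returns (before, separator, after); separator is '' if not found
    _, found, rest = markup.partition('href="')
    if not found:
        return None
    # Everything up to the next double quote is the attribute value
    return rest.partition('"')[0]


# Hypothetical samples, modeled on the tag printed in the question
sample_tag = '<a id="link-1" href="http://www.dog.com/">Dog Supplies</a>'
print(href_by_split(sample_tag))
print(href_by_split('<a id="link-1">no href</a>'))
```

Like the original, this works on the string form of the tag, so a BeautifulSoup object would need to go through str() first.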
