Searching string in a webpage using regular expression on Python?

Question

I want to check whether a webpage contains strings of the form /[some names]/unfollow. I have very little experience with regular expressions. This is what I have so far:

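import re
import urllib.request

# Python 3: urlopen() returns bytes, so decode to text before searching (UTF-8 assumed)
page = urllib.request.urlopen('http://www.domain.com').read().decode('utf-8', errors='replace')
# Find every "/<word characters>/unfollow" in the page
results = re.findall(r'/[\w]*/unfollow', page)
for i in results:
    print(i)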

But the code above does not print anything. Am I doing something wrong? If so, I really need your help.

Thanks

That regex finds nothing on your page. In any case, I suggest you use BeautifulSoup to parse a web page; using re for that is not a great idea.
– Padraic Cunningham, Commented Jul 6, 2014 at 10:21
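Padraic's suggestion is to use a real HTML parser. BeautifulSoup is a third-party package, so this sketch swaps in the standard library's html.parser.HTMLParser instead. It assumes the names appear in relative links such as <a href="/NAME/unfollow">; the class UnfollowLinkParser and the helper unfollow_names are hypothetical, and absolute URLs or names placed outside href attributes would not be picked up.

```python
from html.parser import HTMLParser


class UnfollowLinkParser(HTMLParser):
    """Hypothetical helper: collects NAME from every <a href="/NAME/unfollow">."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for attr, value in attrs:
            if attr == "href" and value:
                parts = value.strip("/").split("/")
                # Keep only paths of exactly the form NAME/unfollow
                # (leading and trailing slashes ignored)
                if len(parts) == 2 and parts[0] and parts[1] == "unfollow":
                    self.names.append(parts[0])


def unfollow_names(html):
    """Return the names found in /NAME/unfollow links, in page order."""
    parser = UnfollowLinkParser()
    parser.feed(html)
    parser.close()
    return parser.names


# Made-up HTML for illustration only
sample = '<a href="/john-doe/unfollow">x</a> <a href="/about">y</a> <a href="/jane/unfollow">z</a>'
print(unfollow_names(sample))  # ['john-doe', 'jane']
```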
@PadraicCunningham Yes, I just want the words before /unfollow. In some cases I have to go through 480 webpages in a while loop, which I think is time-consuming. Would using BeautifulSoup make it more time-efficient?
– possibility0, Commented Jul 6, 2014 at 13:40
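On the efficiency question: with 480 pages, most of the time is probably spent waiting for downloads rather than inside re or a parser, so fetching several pages at once is likely to help more than switching libraries. A minimal sketch using concurrent.futures.ThreadPoolExecutor; the helpers fetch and scan_pages, the UTF-8 decoding, and the choice of 8 workers are assumptions, not anything from the thread.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Compile once and reuse for every page; the group captures only the name.
UNFOLLOW_RE = re.compile(r'/([^/]*)/unfollow')


def fetch(url, timeout=10):
    """Download one page; decoding as UTF-8 is an assumption about the site."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scan_pages(urls, fetcher=fetch, max_workers=8):
    """Fetch pages concurrently and return {url: [names before /unfollow]}.

    Hypothetical helper. An exception raised while fetching any page is
    re-raised here when the results are collected.
    """
    urls = list(urls)  # map() and zip() below must both see every URL
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(fetcher, urls)  # results come back in input order
        return {url: UNFOLLOW_RE.findall(page) for url, page in zip(urls, pages)}
```

Calling scan_pages(page_urls), where page_urls is your own list of URLs (a hypothetical name), replaces the while loop. The fetcher parameter also lets you try it without network access, for example with fetcher={'u1': '<a href="/john/unfollow">'}.get. Keep max_workers modest so the site is not flooded with requests.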

Avinash Raj · Accepted Answer · 2014-07-06 10:40:45Z


Your findall call should be:

results = re.findall(r'\/[^\/]*\/unfollow', page)

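It will find all strings of the form /some names/unfollow. Unlike [\w]*, which matches only letters, digits and underscores, [^\/]* also accepts names containing characters such as - or .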

Explanation:

\/ Matches a literal / symbol.
[^\/]* Matches any character other than /, zero or more times.
\/unfollow Matches the string /unfollow
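To see the difference on concrete input, this sketch runs the question's pattern and the answer's pattern over a small HTML snippet. The snippet is made up; the real page is not shown in the question, so whether its names actually contain such characters is an assumption.

```python
import re

# Made-up HTML for illustration only; the real page is not shown in the question.
page = '''
<a href="/john_smith/unfollow">Unfollow</a>
<a href="/jane-doe/unfollow">Unfollow</a>
<a href="/mr.x/unfollow">Unfollow</a>
'''

# Question's pattern: [\w]* stops at "-" and ".", so only john_smith matches.
question_matches = re.findall(r'/[\w]*/unfollow', page)

# Answer's pattern: [^\/]* accepts any run of characters other than "/".
answer_matches = re.findall(r'\/[^\/]*\/unfollow', page)

print(question_matches)  # ['/john_smith/unfollow']
print(answer_matches)    # ['/john_smith/unfollow', '/jane-doe/unfollow', '/mr.x/unfollow']
```

Because [^\/]* also matches spaces, quotes and newlines, the answer's pattern can match more than a name on some pages: in text such as "</b> click /unfollow" it matches "/b> click /unfollow".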

answered Jul 6, 2014 at 9:34 by Avinash Raj, edited Jul 6, 2014 at 10:40



If you want only the words before /unfollow, try m = re.findall(r'\/([^\/]*)\/unfollow', page) instead.
– Avinash Raj
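A short sketch of the comment's suggestion on a made-up snippet: because the pattern has exactly one capturing group, re.findall returns only the captured text rather than the whole match. The variable name page follows the question's code; the HTML is hypothetical.

```python
import re

# Made-up HTML snippet for illustration only.
page = '<a href="/jane-doe/unfollow">Unfollow</a> <a href="/mr.x/unfollow">Unfollow</a>'

# The parentheses form one capturing group around the name, so findall
# returns just the captured names instead of the full "/name/unfollow" text.
names = re.findall(r'\/([^\/]*)\/unfollow', page)
for name in names:
    print(name)  # prints jane-doe, then mr.x
```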
