I want to check whether a webpage contains strings of the form /[some names]/unfollow. I have very little experience with regular expressions. This is what I have so far:

import urllib
import re

page = urllib.urlopen('http://www.domain.com').read()
results = re.findall('/[\w]*/unfollow', page)
for i in results:
    print i

But the code above does not print anything. Am I doing it wrong? If so, I would really appreciate your help.

Thanks

  • Do you just want the words before /unfollow? Commented Jul 6, 2014 at 10:00
  • That regex finds nothing on your page. In any case, I suggest you use beautifulsoup to parse a web page; using re is not a great idea. Commented Jul 6, 2014 at 10:21
  • @PadraicCunningham Yes, I just want the words before /unfollow. In some cases I have to go through 480 webpages using a while loop, which I think is time consuming. Would using beautifulsoup make it more time efficient? Commented Jul 6, 2014 at 13:40

1 Answer

Your findall call should be:

results = re.findall(r'\/[^\/]*\/unfollow', page)

It will find all the strings that are in the /some names/unfollow format.

Explanation:

  • \/ Matches a literal / symbol.
  • [^\/]* Matches any character other than /, zero or more times.
  • \/unfollow Matches the string /unfollow.
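
To see the pattern in action without fetching a real page, here is a minimal sketch. The HTML fragment and the names sample_page and UNFOLLOW_PATTERN are made up for illustration and are not taken from the asker's site; the pattern itself is the one from the answer.

```python
import re

# Hypothetical HTML fragment standing in for the downloaded page.
sample_page = (
    '<a href="/alice/unfollow">Unfollow</a>'
    '<a href="/bob_99/unfollow">Unfollow</a>'
    '<a href="/carol/follow">Follow</a>'
)

# Same pattern as in the answer: "/", any run of non-"/" characters, "/unfollow".
UNFOLLOW_PATTERN = re.compile(r'\/[^\/]*\/unfollow')

# findall returns every non-overlapping full match as a list of strings.
matches = UNFOLLOW_PATTERN.findall(sample_page)
for match in matches:
    print(match)
```

The /carol/follow link is skipped because the text after the second slash is not unfollow. Note that [^\/]* accepts any character except /, including spaces and quotes, so on real markup it can match more than a plain name.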

1 Comment

If you want only the words before /unfollow, try m = re.findall(r'\/([^\/]*)\/unfollow', page), where page is the string to search.
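
As a rough illustration of the capture-group variant, the sketch below runs it on a made-up string (sample_page and names are hypothetical). Python's re module does not require / to be escaped, so the backslashes are dropped here; the pattern is otherwise the same.

```python
import re

# Hypothetical page content; replace with the real downloaded HTML.
sample_page = '<a href="/alice/unfollow">x</a> <a href="/bob_99/unfollow">y</a>'

# With one capture group, findall returns only the captured text,
# i.e. the part between the two slashes.
names = re.findall(r'/([^/]*)/unfollow', sample_page)
for name in names:
    print(name)
```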
