Something wrong with Regular Expression in Python

Question

Update: I've tested my regular expression with this code:

import re

pattern = r'^data-id="*/d"$'
html='data-id="89897907"'
m=re.search(pattern,html)
print m.group()

And I still got None for m.

I'm writing a web spider in Python, but when I try to use a regular expression to get all the strings like "data-id="798789"", I run into a problem. My code is below:

import sys
import urllib
import urllib2
import cookielib
import re
from urllib2 import Request, urlopen, URLError, HTTPError 

url="https://www.secure.pixiv.net/login.php"
#Process the cookie
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
#POST data to Pixiv
headers = {'User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0'}  
values={'mode':'login','pixiv_id':'username','pass':'password','skip':'1'}
data=urllib.urlencode(values)
req=urllib2.Request(url,data)
#ERRORS
try:
    response = opener.open(req,timeout=10)
except URLError, e:
    if hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    elif hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
else:
    print 'No exception was raised.'

res=opener.open('http://www.pixiv.net/ranking.php?mode=daily')  
html = res.read()
pattern = r'^data-id="*/d"$'
m=re.search(pattern,html)
print m.group()

I ran the code and got None for m. Is there anything wrong?

Test your regular expressions against simple fixed strings, not web content. And post your minimal example here. We're not going to run your script or browse that website to determine what the input data is.
– Jonathon Reinhart, Commented Feb 19, 2015 at 5:44
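
A minimal sketch of what testing against a fixed string could look like. The helper name `first_match` is hypothetical (it is not part of `re`), and the code is written so that it also runs on Python 3. It guards against `re.search` returning None, since calling `.group()` on None raises an AttributeError.

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, text)
    return m.group() if m is not None else None

# Fixed test string taken from the question's update.
sample = 'data-id="89897907"'

# The pattern from the question does not match the sample.
question_result = first_match(r'^data-id="*/d"$', sample)

# A plain literal prefix does match, so the attribute name itself is not the problem.
literal_result = first_match(r'data-id=', sample)

print(question_result)
print(literal_result)
```

Growing the pattern one piece at a time like this tends to show which part stops matching.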

vks · Accepted Answer · 2015-02-19 06:00:12Z

2

I try to use Regular Expression to get all the strings like "data-id="798789""

pattern = r'^data-id="\d*"$'

I guess you need this. In fact, if these are not the only contents of the line, use

r'\bdata-id="\d*"' or r'\bdata-id="\d+"'

See demo.

https://regex101.com/r/mS3tQ7/8
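
As a rough sketch of why the original pattern fails and how the unanchored version behaves: in `"*/d"`, the `"*` means "zero or more quote characters" and `/d` is a literal slash followed by the letter d, not the digit class `\d`. The `html` string below is a made-up sample, not the real markup of the ranking page, and the capturing-group variant is an addition for illustration only.

```python
import re

# Hypothetical sample input; the actual page markup is an assumption.
html = '<section data-id="89897907"><a data-id="798789">x</a></section>'

# Pattern from the question: optional quotes, then a literal "/d".
broken = r'^data-id="*/d"$'
# Unanchored pattern from this answer.
fixed = r'\bdata-id="\d+"'

# The broken pattern fails even on the simple fixed string.
broken_match = re.search(broken, 'data-id="89897907"')

# findall returns every non-overlapping match in the text.
all_ids = re.findall(fixed, html)

# With one capturing group, findall returns only the digits.
id_numbers = re.findall(r'\bdata-id="(\d+)"', html)

print(broken_match)
print(all_ids)
print(id_numbers)
```

Since the question asks for all the strings, `re.findall` is probably a better fit than `re.search`, which returns only the first match.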

edited Feb 19, 2015 at 6:00

answered Feb 19, 2015 at 5:46


3 Comments

Vladimio Over a year ago

Thanks. Maybe I should learn how to use regular expressions again :)

Mark Ransom Over a year ago

The key is the *, although I think + would be more appropriate here. \d is only a single digit, not a whole number.

vks Over a year ago

@MarkRansom Yeah, right. + is more suitable. I have added that too. Thanks :)
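
The difference between `*` and `+` discussed in these comments can be sketched as follows. The sample strings are invented for illustration. `\d*` accepts an empty value, while `\d+` requires at least one digit.

```python
import re

# Invented samples: a normal id, an empty id, and a single-digit id.
samples = ['data-id="798789"', 'data-id=""', 'data-id="7"']

star = re.compile(r'\bdata-id="\d*"')  # zero or more digits
plus = re.compile(r'\bdata-id="\d+"')  # one or more digits

star_hits = [s for s in samples if star.search(s)]
plus_hits = [s for s in samples if plus.search(s)]

print(star_hits)
print(plus_hits)
```

Only the `*` version accepts `data-id=""`, which is why `+` is likely the safer choice when an id is always expected to contain digits.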
