Update: I tested my regular expression with the following code:

import re

pattern = r'^data-id="*/d"$'
html='data-id="89897907"'
m=re.search(pattern,html)
print m.group()

And m is still None.

I'm writing a web spider in Python, but when I try to use a regular expression to get all the strings like data-id="798789", I run into a problem. My code is below:

import sys
import urllib
import urllib2
import cookielib
import re
from urllib2 import Request, urlopen, URLError, HTTPError 

url="https://www.secure.pixiv.net/login.php"
#Process the cookie
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
#POST data to Pixiv
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0'}
values={'mode':'login','pixiv_id':'username','pass':'password','skip':'1'}
data=urllib.urlencode(values)
req=urllib2.Request(url,data)
#ERRORS
try:
    response = opener.open(req,timeout=10)
except URLError, e:
    if hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    elif hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
else:
    print 'No exception was raised.'

res=opener.open('http://www.pixiv.net/ranking.php?mode=daily')  
html = res.read()
pattern = r'^data-id="*/d"$'
m=re.search(pattern,html)
print m.group()

I ran the code and m was None. Is there anything wrong?

Comment:

Test your regular expressions against simple fixed strings, not web content. And post your minimal example here. We're not going to run your script or browse that website to determine what the input data is. Commented Feb 19, 2015 at 5:44

1 Answer

I try to use Regular Expression to get all the strings like "data-id="798789""

pattern = r'^data-id="\d*"$'

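I guess this is what you need. In your pattern, /d is a literal slash followed by d rather than the digit class \d, and the * applies to the quote before it instead of to the digits. If the attribute is not the only content on the line, drop the ^ and $ anchors and use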

r'\bdata-id="\d*"' or r'\bdata-id="\d+"'

See demo.

https://regex101.com/r/mS3tQ7/8
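As a quick check outside the regex101 demo, the patterns above can be tried on a fixed string. This is only a minimal sketch: the html value is a made-up sample, not real markup from the ranking page, and re.findall is used here on the assumption that every occurrence is wanted rather than just the first.

```python
import re

# Hypothetical sample standing in for the downloaded page (not real Pixiv markup).
html = '<section data-id="89897907" class="item">\n<section data-id="798789">'

# The pattern from the question: "* means "zero or more quotes" and /d is the
# literal text "/d", so it never matches a run of digits.
broken = re.search(r'^data-id="*/d"$', html)

# The anchored fix only matches when the attribute is the entire string.
anchored = re.search(r'^data-id="\d*"$', 'data-id="89897907"')

# Without ^ and $, the pattern can match inside a larger document;
# findall returns every non-overlapping match.
all_ids = re.findall(r'\bdata-id="\d+"', html)

print(broken)            # None
print(anchored.group())  # data-id="89897907"
print(all_ids)           # ['data-id="89897907"', 'data-id="798789"']
```

Since re.search returns None when nothing matches, it is safer to check the result before calling group() on it.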


3 Comments

Thanks. Maybe I should learn how to use regular expressions again :)
The key is the *, although I think + would be more appropriate here. \d is only a single digit, not a whole number.
@MarkRansom Yes, right. + is more suitable. I have added that too. Thanks :)
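The difference between * and + discussed in the comments can be seen on a few fixed strings. This is a small sketch with made-up sample values; the capturing group in the last step is an addition for illustration, not something the answer proposes.

```python
import re

# Made-up samples: an empty attribute, a single digit, and a longer id.
samples = ['data-id=""', 'data-id="7"', 'data-id="798789"']

# \d* allows zero digits, so the empty attribute also matches.
star = [s for s in samples if re.search(r'\bdata-id="\d*"', s)]

# \d+ requires at least one digit, so the empty attribute is rejected.
plus = [s for s in samples if re.search(r'\bdata-id="\d+"', s)]

# Assumption for illustration: a capturing group pulls out just the digits.
ids = [re.search(r'\bdata-id="(\d+)"', s).group(1) for s in plus]

print(star)  # ['data-id=""', 'data-id="7"', 'data-id="798789"']
print(plus)  # ['data-id="7"', 'data-id="798789"']
print(ids)   # ['7', '798789']
```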
