Regex used within python giving unknown results

Question

I've written a Python script that uses a regular expression to find phone numbers on two different sites. When I test the pattern below against the two phone numbers locally, it works flawlessly. However, when I run the same pattern against the websites, it no longer works: it only fetches two unrelated numbers, 1999 and 8211.

This is what I've tried so far:

import requests, re

links=[
    'http://www.latamcham.org/contact-us/',
    'http://www.cityscape.com.sg/?page_id=37'
    ]

def FetchPhone(site):
    res = requests.get(site).text
    phone = re.findall(r"\+?[\d]+\s?[\d]+\s?[\d]+", res)[0]  # I'm not sure if this is an ideal pattern; it works locally, though
    print(phone)

if __name__ == '__main__':
    for link in links:
        FetchPhone(link)

The output I wish to have:

+65 6881 9083
+65 93895060

This is what I meant by locally:

import re

phonelist = "+65 6881 9083,+65 93895060"

phone = [item for item in re.findall(r"\+?[\d]+\s?[\d]+\s?[\d]+",phonelist)]
print(phone)  #it can print them

P.S. The phone numbers are not generated dynamically: when I print the response text, I can see the numbers in the console.

What do you mean by scraping them locally? Have you tried printing res and seeing if it contains the phone numbers?
– Alex Hall, Commented Apr 18, 2018 at 20:18
I tried your suggested expression, @Wiktor Stribiżew, and got this: 3333333333,14465014376. Thanks.
– SIM, Commented Apr 18, 2018 at 20:50

Andersson · Accepted Answer · 2018-04-18 20:53:30Z

1

In your case, the regex below should return the required output:

r"\+\d{2}\s\d{4}\s?\d{4}"

Note that it is written for the formats you mentioned:

+65 6881 9083
+65 93895060

and might not work in other cases.
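As a minimal sketch of how that pattern behaves, the snippet below runs it against a made-up string rather than the real pages. The sample text and the names PHONE_PATTERN and sample_text are assumptions for illustration; only the pattern and the two phone numbers come from the question and answer.

```python
import re

# Pattern from the answer: "+", two digits, one whitespace character,
# four digits, an optional whitespace character, then four more digits.
PHONE_PATTERN = re.compile(r"\+\d{2}\s\d{4}\s?\d{4}")

# Hypothetical text standing in for a downloaded page; it mixes the two
# phone numbers from the question with unrelated digit runs.
sample_text = "Founded 1999. Call +65 6881 9083 or +65 93895060 (ref 8211)."

matches = PHONE_PATTERN.findall(sample_text)
print(matches)  # ['+65 6881 9083', '+65 93895060']
```

Because the leading "+" and the digit counts are mandatory here, bare numbers such as 1999 and 8211 are skipped.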


1 Comment

SIM Over a year ago

Thanks for the solution sir. I know it won't work in other cases. However, I intended to see how regex can be applied in parsing phone numbers from the web.

izxle · Accepted Answer · 2018-04-18 21:51:04Z

0

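Your pattern boils down to \d+\s?\d+\s?\d+. A fragment such as \d+\s?\d+ matches 9 9, 99 and 1999 alike: the + quantifier lets the first \d+ grab as many digits as it can while leaving at least one digit for each of the others, and the \s between them is optional. The full pattern therefore matches any run of three or more digits, so the first hit on a page can be an unrelated number such as 1999. One solution is to state the specific number of repetitions you want (as in Andersson's answer).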
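The effect can be reproduced without fetching any page. This is only a sketch: the sample string and the names LOOSE_PATTERN and sample_text are made up for illustration, while the pattern itself is the one from the question.

```python
import re

# The pattern from the question: every "+" and whitespace is optional,
# so any run of three or more digits satisfies it.
LOOSE_PATTERN = re.compile(r"\+?[\d]+\s?[\d]+\s?[\d]+")

# Hypothetical page text in which a year appears before the phone number.
sample_text = "Copyright 1999. Phone: +65 6881 9083"

all_matches = LOOSE_PATTERN.findall(sample_text)
first_match = all_matches[0]  # what the question's [0] would pick

print(all_matches)  # ['1999', '+65 6881 9083']
print(first_match)  # 1999
```

The phone number is still found, but taking index 0 returns whichever digit run comes first in the text.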

I suggest you try regex101.com; it highlights what the regex is matching and capturing, which helps you visualize it. There you can paste a sample of the text you want to search and tweak your regex.
