
I've written a Python script that uses a regular expression to find phone numbers on two different sites. When I test the pattern below locally, it finds both phone numbers flawlessly. However, when I run the same pattern against the websites, it no longer works: it only fetches two unidentified numbers, 1999 and 8211.

This is what I've tried so far:

import requests, re

links=[
    'http://www.latamcham.org/contact-us/',
    'http://www.cityscape.com.sg/?page_id=37'
    ]

def FetchPhone(site):
    res = requests.get(site).text
    phone = re.findall(r"\+?[\d]+\s?[\d]+\s?[\d]+",res)[0]  #I'm not sure if it is an ideal pattern. Works locally though
    print(phone)

if __name__ == '__main__':
    for link in links:
        FetchPhone(link)

The output I wish to have:

+65 6881 9083
+65 93895060

This is what I meant by locally:

import re

phonelist = "+65 6881 9083,+65 93895060"

phone = re.findall(r"\+?[\d]+\s?[\d]+\s?[\d]+", phonelist)
print(phone)  #it can print them

P.S.: The phone numbers are not generated dynamically. When I print the response text, I can see the numbers in the console.

  • What do you mean by scraping them locally? Have you tried printing res and seeing if it contains the phone numbers? Commented Apr 18, 2018 at 20:18
  • Try r"\+?\d{1,3}\s?\d{4}\s?\d{4}" Commented Apr 18, 2018 at 20:29
  • I tried your suggested expression, @Wiktor Stribiżew, and got 3333333333 and 14465014376. Thanks. Commented Apr 18, 2018 at 20:50
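A quick way to see why the suggested pattern still picks up long digit strings: the leading + and both spaces are optional, so a plain unbroken run of 9 to 11 digits satisfies it just as well as a formatted number. This is a minimal offline sketch using only the strings mentioned in this thread.

```python
import re

suggested = r"\+?\d{1,3}\s?\d{4}\s?\d{4}"

# The "+" and both spaces are optional, so any unbroken run of
# 9 to 11 digits (1-3 + 4 + 4) matches too, e.g. the two values reported above.
for s in ["3333333333", "14465014376", "+65 6881 9083"]:
    print(s, bool(re.fullmatch(suggested, s)))
```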

2 Answers


In your case, the regex below should return the required output:

r"\+\d{2}\s\d{4}\s?\d{4}"

Note that it only covers the two formats you mentioned:

  • +65 6881 9083
  • +65 93895060

and may not work for other formats.
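A minimal sketch of this pattern applied to page text. The HTML fragments are hypothetical stand-ins for the fetched pages (a year placed before the phone number as a decoy). It also swaps findall(...)[0] for re.search, which returns None rather than raising IndexError when nothing matches.

```python
import re

# Pattern from this answer: "+", a two-digit country code, a space,
# then eight digits with an optional space after the first four.
pattern = r"\+\d{2}\s\d{4}\s?\d{4}"

# Hypothetical HTML fragments; the year should NOT be matched.
pages = [
    "<footer>Since 1999</footer><p>Tel: +65 6881 9083</p>",
    "<p>Call us: +65 93895060</p>",
]

for html in pages:
    match = re.search(pattern, html)
    # re.search returns None on no match instead of raising like findall(...)[0]
    print(match.group() if match else "no phone number found")
```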


1 Comment

Thanks for the solution, sir. I know it won't work in other cases. However, I wanted to see how regex can be applied to parsing phone numbers from the web.

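Your pattern is built from pieces like \d+\s?\d+, which match 9 9, 99 and 1999 alike: the spaces are optional, and the + quantifier lets the first \d+ grab as many digits as it can while leaving at least one digit for the others. As a result, any run of three or more digits in the HTML matches your full pattern, and findall(...)[0] simply returns whichever one appears first. One solution is to specify the exact number of repetitions you want (as in Andersson's answer).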
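To make this concrete, here is a small offline sketch. The HTML string is hypothetical (a year placed before the phone number); it is not taken from either site.

```python
import re

# Hypothetical page fragment: a year appears before the phone number.
html = "<footer>Since 1999</footer><p>Tel: +65 6881 9083</p>"

question_pattern = r"\+?[\d]+\s?[\d]+\s?[\d]+"
matches = re.findall(question_pattern, html)
print(matches)     # any run of three or more digits qualifies
print(matches[0])  # the first match wins, which is the year

# The simplified \d+\s?\d+ accepts all of these strings:
for s in ["9 9", "99", "1999"]:
    print(s, bool(re.fullmatch(r"\d+\s?\d+", s)))
```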

I suggest trying regex101.com: it highlights what the regex is matching and capturing, which helps you visualize it. You can paste an example of the text you want to search there and tweak your regex.
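If you would rather inspect matches without leaving Python, a rough local substitute is to print each match with its offset and a little surrounding text. The helper name show_matches and the sample HTML are hypothetical, for illustration only.

```python
import re

def show_matches(pattern, text, context=15):
    """Print every match of `pattern` in `text` with its offset and nearby text.

    Returns a list of (offset, matched_text) tuples.
    """
    results = []
    for m in re.finditer(pattern, text):
        start, end = m.span()
        before = text[max(0, start - context):start]
        after = text[end:end + context]
        results.append((start, m.group()))
        # Brackets mark exactly what the regex consumed.
        print(f"{start:>6}: ...{before}[{m.group()}]{after}...")
    return results

show_matches(r"\+?\d+\s?\d+\s?\d+", "<p>Since 1999</p><p>Tel: +65 6881 9083</p>")
```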
