I have an HTML document from which I need to collect every piece of content that matches a particular format, e.g. everything in the form 00.000.000/0000-00 or XX.YYY.IIO/KKKK-LL.

Are regular expressions the best way to do this, or is there a better approach?

  • I could do what I wanted with this: [0-9]{2}\.?[0-9]{3}\.?[0-9]{3}\/?[0-9]{4}\-?[0-9]{2} Commented Jul 9, 2015 at 23:25
  • That won't match XX.YYY.IIO/KKKK-LL. Maybe .{2}\..{3}\..{3}\/.{4}-.{2} instead? Commented Jul 9, 2015 at 23:36
  • Also, making the dot, slash and hyphen separators optional is wrong: that pattern would also match 00000000000000. Commented Jul 9, 2015 at 23:52
  • Ugh! Yet another "how do I parse HTML/XML with regular expressions" question. Read stackoverflow.com/q/701166/62576 for one of many posts about why you've chosen the wrong tool for the job, and then use a DOM parser to make your life (and the lives of others who may have to maintain your code) much easier. Commented Jul 10, 2015 at 2:38
  • stackoverflow.com/questions/1732348/… he comes Commented Jul 10, 2015 at 3:08
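
To make the point about optional separators concrete, here is a minimal Python sketch (Python is just my choice for illustration; the question doesn't name a language). It compares the digit pattern from the first comment with a variant where the separators are required. The names `optional_seps`, `required_seps` and `samples` are made up for this example, and the `\/` and `\-` escapes are dropped because Python doesn't need them.

```python
import re

# Pattern from the first comment: every separator is optional (note the "?").
optional_seps = re.compile(r"[0-9]{2}\.?[0-9]{3}\.?[0-9]{3}/?[0-9]{4}-?[0-9]{2}")

# Same digit groups, but the dot, slash and hyphen are required.
required_seps = re.compile(r"[0-9]{2}\.[0-9]{3}\.[0-9]{3}/[0-9]{4}-[0-9]{2}")

samples = ["00.000.000/0000-00", "00000000000000", "XX.YYY.IIO/KKKK-LL"]

for s in samples:
    # fullmatch() succeeds only if the whole string fits the pattern.
    print(s, bool(optional_seps.fullmatch(s)), bool(required_seps.fullmatch(s)))

# 00.000.000/0000-00 True True
# 00000000000000 True False
# XX.YYY.IIO/KKKK-LL False False
```

Neither digit-only pattern accepts the letter form, which is why the answers widen the character class.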

2 Answers

If you're looking for a pattern that will match:

xx.xxx.xxx/xxxx-xx

where each x is a single alphanumeric character (a-z, A-Z or 0-9), then you can use this pattern:

[a-zA-Z0-9]{2}\.[a-zA-Z0-9]{3}\.[a-zA-Z0-9]{3}\/[a-zA-Z0-9]{4}-[a-zA-Z0-9]{2}

You can try it in this example.
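
As a rough sketch of how this could be applied to HTML, and assuming Python, one option is to pull the text out of the markup first and only then run the pattern, so that tag names and attribute values are never searched. This uses the standard library's `html.parser` (an event-based parser, not a full DOM); `TextCollector`, `find_codes` and `sample_html` are hypothetical names invented for the example.

```python
import re
from html.parser import HTMLParser

# The pattern from this answer; "/" needs no escaping in Python.
PATTERN = re.compile(
    r"[a-zA-Z0-9]{2}\.[a-zA-Z0-9]{3}\.[a-zA-Z0-9]{3}/[a-zA-Z0-9]{4}-[a-zA-Z0-9]{2}"
)


class TextCollector(HTMLParser):
    """Collects text nodes only; tags and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_codes(html):
    """Return every substring of the document's text that fits PATTERN."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Join text nodes with a space so values in adjacent tags don't run together.
    text = " ".join(parser.chunks)
    return PATTERN.findall(text)


sample_html = "<p>ID: 12.345.678/0001-90</p><div>Ref <b>AB.CDE.FGH/IJKL-MN</b></div>"
print(find_codes(sample_html))
# ['12.345.678/0001-90', 'AB.CDE.FGH/IJKL-MN']
```

Note that the pattern is unanchored, so it will also find a match inside a longer run of alphanumeric characters; whether that matters depends on your data.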

Try with:

\w{2}\.\w{3}\.\w{3}\/\w{4}-\w{2}
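
One thing to be aware of: `\w` is not exactly the same as `[a-zA-Z0-9]`. It also matches the underscore, and in some engines (Python 3 with `str` patterns, for instance) it matches non-ASCII letters and digits as well unless you restrict it. A small Python sketch of the difference, with `explicit`, `word` and `samples` being illustrative names only:

```python
import re

# Explicit alphanumeric class.
explicit = re.compile(
    r"[a-zA-Z0-9]{2}\.[a-zA-Z0-9]{3}\.[a-zA-Z0-9]{3}/[a-zA-Z0-9]{4}-[a-zA-Z0-9]{2}"
)

# The \w version from this answer ("/" needs no escaping in Python).
word = re.compile(r"\w{2}\.\w{3}\.\w{3}/\w{4}-\w{2}")

samples = ["12.345.678/0001-90", "AB.CDE.FGH/IJKL-MN", "__.___.___/____-__"]

for s in samples:
    print(s, bool(explicit.fullmatch(s)), bool(word.fullmatch(s)))

# 12.345.678/0001-90 True True
# AB.CDE.FGH/IJKL-MN True True
# __.___.___/____-__ False True
```

If underscores can never appear in your data the two behave the same in practice; otherwise the explicit class is the stricter choice.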
