Parsing building numbers from a string with regex

Question

I have been struggling for hours with something that should be quite simple, and I would appreciate any advice. I have a Postgres database of addresses with a field, building_name, which in many cases actually contains building or apartment numbers. These numbers may or may not be suffixed with a letter, e.g. 32A, 24b. They can appear anywhere in the string, including the start or end, and may be followed by whitespace or some other non-alphanumeric separator such as a slash or dash. Some examples:

'11B' should return '11B'
'BURNFOOT COTTAGE' should return nothing as there are no numbers
'2/1' should return '2'
'15a' should return '15a'
'6 CAROLINA COURT' should return '6'
'PATRICK THOMAS COURT 83B' should return '83B'
'UNIT 51' should return '51'
'1/6 NEW ASSEMBLY CLOSE' should return '1'
'15E GREENVALE' should return '15E'

I am trying to achieve this using a regular expression. The closest I can get is '(\d+\w+)' which works for some of the above but does not work for:

'2/1' or '6 CAROLINA COURT' or '1/6 NEW ASSEMBLY CLOSE'

I have followed the advice in the question "SQL split string at first occurance of a number", but it does not meet my requirements.

Any advice would be hugely appreciated, I am completely stuck!

Many thanks in advance,

Mark

Is this raw Postgres SQL, or are you using a language with an API, and if so, what flavour of regex? – declension, Commented Jan 29, 2015 at 16:21
Hi - I have been testing out my regex logic using regexr.com and will use the regex in my Postgres query, e.g. select substring(building_name from '(\d+\w+)') AS building_num – Mark V, Commented Jan 29, 2015 at 16:38
You need to define in English the rules that you want to follow before you can implement them in a regex. Also, this may be illuminating: mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses – Andy Lester, Commented Jan 29, 2015 at 18:19
Thanks for the advice Andy, and that link is excellent - it will be very useful indeed for my project – Mark V, Commented Jan 30, 2015 at 17:32

Ainar-G · Accepted Answer · 2015-01-29 16:54:08Z

1

Your regexp doesn't quite work because you use the + quantifier, which requires one or more word characters after the digits. If you want to allow zero or one, use the ? quantifier: '\d+\w?'.
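As a rough sketch of how this pattern could be checked in Postgres, the query below runs it over the examples from the question. The inline VALUES list and the alias t are assumptions for illustration only; in practice the column would come from the real address table.

```sql
-- Sketch only: the VALUES list stands in for the real address table.
SELECT building_name,
       -- substring(string from pattern) returns the first match,
       -- or NULL when the pattern does not match at all.
       substring(building_name from '\d+\w?') AS building_num
FROM (VALUES
        ('11B'),
        ('BURNFOOT COTTAGE'),
        ('2/1'),
        ('15a'),
        ('6 CAROLINA COURT'),
        ('PATRICK THOMAS COURT 83B'),
        ('UNIT 51'),
        ('1/6 NEW ASSEMBLY CLOSE'),
        ('15E GREENVALE')
     ) AS t(building_name);
```

Rows without any digits, such as 'BURNFOOT COTTAGE', come back as NULL. Note that \w also matches the underscore, so a stricter variant might use [A-Za-z]? in place of \w? if only a letter suffix should be accepted.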

edited Jan 29, 2015 at 16:54

answered Jan 29, 2015 at 16:48

1 Comment

Mark V Over a year ago

Thank you all for your help, the suggestion from Ainar-G worked perfectly for my particular problem. I appreciate it is quite tricky to test pg-flavoured regexes, but I am very grateful for your time.

Daniele · Answer · 2015-01-29 16:40:40Z

0

As mentioned by Nick B, it would be better to specify the RegEx implementation you are using. As a general answer though, you could try something like this:

(^|\s)(\d+[a-zA-Z]?\b)

and take the second group from the result.

(^|\s) matches the line start or a whitespace character. This excludes the number 1 in the 2/1 test case from the output.

Then \d+[a-zA-Z]? should match any sequence of at least one digit followed by at most one letter.
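For Postgres specifically, the same idea might be sketched as below. Two things differ from the generic pattern, and both are adaptations rather than part of the original answer: the leading group is made non-capturing with (?:...) so that substring returns the number itself, and the word boundary is written \y, since \b means backspace in Postgres regular expressions. The VALUES list and the alias t are illustrative only.

```sql
-- Sketch only: a Postgres adaptation of the pattern described above.
SELECT building_name,
       -- (?:^|\s)   start of string or whitespace, not captured
       -- (\d+[A-Za-z]?)  digits plus an optional single letter, captured
       -- \y         Postgres word-boundary escape
       substring(building_name from '(?:^|\s)(\d+[A-Za-z]?)\y') AS building_num
FROM (VALUES
        ('2/1'),
        ('6 CAROLINA COURT'),
        ('PATRICK THOMAS COURT 83B'),
        ('BURNFOOT COTTAGE')
     ) AS t(building_name);
```

When the pattern contains a capturing group, substring returns the text matched by the first such group rather than the whole match, which is why the prefix group is written as non-capturing.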

Hope this helps!

answered Jan 29, 2015 at 16:40

1 Comment

Mark V Over a year ago

Thank you very much for your suggestion Daniele, did not quite work for me in pg but much appreciated all the same

declension · Answer · 2015-01-29 17:27:33Z

0

You're requiring a word character after the digits when it should be optional, and you're not catering for the non-alphanumeric separators.

So, assuming you're using POSIX regexes in Postgres, try something like this:

(\d+\w*)[ /\\\-]|$

making sure you capture group 1 as your output.

This involved a bit of guesswork, as there aren't many PG-flavoured online testers.

Note that Postgres doesn't treat \b as a word boundary the way Perl-flavoured regexes do (in Postgres \b is a backspace, and the word-boundary escape is \y), so your \b won't work here, which is why I avoided it.
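One point worth checking in the pattern above is operator precedence: alternation binds more loosely than concatenation, so |$ is an alternative to the entire preceding expression rather than only to the separator class. A value such as '11B', which has no trailing separator, would then match through the bare $ branch and leave group 1 empty. A hedged sketch of a grouped variant follows; the non-capturing group, the separator class trimmed to space, slash and dash, and the VALUES list are all assumptions made for illustration.

```sql
-- Sketch only: groups the separator and end-of-string alternatives together.
SELECT building_name,
       -- (\d+\w*)       the number plus any trailing word characters, captured
       -- (?:[ /-]|$)    a space, slash or dash, or the end of the string
       substring(building_name from '(\d+\w*)(?:[ /-]|$)') AS building_num
FROM (VALUES
        ('11B'),
        ('2/1'),
        ('1/6 NEW ASSEMBLY CLOSE'),
        ('UNIT 51'),
        ('BURNFOOT COTTAGE')
     ) AS t(building_name);
```

The hyphen is placed last inside the bracket expression so that it is read as a literal character rather than as a range operator.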

answered Jan 29, 2015 at 17:27

1 Comment

Mark V Over a year ago

Thank you very much for your suggestion Nick B, did not quite work for me in pg but much appreciated all the same
