
I'm having difficulty with a Python regex. I want to find any of N, S, E, W, NB, SB, EB, WB, including at the start or end of the string. My regex easily finds these in the middle, but fails at the start or end.

Can anyone advise what I'm doing wrong with dirPattern in the code sample below?

Note: I realize I have some other problems to deal with (e.g. 'W of'), but think I know how to modify the regex for those.

Thanks in advance.

import re

nameList = ['Boulder Highway and US 95 NB',  'Boulder Hwy and US 95 SB', 
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15',
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean',
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W',
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran',
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East',
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)']

dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}

dirPattern = re.compile(r'[ ^]([NSEW])B?[ $]')

print('name\tmatch\tdirSting\tdirection')
for name in nameList:
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        if dirString in dirMap:
            direction = dirMap[dirString]
    print('%s\t%s\t%s\t%s'%(name, match, dirString, direction))

Some sample expected output:

name match dirSting direction

Boulder Highway and US 95 NB <_sre.SRE_Match object at 0x7f68af836648> N North

Boulder Hwy and US 95 SB <_sre.SRE_Match object at 0x7f68ae836648> S South

Buffalo and Summerlin N <_sre.SRE_Match object at 0x7f68af826648> N North

Charleston and I-215 W <_sre.SRE_Match object at 0x7f68cf836648> W West

Flamingo and NB I-15 <_sre.SRE_Match object at 0x7f68af8365d0> N North

S Buffalo and Summerlin <_sre.SRE_Match object at 0x7f68aff36648> S South

Gibson and I-215 EB <_sre.SRE_Match object at 0x7f68afa36648> E East

However, start or end examples give:

Boulder Highway and US 95 NB None None None

  • ^ and $ inside square brackets don't mean the start/end of the string, you know? Commented Jul 19, 2015 at 16:12
  • Jon, thanks. I did not know, although I was beginning to suspect this. Commented Jul 19, 2015 at 16:18
  • What are you trying to do exactly? You can also use direction = dirMap.get(dirString), which returns None if there is no matching key in the dict. Commented Jul 19, 2015 at 16:19
  • Padraic, that's a good tip I was unaware of. Could save a bit of coding. Fundamentally, my question is about extracting the keys N, S, E, W from the sample strings. What I want is N, NB, etc., but only by itself, i.e. either at start followed by space, at end preceded by space, or in middle with space before and after. Commented Jul 19, 2015 at 16:25
  • Any reason my question was voted down? I completely realize this is a fairly rookie issue. However, I did look around the prior questions, and didn't find anything that helped. Also, I provided a working code sample. Seems pretty arbitrary to me. Commented Jul 19, 2015 at 16:30

2 Answers


You need to use lookarounds.

dirPattern = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')

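[ ^] would match a space or a literal caret symbol. The negative lookbehind (?<!\S) asserts that the match is not preceded by a non-space character, so it succeeds at the start of the string or after whitespace. The negative lookahead (?!\S) asserts that the match is not followed by a non-space character, so it succeeds at the end of the string or before whitespace.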

I used a negative lookbehind instead of a positive one because Python's built-in re module does not support (?<=^| ): a lookbehind must match a fixed width, and ^ and a space have different widths.
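The points above can be checked with a minimal sketch. The sample strings come from the question; the names oldPattern, newPattern and lookbehindOk are made up for this illustration.

```python
import re

# Inside a character class, ^ (when not first) and $ are literal characters,
# so the question's pattern looks for a space or '^' before the direction
# and a space or '$' after it.
oldPattern = re.compile(r'[ ^]([NSEW])B?[ $]')

# Lookaround version: no non-space character on either side.
newPattern = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')

for name in ['Boulder Highway and US 95 NB',
             'S Buffalo and Summerlin',
             'Flamingo and NB I-15']:
    old = oldPattern.search(name)
    new = newPattern.search(name)
    print('%s\t%s\t%s' % (name,
                          old.group(1) if old else None,
                          new.group(1) if new else None))

# A lookbehind whose alternatives have different widths is rejected
# when the pattern is compiled.
try:
    re.compile(r'(?<=^| )N')
    lookbehindOk = True
except re.error:
    lookbehindOk = False
print(lookbehindOk)
```

Only the third name matches with the original pattern, while all three match with the lookaround version.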


1 Comment

Avinash, thanks for the tip. While I was waiting for answers, I began using lookarounds to handle cases like 'at E 2nd St' or 'W of I-15' (both to be excluded). What I want is N, NB, etc., but only by itself, i.e. either at start followed by space, at end preceded by space, or in middle with space before and after. Your answer may get me there, but right now I'm not sure how.

The modified regex in this code does the trick. This includes handling things like 'W of', 'at E', and similar:

import re

nameList = ['Boulder Highway and US 95 NB',  'Boulder Hwy and US 95 SB', 
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15',
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean',
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W',
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran',
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East',
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)']

dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}

dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of )(?: |$)')

print('name\tdirSting\tdirection')
for name in nameList:
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        direction = dirMap.get(dirString)
    print('> %s\t\t%s\t%s'%(name, dirString, direction))

The regex can be understood as follows:

(?:^| ) start with either beginning of string or a space

(?<! at ) not preceded by ' at '

(?<! of ) not preceded by ' of '

([NSEW]) Any one of 'N', 'S', 'E', 'W' (this will be in match.group(1))

B? Optionally followed by 'B' (as in bound)

(?! of ) not followed by ' of '

(?: |$) end with either end of string or a space
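The same breakdown can be kept next to the pattern itself with re.VERBOSE. This is only a sketch of an alternative layout, not a change in behavior; the names compactPattern and verbosePattern are made up for this illustration. In verbose mode unescaped whitespace is ignored, so each literal space is written as [ ].

```python
import re

compactPattern = re.compile(
    r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of )(?: |$)')

verbosePattern = re.compile(r"""
    (?:^|[ ])       # beginning of string or a space
    (?<![ ]at[ ])   # not preceded by ' at '
    (?<![ ]of[ ])   # not preceded by ' of '
    ([NSEW])        # one of N, S, E, W (captured as group 1)
    B?              # optionally followed by 'B' (as in bound)
    (?![ ]of[ ])    # not followed by ' of '
    (?:[ ]|$)       # a space or end of string
""", re.VERBOSE)

samples = ['Gibson and I-215 EB',
           'S Buffalo and Summerlin',
           'I-15 at 3.5 miles N of Jean',
           'I-80 at E 4TH St Kietzke Ln']

for name in samples:
    match = verbosePattern.search(name)
    print('%s\t%s' % (name, match.group(1) if match else None))
```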

Final output is:

Boulder Highway and US 95 NB N North

Boulder Hwy and US 95 SB S South

Buffalo and Summerlin N N North

Charleston and I-215 W W West

Eastern and I-215 S S South

Flamingo and NB I-15 N North

S Buffalo and Summerlin S South

Flamingo and SB I-15 S South

Gibson and I-215 EB E East

I-15 at 3.5 miles N of Jean None None

I-15 NB S I-215 (dual) N North

I-15 SB 4.3 mile N of Primm S South

I-15 SB S of Russell S South

I-515 SB at Eastern W S South

I-580 at I-80 N E N North

I-580 at I-80 S W S South

I-80 at E 4TH St Kietzke Ln None None

I-80 East of W McCarran None None

LV Blvd at I-215 S S South

S Buffalo and I-215 W S South

S Decatur and I-215 WB S South

Sahara and I-15 East None None

Sands and Wynn South Gate None None

Silverado Ranch and I-15 (west side) None None

Side note: I decided I don't want to match a direction at the end of the string. For this, the regex would be:

dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of )')
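A short check of this variant on a few names from the list, as a sketch (noEndPattern, samples and results are names made up for this illustration). Because a literal space is now required after the direction, a direction at the very end of the string is skipped.

```python
import re

noEndPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of )')

samples = ['Boulder Highway and US 95 NB',   # direction at the end
           'Flamingo and NB I-15',           # direction in the middle
           'S Buffalo and Summerlin',        # direction at the start
           'I-15 at 3.5 miles N of Jean']    # 'N of' is excluded

results = {}
for name in samples:
    match = noEndPattern.search(name)
    results[name] = match.group(1) if match else None
    print('%s\t%s' % (name, results[name]))
```

Only the second and third names produce a match here.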
