Searching regex expression, to return string with spaces

Question

I am trying to search a string in Python using a regex for a particular word that has a space before it and a space after it. The string I want to search is:

JAKARTA, INDONESIA (1 February 2017)

and I want to get back the ", INDONESIA (" part so I can apply rtrim and ltrim to it. The country could also be a multi-word name such as United Kingdom.

I have attempted this in my Python code:

import re
text = "JAKARTA, INDONESIA (1 February 2017)"
countryRegex = re.compile(r'^(,)(\s)([a-zA-Z]+)(\s)(\()$')
mo = countryRegex.search(text)
print(mo.group())

However, this produces the error:

AttributeError: 'NoneType' object has no attribute 'group'

This indicates to me that no match object is being returned.

I then tried my regex on regex101, but it also fails there, saying "Your regular expression does not match the subject string."

I assumed this would work, as I test for a literal comma (,), then a space (\s), then one or more letters ([a-zA-Z]+), then another space (\s), and finally an opening bracket, which I have made sure to escape (\(). Is there something wrong with my regex?

@WiktorStribiżew This worked. Could you explain why, please?
– mp252, Commented Feb 7, 2017 at 13:15
@mp252 ^ and $ anchor the match to the beginning and end of the string you're searching. Your regex would only match if the comma were the first character in the string and the opening parenthesis the last.
– glibdud, Commented Feb 7, 2017 at 13:20
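
The effect of the anchors can be checked directly. The sketch below is only an illustration: it reuses the text and pattern from the question, and the names anchored and unanchored are made up for the example. It also shows one way the rtrim/ltrim step might be done with str.strip.

```python
import re

text = "JAKARTA, INDONESIA (1 February 2017)"

# Pattern from the question: ^ and $ require the comma to be the first
# character of the string and the "(" to be the last, so nothing matches.
anchored = re.compile(r'^(,)(\s)([a-zA-Z]+)(\s)(\()$')
print(anchored.search(text))  # None

# The same pattern without the anchors can match in the middle of the string.
unanchored = re.compile(r'(,)(\s)([a-zA-Z]+)(\s)(\()')
mo = unanchored.search(text)
if mo:
    print(repr(mo.group()))          # ', INDONESIA ('
    # str.strip with a set of characters stands in for ltrim/rtrim here.
    print(mo.group().strip(", ("))   # INDONESIA
```

Checking the result of search() before calling group() avoids the AttributeError from the question when there is no match.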

Giacomo Garabello · Accepted Answer · 2017-02-07 13:20:46Z

You can try this regex instead. It uses a lookbehind and a lookahead so that it matches only the country part.
Including a space in the character class lets it match multi-word countries such as United Kingdom.

(?<=, )([a-zA-Z ]+)(?= \()

Test on Regex101
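
As a rough sketch of how this pattern might be used from Python (the second sample string and the names country_regex, samples and results are assumptions made for illustration, not part of the question):

```python
import re

# Lookbehind/lookahead pattern from this answer.
country_regex = re.compile(r'(?<=, )([a-zA-Z ]+)(?= \()')

# The second string is a made-up example of a multi-word country.
samples = [
    "JAKARTA, INDONESIA (1 February 2017)",
    "LONDON, UNITED KINGDOM (1 February 2017)",
]

results = []
for text in samples:
    mo = country_regex.search(text)
    # The lookarounds consume no characters, so the whole match is
    # already just the country name; no trimming is needed.
    results.append(mo.group() if mo else None)

print(results)  # ['INDONESIA', 'UNITED KINGDOM']
```

Because the lookarounds are zero-width, mo.group() and mo.group(1) return the same text here.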


2 Comments

Wiktor Stribiżew Over a year ago

You mix the lookarounds with capturing, so why bother using lookarounds at all?

Giacomo Garabello Over a year ago

I use lookarounds because I think that is better than capturing more text than needed. The capturing group is there in case you prefer to use groups instead of the whole match.

Wiktor Stribiżew · Accepted Answer · 2017-02-07 13:32:28Z

Once you remove the anchors (^ matches the start of string position and $ matches the end of string position), the regex will match the string. However, you may get INDONESIA with a capturing group using:

,\s*([a-zA-Z]+)\s*\(

See the regex demo. match.group(1) will contain the value.

Details:

,\s* - a comma and zero or more whitespaces (replace * with + if you want at least 1 whitespace to be present)
([a-zA-Z]+) - capturing group 1 matching one or more ASCII letters
\s* - zero or more whitespaces
\( - a ( literal symbol.

Sample Python code:

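import re

text = "JAKARTA, INDONESIA (1 February 2017)"
# No ^/$ anchors, so the pattern may match anywhere in the string
countryRegex = re.compile(r',\s*([a-zA-Z]+)\s*\(')
mo = countryRegex.search(text)
# search() returns None when nothing matches, so check before using the match
if mo:
    print(mo.group(1))  # group 1 holds the country name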

An alternative regex that would capture anything between ,+whitespace and whitespace+( is

,\s*([^)]+?)\s*\(

See this regex demo. Here, [^)]+? matches 1+ chars other than ) as few as possible.
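
To see how the two patterns differ on a multi-word country, here is a small sketch. The input string and the names letters_only and anything are assumptions chosen for illustration.

```python
import re

# First pattern: the country must consist of ASCII letters only.
letters_only = re.compile(r',\s*([a-zA-Z]+)\s*\(')
# Alternative pattern: lazily captures any run of characters other than ")".
anything = re.compile(r',\s*([^)]+?)\s*\(')

# Hypothetical input with a space inside the country name.
text = "LONDON, UNITED KINGDOM (1 February 2017)"

m1 = letters_only.search(text)
m2 = anything.search(text)

print(m1)                            # None: [a-zA-Z]+ cannot cross the space
print(m2.group(1) if m2 else None)   # UNITED KINGDOM
```

The lazy quantifier together with the trailing \s* keeps the space before the opening parenthesis out of group 1, so the captured value needs no further trimming.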
