I am trying to use a regex in Python to find a particular word in a string, where the word is preceded by a space and followed by a space. The string I want to search is:

JAKARTA, INDONESIA (1 February 2017)

and I want to get back the ", INDONESIA (" part so I can then trim the unwanted characters from each end (rtrim and ltrim). The country could also be a multi-word name such as United Kingdom.

I have attempted this in my Python code:

import re
text = "JAKARTA, INDONESIA (1 February 2017)"
countryRegex = re.compile(r'^(,)(\s)([a-zA-Z]+)(\s)(\()$')
mo = countryRegex.search(text)
print(mo.group())

However, this raises the following error:

AttributeError: 'NoneType' object has no attribute 'group'

This indicates to me that the search is not returning a match object.

I then tried my regex in regex101, but it also fails there with the message "Your regular expression does not match the subject string."

I assumed this would work, as I test for a literal comma (,), then a space (\s), then one or more letters ([a-zA-Z]+), then another space (\s), and finally an opening bracket, which I have escaped (\(). Is there something wrong with my regex?

  • The ^ and $ anchors must be removed. Commented Feb 7, 2017 at 13:12
  • And the ^ anchor too. Commented Feb 7, 2017 at 13:14
  • @WiktorStribiżew This worked. Would it be possible to explain why please? Commented Feb 7, 2017 at 13:15
  • @mp252 ^ and $ are used to represent the beginning and end of the string you're searching. They would only match your regex if the comma was the first character in the string, and the open parenthesis was the last. Commented Feb 7, 2017 at 13:20
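The effect of the anchors described in the comments can be checked directly. This is a minimal sketch using the string and pattern from the question; the variable names anchored and unanchored are introduced here purely for illustration.

```python
import re

text = "JAKARTA, INDONESIA (1 February 2017)"

# Pattern from the question: ^ and $ force the comma to be the first
# character of the string and the "(" to be the last one.
anchored = re.compile(r'^(,)(\s)([a-zA-Z]+)(\s)(\()$')

# Same pattern with the anchors removed, so it may match anywhere.
unanchored = re.compile(r'(,)(\s)([a-zA-Z]+)(\s)(\()')

anchored_match = anchored.search(text)
unanchored_match = unanchored.search(text)

print(anchored_match)             # None, hence the AttributeError on .group()
print(unanchored_match.group())   # the whole match
print(unanchored_match.group(3))  # only the letters group
```

Here group(3) is the ([a-zA-Z]+) group, so the country name is available without any trimming afterwards.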

2 Answers


You can try this regex instead. It uses a lookbehind and a lookahead so that it matches only the country part.
Adding a space to the character class lets it match multi-word countries such as United Kingdom.

(?<=, )([a-zA-Z ]+)(?= \()

Test on Regex101
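A short Python sketch of how this pattern might be used. The second sample string is a made-up variant of the question's input, added only to exercise the multi-word case, and the names country_regex, samples and countries are illustrative.

```python
import re

# Lookbehind requires ", " before the match; lookahead requires " (" after it.
# Neither is included in the matched text.
country_regex = re.compile(r'(?<=, )([a-zA-Z ]+)(?= \()')

samples = [
    "JAKARTA, INDONESIA (1 February 2017)",      # string from the question
    "LONDON, United Kingdom (1 February 2017)",  # hypothetical multi-word case
]

# search() returns None on failure, so filter those out before calling group().
countries = [m.group() for m in map(country_regex.search, samples) if m]
print(countries)
```

Because the lookarounds consume no characters, m.group() and m.group(1) return the same text here.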


2 Comments

You mix the lookarounds with capturing, so why bother using lookarounds at all?
I use lookarounds because I think that is better than capturing more text than needed. The capturing group is there in case you prefer to use the groups instead of the whole match.

Once you remove the anchors (^ matches the start of string position and $ matches the end of string position), the regex will match the string. However, you may get INDONESIA with a capturing group using:

,\s*([a-zA-Z]+)\s*\(

See the regex demo. match.group(1) will contain the value.

Details:

  • ,\s* - a comma and zero or more whitespaces (replace * with + if you want at least 1 whitespace to be present)
  • ([a-zA-Z]+) - capturing group 1 matching one or more ASCII letters
  • \s* - zero or more whitespaces
  • \( - a ( literal symbol.

Sample Python code:

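import re

text = "JAKARTA, INDONESIA (1 February 2017)"
# Unanchored pattern; group 1 captures the letters between the comma and "("
countryRegex = re.compile(r',\s*([a-zA-Z]+)\s*\(')
mo = countryRegex.search(text)
if mo:  # search() returns None when nothing matches
    print(mo.group(1))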

An alternative regex that would capture anything between ,+whitespace and whitespace+( is

,\s*([^)]+?)\s*\(

See this regex demo. Here, [^)]+? matches 1+ chars other than ) as few as possible.
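As a rough illustration of the alternative pattern, the sketch below wraps it in a helper. The function name extract_country and the second input string are assumptions made for this example; they are not part of the original question.

```python
import re

# Lazy [^)]+? takes as few non-")" characters as possible, and the
# surrounding \s* keeps leading/trailing whitespace out of group 1.
COUNTRY_RE = re.compile(r',\s*([^)]+?)\s*\(')

def extract_country(text):
    """Return the text between the comma and "(", or None if absent."""
    mo = COUNTRY_RE.search(text)
    return mo.group(1) if mo else None

print(extract_country("JAKARTA, INDONESIA (1 February 2017)"))
print(extract_country("LONDON, United Kingdom (1 February 2017)"))  # hypothetical input
```

Returning None instead of calling group() unconditionally avoids the AttributeError from the question when the input has no match.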
