2

I am looking into regular expressions in Python (the re module). As part of this, I am trying to extract a substring from a string.

For instance, assume I have the string:

<place of birth="Stockholm">

Is there a way to extract Stockholm with a single regex call?

So far, I have:

import re

# Single quotes outside so the inner double quotes need no escaping
location_info = '<place of birth="Stockholm">'

# Remove everything before the value
location_name1 = re.sub(r'<place of birth="', "", location_info)
# location_name1 --> Stockholm">

# Remove everything after the value
location_name2 = re.sub(r'">', "", location_name1)
# location_name2 --> Stockholm

Any advice on how to extract the string Stockholm without using two re.sub calls is highly appreciated.

1
  • Is there a specific reason why you are removing the rest of the string instead of selecting the part you want with something like <.*="(.*)".*>? Commented Sep 28, 2015 at 8:31

4 Answers

3

Sure, you can match the beginning up to the double quotes, and match and capture all the characters other than double quotes after that:

import re
p = re.compile(r'<place of birth="([^"]*)')
location_info = "<place of birth=\"Stockholm\">"
match = p.search(location_info)
if match:
    print(match.group(1))

See IDEONE demo

The <place of birth=" part is matched literally, and ([^"]*) is capture group 1, which matches zero or more characters other than ". The captured value is accessed with .group(1).
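The capture-group approach can be wrapped in a small function so that a missing tag yields None instead of raising an error. This is only a sketch: the names BIRTHPLACE and extract_birthplace are hypothetical and not part of the original answer, and the second input string is made up for illustration.

```python
import re

# Same pattern as in the answer; the names below are hypothetical.
BIRTHPLACE = re.compile(r'<place of birth="([^"]*)')

def extract_birthplace(text):
    """Return the quoted value after `<place of birth=`, or None if absent."""
    match = BIRTHPLACE.search(text)
    return match.group(1) if match else None

print(extract_birthplace('<place of birth="Stockholm">'))
print(extract_birthplace('<place of death="Uppsala">'))  # made-up non-matching input
```

Checking the match before calling .group(1) matters because re.search returns None when nothing matches.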

Here is a REGEX demo.

1 Comment

I just compared my regex and vks' and found that mine is a bit quicker :) Also, if your string may contain more than just "place of birth="..." my regex will deal with that task better.
1
print(re.sub(r'^[^"]*"|"[^"]*$', "", location_info))

This should do it for you. See demo.

https://regex101.com/r/vV1wW6/30#python
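To see why a single re.sub call is enough here, it may help to apply the two alternatives of the pattern separately. The variable names below are illustrative assumptions, not from the answer.

```python
import re

location_info = '<place of birth="Stockholm">'

leading = r'^[^"]*"'   # start of string through the first double quote
trailing = r'"[^"]*$'  # last double quote through the end of the string

only_leading = re.sub(leading, "", location_info)
only_trailing = re.sub(trailing, "", location_info)
both = re.sub(leading + "|" + trailing, "", location_info)

print(only_leading)   # Stockholm">
print(only_trailing)  # <place of birth="Stockholm
print(both)           # Stockholm
```

Note that this relies on the string containing exactly one quoted value; with several quoted attributes, everything between the first and the last quote would be kept.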

0

Is there a specific reason why you are removing the rest of the string instead of selecting the part you want? For example:

import re

# Single quotes outside so the inner double quotes need no escaping
location_info = '<place of birth="Stockholm">'
location_info = re.search('<.*="(.*)".*>', location_info, re.IGNORECASE).group(1)
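One thing to keep in mind with this pattern is that .* is greedy. The sketch below uses a hypothetical tag with a second attribute (not part of the original question) to compare the greedy pattern with a non-greedy variant.

```python
import re

# Hypothetical input with a second attribute, not from the original question.
tag = '<place of birth="Stockholm" country="Sweden">'

# Greedy: the first .* runs up to the last ="...", so the last value is captured.
greedy = re.search(r'<.*="(.*)".*>', tag).group(1)

# Non-greedy: .*? stops at the first ="...", so the first value is captured.
lazy = re.search(r'<.*?="(.*?)"', tag).group(1)

print(greedy)  # Sweden
print(lazy)    # Stockholm
```

For the single-attribute string in the question, both variants return Stockholm.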

0

This code was tested under Python 3.6:

import re

test = '<place of birth="Stockholm">'
# Replace the whole tag with the contents of capture group 1
resp = re.sub(r'.*="(\w+)">', r'\1', test)
print(resp)

Output:

Stockholm
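A caveat with this approach: \w+ only matches letters, digits, and underscores, and re.sub returns the input unchanged when the pattern does not match. The sketch below uses a hypothetical value containing a space to show the difference from a [^"]+ group.

```python
import re

# Hypothetical input containing a space, not from the original answer.
test = '<place of birth="New York">'

# \w+ cannot cross the space, so nothing matches and the string is returned as-is.
word_only = re.sub(r'.*="(\w+)">', r'\1', test)

# [^"]+ accepts any character except a double quote.
any_chars = re.sub(r'.*="([^"]+)">', r'\1', test)

print(word_only)  # <place of birth="New York">
print(any_chars)  # New York
```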
