Python look for pattern in string

Question

I am having trouble understanding the regular expressions module in Python. I think what I am trying to do is fairly simple, but I cannot figure it out.

I need to search through some XML files and find this pattern:

'DisplayName="Parcels (10-1-2012)"'

I can parse through the XML and make replacements with no problem; the part I cannot figure out is how to do a wildcard search to find any instance of "Parcels (some-date-year)". Since the date will vary, I need to find this pattern:

pat = '"Parcels (*-*-*)"'

and I want to replace it with today's date, which I can do with the time module. I copied out a line from one of the 80 or so XML docs where I would need to find the pattern.

According to the help for the re.search() function, it seems I can just put in a pattern, then the string I wish to search through. However, I am getting errors.

Help on function search in module re:

search(pattern, string, flags=0) Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.

Here is my little test snippet:

import re
pat = '"Parcels (*-*-*)"'
t= '         <Layer DisplayName="Parcels (7-1-2010)" FeatureDescription="Owner Name: {OWNER_NAME}&lt;br/&gt;Property Address: {PROP_ADDR}&lt;br/&gt;Tax Name: {TAX_NAME}&lt;br/&gt;Tax Address 1: {TAX_ADD_L1}&lt;br/&gt;Tax Address 2: {TAX_ADD_L2}&lt;br/&gt;Land Use: {USE1_DESC}&lt;br/&gt;&lt;a href=&quot;http://www16.co.hennepin.mn.us/pins/pidresult.jsp?pid={PID_NO}&quot;&gt;View Property Information&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;" FeatureLabel="Parcel ID: {PID_NO}" IconUri="{RestVirtualDirectoryUrl}/Images/Parcel.png" Identifiable="true" IncludeInLayerList="true" IncludeInLegend="true" Name="Parcels" Searchable="true" ShowMapTips="true" UnconfiguredFieldsSearchable="true" UnconfiguredFieldsVisible="true" Visible="true">'
match = re.search(pat, t)
print(match)

Most of the line is junk I don't need to worry about. I just need to see how I can find that date in the line so I can use just that piece in the replace() function. Does anyone know how I could find these dates? There may be other dates in the xml somewhere, but I don't need to replace these; just where it says "Parcels (some-date-year)". I appreciate any help! Thanks!

The pattern needs to be a regular expression. docs.python.org/2/howto/regex.html may help you?
– Wooble, Commented Jan 30, 2014 at 16:11
Did you get an error from this? Is this relevant? stackoverflow.com/questions/3675144/…
– doctorlove, Commented Jan 30, 2014 at 16:15
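A minimal sketch of why the snippet in the question fails: in a regex, * is a quantifier (repeat the previous item), not a shell-style wildcard. In '"Parcels (*-*-*)"' the first * follows an opening (, so it has nothing to repeat and re.compile raises re.error. The helper name compiles is made up for this illustration.

```python
import re


def compiles(pattern):
    """Return True if pattern is a valid regular expression, False otherwise."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # For '"Parcels (*-*-*)"' this reports "nothing to repeat":
        # '*' must follow something it can repeat, but here it follows '('.
        print('re.error: %s' % exc)
        return False
    return True


print(compiles('"Parcels (*-*-*)"'))        # wildcard-style pattern from the question
print(compiles(r'"Parcels \(.*-.*-.*\)"'))  # '.' is the wildcard; parens escaped
```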

Aaron Hall · Accepted Answer · 2014-01-30 16:34:51Z


import re

t= '         <Layer DisplayName="Parcels (7-1-2010)" FeatureDescription="Owner Name: {OWNER_NAME}&lt;br/&gt;Property Address: {PROP_ADDR}&lt;br/&gt;Tax Name: {TAX_NAME}&lt;br/&gt;Tax Address 1: {TAX_ADD_L1}&lt;br/&gt;Tax Address 2: {TAX_ADD_L2}&lt;br/&gt;Land Use: {USE1_DESC}&lt;br/&gt;&lt;a href=&quot;http://www16.co.hennepin.mn.us/pins/pidresult.jsp?pid={PID_NO}&quot;&gt;View Property Information&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;" FeatureLabel="Parcel ID: {PID_NO}" IconUri="{RestVirtualDirectoryUrl}/Images/Parcel.png" Identifiable="true" IncludeInLayerList="true" IncludeInLegend="true" Name="Parcels" Searchable="true" ShowMapTips="true" UnconfiguredFieldsSearchable="true" UnconfiguredFieldsVisible="true" Visible="true">'

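You need to escape the parentheses, because ( and ) are special characters in a regex (they create groups). Then you can be more specific about the contents: the generic "any character" wildcard is ., and * means 0 or more of the preceding item:

pat = r'"Parcels \(.*\)"'  # raw string, so the backslashes reach re unchanged
match = re.search(pat, t)
print(match.group())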

Which prints:

"Parcels (7-1-2010)"

A more specific pattern would be:

pat = r'"Parcels \([0-9]+-[0-9]+-[0-9]+\)"'
match = re.search(pat, t)
print(match.group())

Which prints:

"Parcels (7-1-2010)"

Here, the character class [0-9] matches any single digit from 0 to 9 (\d is equivalent for ASCII text), the + following it means one or more of that item, and the - between the groups matches a literal hyphen.
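To use just the date in a later replace() call, as the question asks, a capturing group around the date makes it available through match.group(1). A minimal sketch; the new date string '1-30-2014' is only a placeholder:

```python
import re

# One capturing group around the date; the escaped parens stay outside it.
pat = r'"Parcels \((\d+-\d+-\d+)\)"'
t = '<Layer DisplayName="Parcels (7-1-2010)" Name="Parcels">'

match = re.search(pat, t)
if match is not None:  # re.search returns None when nothing matches
    old_date = match.group(1)  # just the date, without quotes or parens
    print(old_date)
    # '1-30-2014' is a placeholder for whatever new date you build.
    print(t.replace(old_date, '1-30-2014'))
```

Note that str.replace() swaps every occurrence of the old date in the line, not only the one inside "Parcels (...)", so re.sub with the full pattern is safer if the same date can appear elsewhere.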

edited Jan 30, 2014 at 16:34

answered Jan 30, 2014 at 16:17

Aaron Hall♦



4 Comments

crmackey Over a year ago

Thanks Aaron! The re module's help is a little confusing. I need to do some more reading there. That did the trick though.

crmackey Over a year ago

Thanks for the second option as well. This makes more sense to me than the first, and would probably be better at flagging the numeric characters. Thanks again!

Aaron Hall Over a year ago

I created a quick reference card based on the help, with a small demo at the end. I should probably publish it. It lists the special characters, special sequences, module functions, flags, and the help for each function (fairly complete descriptions), plus a small verbose-mode example of my own at the end.

crmackey Over a year ago

Please do! I would like to see it; it would probably be easier to understand than the online help. Please post the location if you choose to publish.
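Aaron's reference card isn't included in this thread. As a hypothetical illustration of a verbose example, here is the digit-based date pattern written with re.VERBOSE, where unescaped whitespace is ignored and # starts a comment (the name PARCELS_DATE is made up):

```python
import re

PARCELS_DATE = re.compile(r'''
    "Parcels\ \(      # quote, word, escaped space, escaped open paren
    (\d+)             # month
    -
    (\d+)             # day
    -
    (\d+)             # year
    \)"               # escaped close paren, then the closing quote
''', re.VERBOSE)

m = PARCELS_DATE.search('<Layer DisplayName="Parcels (7-1-2010)">')
print(m.groups())
```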

Quentin Donnellan · Accepted Answer · 2014-01-30 16:31:49Z

Aaron's answer is good; here is a small modification that matches what it looks like you wanted (the date format you specified):

import re

the_string = '<Layer DisplayName="Parcels (7-1-2010)" ... blablabla '
pattern = r'Parcels \(.*-.*-.*\)'
match = re.search(pattern, the_string)
print(match.group())

Also, if you suspect the string may contain more than one match, you can print all of them with re.findall(). I've also used \d+, which matches one or more digits, so only numeric dates are matched:

import re

the_string = '<Layer DisplayName="Parcels (7-1-2011)" ... blablabla ... Layer DisplayName="Parcels (7-1-2012)" '
pattern = r'Parcels \(\d+-\d+-\d+\)'
all_matches = re.findall(pattern, the_string)
for match in all_matches:
    print(match)
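To finish the job described in the question (putting today's date in place of the old one), re.sub can do the search and the replacement in one step. A minimal sketch for Python 3, assuming the dates are month-day-year without zero padding (as in 7-1-2010) and that the files are UTF-8; the helper names todays_stamp, update_parcels_date and update_xml_files, and the glob 'xml_docs/*.xml', are hypothetical:

```python
import glob
import re
import time

# The whole quoted attribute value, e.g. "Parcels (7-1-2010)", digits only.
PARCELS_DATE = re.compile(r'"Parcels \(\d+-\d+-\d+\)"')


def todays_stamp():
    """Today's date as month-day-year without zero padding, e.g. '1-30-2014'."""
    now = time.localtime()
    return '%d-%d-%d' % (now.tm_mon, now.tm_mday, now.tm_year)


def update_parcels_date(text, stamp):
    """Replace every "Parcels (m-d-yyyy)" value in text with "Parcels (stamp)".

    Dates not preceded by the literal "Parcels (" are left alone.
    """
    return PARCELS_DATE.sub('"Parcels (%s)"' % stamp, text)


def update_xml_files(glob_pattern, stamp):
    """Rewrite each file matching glob_pattern in place; return the number changed.

    Assumes UTF-8 files. newline='' preserves the original line endings.
    """
    changed = 0
    for path in glob.glob(glob_pattern):
        with open(path, encoding='utf-8', newline='') as f:
            original = f.read()
        updated = update_parcels_date(original, stamp)
        if updated != original:
            with open(path, 'w', encoding='utf-8', newline='') as f:
                f.write(updated)
            changed += 1
    return changed


line = '<Layer DisplayName="Parcels (7-1-2010)" Name="Parcels" Visible="true">'
print(update_parcels_date(line, todays_stamp()))
# Hypothetical folder holding the ~80 documents:
# update_xml_files('xml_docs/*.xml', todays_stamp())
```

Because the pattern requires the literal "Parcels (" prefix, only that date changes; other dates in the XML are untouched, and Name="Parcels" is not matched because it has no parenthesized date.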
