
I am having trouble understanding the regular expressions module in Python. I think what I am trying to do is fairly simple, but I cannot figure it out.

I need to search through some xml files and find this pattern:

'DisplayName="Parcels (10-1-2012)"'

I can parse the XML and make replacements with no problem. The part I cannot figure out is how to do a wildcard search to find any instance of "Parcels (some-date-year)". Since the date will vary, I need to find this pattern:

pat = '"Parcels (*-*-*)"' 

and I want to replace it with today's date which I can do with the time module. I copied out a line of one of the 80 or so xml docs where I would need to find the pattern.

According to the help for the re.search() function, it seems I can just put in a pattern, then the string I wish to search through. However, I am getting errors.

Help on function search in module re:

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.

Here is my little test snippet:

import re
pat = '"Parcels (*-*-*)"'
t= '         <Layer DisplayName="Parcels (7-1-2010)" FeatureDescription="Owner Name: {OWNER_NAME}&lt;br/&gt;Property Address: {PROP_ADDR}&lt;br/&gt;Tax Name: {TAX_NAME}&lt;br/&gt;Tax Address 1: {TAX_ADD_L1}&lt;br/&gt;Tax Address 2: {TAX_ADD_L2}&lt;br/&gt;Land Use: {USE1_DESC}&lt;br/&gt;&lt;a href=&quot;http://www16.co.hennepin.mn.us/pins/pidresult.jsp?pid={PID_NO}&quot;&gt;View Property Information&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;" FeatureLabel="Parcel ID: {PID_NO}" IconUri="{RestVirtualDirectoryUrl}/Images/Parcel.png" Identifiable="true" IncludeInLayerList="true" IncludeInLegend="true" Name="Parcels" Searchable="true" ShowMapTips="true" UnconfiguredFieldsSearchable="true" UnconfiguredFieldsVisible="true" Visible="true">'
match = re.search(pat, t)
print match

Most of the line is junk I don't need to worry about. I just need to see how I can find that date in the line so I can use just that piece in the replace() function. Does anyone know how I could find these dates? There may be other dates in the xml somewhere, but I don't need to replace these; just where it says "Parcels (some-date-year)". I appreciate any help! Thanks!

  • The pattern needs to be a regular expression. docs.python.org/2/howto/regex.html may help you? Commented Jan 30, 2014 at 16:11
  • Lots of XML modules, don't reinvent the wheel poorly. Commented Jan 30, 2014 at 16:12
  • Did you get an error from this? Is this relevant? stackoverflow.com/questions/3675144/… Commented Jan 30, 2014 at 16:15
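To illustrate the XML-module suggestion in the comments above, here is a minimal sketch using the standard library's xml.etree.ElementTree. The `<Map>` root element, the trimmed attributes, and the hard-coded replacement date are assumptions made for the example, not taken from the real files.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down document; the real files carry many more attributes.
xml_text = (
    '<Map>'
    '<Layer DisplayName="Parcels (7-1-2010)" Name="Parcels" />'
    '<Layer DisplayName="Roads" Name="Roads" />'
    '</Map>'
)

root = ET.fromstring(xml_text)

# Only touch attribute values that are exactly "Parcels (n-n-n)".
date_pattern = re.compile(r'^Parcels \(\d+-\d+-\d+\)$')

for layer in root.iter('Layer'):
    name = layer.get('DisplayName', '')
    if date_pattern.match(name):
        # Placeholder date; a real script would compute today's date here.
        layer.set('DisplayName', 'Parcels (1-30-2014)')

result = ET.tostring(root, encoding='unicode')
print(result)
```

One trade-off to be aware of: re-serializing with ElementTree may change whitespace and quoting details in the file, so a plain-text regex substitution can be the less invasive choice if the files must otherwise stay byte-for-byte the same.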

2 Answers

import re

t= '         <Layer DisplayName="Parcels (7-1-2010)" FeatureDescription="Owner Name: {OWNER_NAME}&lt;br/&gt;Property Address: {PROP_ADDR}&lt;br/&gt;Tax Name: {TAX_NAME}&lt;br/&gt;Tax Address 1: {TAX_ADD_L1}&lt;br/&gt;Tax Address 2: {TAX_ADD_L2}&lt;br/&gt;Land Use: {USE1_DESC}&lt;br/&gt;&lt;a href=&quot;http://www16.co.hennepin.mn.us/pins/pidresult.jsp?pid={PID_NO}&quot;&gt;View Property Information&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;" FeatureLabel="Parcel ID: {PID_NO}" IconUri="{RestVirtualDirectoryUrl}/Images/Parcel.png" Identifiable="true" IncludeInLayerList="true" IncludeInLegend="true" Name="Parcels" Searchable="true" ShowMapTips="true" UnconfiguredFieldsSearchable="true" UnconfiguredFieldsVisible="true" Visible="true">'

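You need to escape the parentheses, because unescaped they create a group instead of matching literally. After that you can be as specific as you like about the contents. The generic character is . (any character except a newline), and * means zero or more of the preceding item, so it cannot stand alone as a wildcard: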

# Raw string, so the backslashes reach the regex engine unchanged
pat = r'"Parcels \(.*\)"'
match = re.search(pat, t)
print(match.group())

Which prints:

"Parcels (7-1-2010)"

A more specific pattern would be:

pat = r'"Parcels \([0-9]+-[0-9]+-[0-9]+\)"'
match = re.search(pat, t)
print(match.group())

Which prints:

"Parcels (7-1-2010)"

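Here, the character class [0-9] matches any single digit from 0 to 9 (\d is the usual shorthand, though in Python 3 it also matches non-ASCII digits unless re.ASCII is set). The plus sign, +, after it means one or more of the preceding item, and the dash outside the brackets matches a literal hyphen.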
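Since the end goal is to swap in today's date, here is a minimal sketch of doing the replacement directly with re.sub instead of finding the match first and calling replace(). The function name update_parcels_date and the shortened sample line are made up for illustration, and it uses datetime.date rather than the time module mentioned in the question.

```python
import re
from datetime import date

def update_parcels_date(text, today=None):
    """Replace the date in every "Parcels (n-n-n)" with today's date."""
    today = today or date.today()
    # Build m-d-yyyy without zero padding, matching the style in the XML.
    stamp = "{0}-{1}-{2}".format(today.month, today.day, today.year)
    # Group 1 keeps the '"Parcels (' prefix, group 2 keeps the ')"' suffix.
    pattern = r'("Parcels \()\d+-\d+-\d+(\)")'
    # A function replacement avoids ambiguity between group references
    # and the digits of the date that follow them.
    return re.sub(pattern, lambda m: m.group(1) + stamp + m.group(2), text)

# Shortened, hypothetical version of the line from the question.
line = '<Layer DisplayName="Parcels (7-1-2010)" Name="Parcels">'
print(update_parcels_date(line, date(2014, 1, 30)))
```

Because the pattern includes the surrounding quotes and the word Parcels, other dates elsewhere in the file are left alone.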


4 Comments

Thanks Aaron! The re module's help is a little confusing. I need to do some more reading there. That did the trick though.
Thanks for the second option as well. This makes more sense to me than the first, and would probably be better at flagging the numeric characters. Thanks again!
I created a quick ref card based on the help with a small demo at the end. I should probably publish it. It basically lists the special characters, special sequences, module functions, flags, help on functions, (fairly complete descriptions) and a small Verbose example at the end of my own.
Please do! I would like to see it; it would probably be easier to understand than the online help. Please post the location if you choose to publish.

Aaron's answer is good. Here is a small modification that matches the date format you described, with hyphens between the three parts:

import re

the_string = '<Layer DisplayName="Parcels (7-1-2010)" ... blablabla '
pattern = r'Parcels \(.*-.*-.*\)'
match = re.search(pattern, the_string)
print(match.group())
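One caveat with `.*` in a pattern like the one above: it is greedy, so if a single line happens to contain more than one "Parcels (...)" entry, the match can run from the first opening paren to the last closing one. The sketch below uses a made-up line with two Layer elements to show the difference; a bounded class such as `[^)]*` is one way around it.

```python
import re

# Hypothetical line with two Layer elements, to show how greedy `.*` behaves.
line = ('<Layer DisplayName="Parcels (7-1-2010)" />'
        '<Layer DisplayName="Parcels (10-1-2012)" />')

# Greedy: each .* grabs as much as it can, then backtracks only as needed.
greedy = re.search(r'Parcels \(.*-.*-.*\)', line).group()

# Bounded: [^)]* cannot cross a closing paren, so the match stops early.
bounded = re.search(r'Parcels \([^)]*\)', line).group()

print(greedy)   # spans both Layer elements
print(bounded)  # only the first "Parcels (...)"
```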

Also, if you suspect the string may contain more than one match, you can get all of them with the re.findall function. I've also switched to \d+, which matches only runs of one or more digits:

import re

the_string = '<Layer DisplayName="Parcels (7-1-2011)" ... blablabla ... Layer DisplayName="Parcels (7-1-2012)" '
pattern = r'Parcels \(\d+-\d+-\d+\)'
all_matches = re.findall(pattern, the_string)
for match in all_matches:
    print(match)
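If you need the individual numbers and not just the whole matched text, one option is to add capturing groups: re.findall then returns a list of tuples, one tuple per match. This is a small sketch along the lines of the snippet above; the month-day-year reading of the three numbers is an assumption based on the sample data.

```python
import re

the_string = ('<Layer DisplayName="Parcels (7-1-2011)" ... '
              'Layer DisplayName="Parcels (7-1-2012)" ')

# Each (\d+) is a capturing group, so findall yields one tuple per match.
pattern = r'Parcels \((\d+)-(\d+)-(\d+)\)'
dates = re.findall(pattern, the_string)

# Assumed order: month, day, year (the groups are returned as strings).
for month, day, year in dates:
    print(month, day, year)
```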
