0

I have data coming in to a python server via a socket. Within this data is the string '<port>80</port>' or which ever port is being used.

I wish to extract the port number into a variable. The data coming in is not XML, I just used the tag approach to identifying data for future XML use if needed. I do not wish to use an XML python library, but simply use something like regexp and strings.

What would you recommend is the best way to match and strip this data?

I am currently using this code with no luck:

p = re.compile('<port>\w</port>')
m = p.search(data)
print m

Thank you :)

2 Answers 2

1

Regex can't parse XML and shouldn't be used to parse fake XML. You should do one of

  • Use a serialization method that is nicer to work with to start with, such as JSON or an ini file with the ConfigParser module.
  • Really use XML and not something that just sort of looks like XML and really parse it with something like lxml.etree.
  • Just store the number in a file if this is the entirety of your configuration. This solution isn't really easier than just using JSON or something, but it's better than the current one.

Implementing a bad solution now for future needs that you have no way of defining or accurately predicting is always a bad approach. You will be kept busy enough trying to write and maintain software now that there is no good reason to try to satisfy unknown future needs. I have never seen a case where "I'll put this in for later" has led to less headache later on, especially when I put it in by doing something completely wrong. YAGNI!

As to what's wrong with your snippet other than using an entirely wrong approach, angled brackets have a meaning in regex.

Sign up to request clarification or add additional context in comments.

Comments

0

Though Mike Graham is correct, using regex for xml is not 'recommended', the following will work:

(I have defined searchType as 'd' for numerals)
searchStr = 'port'

if searchType == 'd':
    retPattern = '(<%s>)(\d+)(</%s>)'
else:
    retPattern = '(<%s>)(.+?)(</%s>)'

searchPattern = re.compile(retPattern % (searchStr, searchStr))
found = searchPattern.search(searchStr)
retVal = found.group(2)

(note the complete lack of error checking, that is left as an exercise for the user)

2 Comments

Whether this works depends on a gazillion things about the file.
not really, he owns all parts of this code, he can make it do whatever it wants. I understand the horror of not using an xml parser to handle something that 'looks like' xml. I also understand the OP's point of not wanting the expense of an xml engine for this one little bit. regex works just fine for his problem.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.