1

I have a regex that removes xmlns references from XML. It works fine when the element has matching opening and closing tags, but if the xmlns reference is in a self-closing tag, it removes the "/" as well.

Here is the regex:

"<(.*?) xmlns[:=].*?>", "<$1>"

When I use the regex on this line of xml:

<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>

I get what I want:

<ns22:someTagName></ns22:someTagName>

When I use the regex on this line of xml:

<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>

I get this invalid XML:

<ns22:someTagName>

It removes the reference fine, but it takes the closing "/" with it.
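
The behaviour can be reproduced outside LotusScript. Below is a minimal sketch in Python (an assumption on my part, since the original code is LotusScript; Python writes the $1 backreference as \1, and the variable names are only illustrative):

```python
import re

# The pattern and replacement from the question; Python spells $1 as \1.
PATTERN = r'<(.*?) xmlns[:=].*?>'
REPLACEMENT = r'<\1>'

paired = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>'
self_closed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'

paired_result = re.sub(PATTERN, REPLACEMENT, paired)
self_closed_result = re.sub(PATTERN, REPLACEMENT, self_closed)

print(paired_result)       # <ns22:someTagName></ns22:someTagName>
print(self_closed_result)  # <ns22:someTagName>
```

The lazy .*? after xmlns[:=] runs up to the closing ">", so in a self-closing tag it also consumes the "/", and the replacement <$1> never puts it back.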

Thanks for the help, Scott

1
  • 4
    Don't use regex for XML. What programming language are you using? Undoubtedly there is a superior XML API that would allow you remove namespaces easily. Commented Feb 24, 2011 at 16:28
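
    As a rough illustration of the parser-based route this comment suggests, here is a sketch using Python's standard xml.etree.ElementTree module. Python is an assumption here (the asker is on LotusScript), and strip_namespaces is a hypothetical helper name. Note that a parser also drops the ns22: prefix, because a prefix without its declaration is not namespace-well-formed; namespaced attributes are not handled in this sketch.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(xml_text):
    """Hypothetical helper: parse the XML and drop the namespace from every element name."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores qualified element names as "{uri}localname".
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    # With no namespaced names left, the serializer emits no xmlns declarations.
    return ET.tostring(root, encoding="unicode")

result = strip_namespaces('<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>')
print(result)
```

    The output keeps the element self-closed and contains neither the xmlns declaration nor the prefix.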

3 Answers

7

Rather than trying to preserve what you need from the XML, it would be better to target what you want to remove.

This expression targets just the namespace itself:

\sxmlns[^"]+"[^"]+"

Unfortunately I don't know LotusScript, so I can't give you a code sample of how to use this, but what you need to do is something like this pseudocode:

result = regex.replace(yourString, '\sxmlns[^"]+"[^"]+"', '')

This replaces every match with an empty string, effectively removing it. It works for both a closed and a self-closed XML tag, and it does nothing if the tag doesn't have a namespace at all.

Edit: Here is a fully-functional Python example:

>>> from re import sub
>>> pattern = r'\sxmlns[^"]+"[^"]+"'
>>> closed = r'<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>'
>>> sub(pattern, '', closed)
'<ns22:someTagName></ns22:someTagName>'
>>> selfclosed = r'<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'
>>> sub(pattern, '', selfclosed)
'<ns22:someTagName/>'

4 Comments

Hmmm. I tried it and it didn't seem to do anything. In LS you need to escape " with another ". Here is what I tried: ExecuteReplace(sXML, "xmlns[^""]+""[^""]+""", "")
Hi Andrew, I got it to work, but it leaves whitespace in the tag where the reference is removed. Is there a way to clear the whitespace out? ExecuteReplace(sXML, "xmlns[^""]+""[^""]+""", "")
@Scott - I changed the expression to this: \sxmlns[^"]+"[^"]+" to handle the whitespace issue.
I added a space in front of xmlns and it fixed it: ExecuteReplace(sXML, " xmlns[^""]+""[^""]+""", ""). Thanks for the help.
1

Don't use regex on XML if you have access to an XML parser! That being said, I don't know anything about LotusScript's XML parsing capabilities (if it even has them), so if you must use regex, this will get you closer:

<([^>]*?)\bxmlns\b[^"']+('|").*?\2(.*?/?>)

to be replaced with:

<$1$3

The most important change here from your original regex is the /? toward the end, which keeps the slash of a self-closing tag. The \2 is a backreference to whichever quote character opened the attribute value. BTW, I haven't escaped the quotes or backslashes since I don't know LotusScript syntax for that, and I assume you do.

There will always be XML-valid input that cannot be properly understood by this, due to the limitations of regex. However, it should work for most cases. You could double-check manually by searching for the string "xmlns" afterward.
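
For anyone who wants to try this expression, here is a minimal sketch in Python (an assumption, since the question is about LotusScript). It uses \2 as the backreference to the captured quote character and \1\3 in the replacement, and finishes with the "search for xmlns afterward" check suggested above. The variable names are only illustrative.

```python
import re

# Group 1: everything in the tag before xmlns; group 2: the quote character;
# group 3: the rest of the tag, including an optional "/" before ">".
PATTERN = r'''<([^>]*?)\bxmlns\b[^"']+('|").*?\2(.*?/?>)'''
REPLACEMENT = r'<\1\3'

self_closed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'
paired = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>'

self_closed_result = re.sub(PATTERN, REPLACEMENT, self_closed)
paired_result = re.sub(PATTERN, REPLACEMENT, paired)

print(self_closed_result)  # <ns22:someTagName />
print(paired_result)       # <ns22:someTagName ></ns22:someTagName>

# Manual double-check: flag any declaration the regex failed to remove.
leftovers = [s for s in (self_closed_result, paired_result) if "xmlns" in s]
print(leftovers)           # []
```

The space that preceded xmlns stays inside group 1, so a blank is left before the end of the tag. That is still well-formed XML.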

0

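The regex \s*xmlns(:\w+)?="[^"]*" matches both default (unprefixed) and prefixed xmlns declarations, including any whitespace in front of them.

In Java, xmlString.replaceFirst("\\s*xmlns(:\\w+)?=\"[^\"]*\"", "") removes the first declaration; use replaceAll with the same arguments to remove every one.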

https://regexr.com/ is a great tool to use for writing/testing these.
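
As a quick check of the expression, here is a minimal Python sketch (Python is my choice to match the earlier example in this thread; the sample element and its example.com URIs are made up for illustration). re.sub replaces every match, so both the default and the prefixed declaration are removed while the ordinary attribute is kept.

```python
import re

# Optional ":prefix" after xmlns, then a double-quoted value; leading whitespace is removed too.
PATTERN = r'\s*xmlns(:\w+)?="[^"]*"'

# Hypothetical sample with a default declaration, a prefixed one, and a normal attribute.
sample = '<root xmlns="http://a.example.com" xmlns:b="http://b.example.com" id="1"/>'

cleaned = re.sub(PATTERN, '', sample)
print(cleaned)  # <root id="1"/>
```

Like the other expressions here, it only handles double-quoted attribute values.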
