0

I can't seem to build a good regular expression (in JavaScript) that extracts each attribute from an XML node. For example,

<Node attribute="one" attribute2="two" n="nth"></node>

I need an expression that gives me an array of

['attribute="one"', 'attribute2="two"', 'n="nth"']

... Any help would be appreciated. Thank you

4
  • 4
    Time for the obligatory link. Commented Jul 25, 2011 at 2:46
  • Why wouldn't you just use an XML parser library? Commented Jul 25, 2011 at 2:54
  • 1
    @jfriend00 - probably because browsers have a built-in XML parser and suitable DOM methods already. Commented Jul 25, 2011 at 3:13
  • I'm not sure I want the overhead of an XML parser library, plus I'm rarely ever going to have well-formed XML. I'm actually parsing the diff generated by git. Commented Jul 26, 2011 at 1:30

4 Answers

4

In case you missed Kerrek's comment:

you can't parse XML with a regular expression.

And the link: RegEx match open tags except XHTML self-contained tags

You can get the attributes of a node by iterating over its attributes property:

function getAttributes(el) {
  var r = [];
  var a, atts = el.attributes;

  for (var i=0, iLen=atts.length; i<iLen; i++) {
    a = atts[i];
    r.push(a.name + ': ' + a.value);
  }
  alert(r.join('\n'));
}

Of course, you probably want to do something other than just put them in an alert.

Here is an article on MDN that includes links to relevant standards:

https://developer.mozilla.org/En/DOM/Node.attributes
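To get the array the question asks for rather than an alert, the same loop can build `name="value"` strings and return them. This is only a sketch: `attributesToArray` and `fakeNode` are hypothetical names, and the stand-in object exists only so the snippet runs outside a browser. With a real parser the input would have to be well-formed XML.

```javascript
// Collect attributes as 'name="value"' strings from any object that exposes
// a DOM-style `attributes` collection (array-like of {name, value}).
function attributesToArray(el) {
  var result = [];
  var atts = el.attributes;
  for (var i = 0; i < atts.length; i++) {
    result.push(atts[i].name + '="' + atts[i].value + '"');
  }
  return result;
}

// In a browser, the element could come from the built-in parser, e.g.:
//   var doc = new DOMParser().parseFromString(xml, 'application/xml');
//   attributesToArray(doc.documentElement);

// Stand-in object (an assumption, used here instead of a real DOM element).
var fakeNode = {
  attributes: [
    { name: 'attribute', value: 'one' },
    { name: 'attribute2', value: 'two' },
    { name: 'n', value: 'nth' }
  ]
};

var attrs = attributesToArray(fakeNode);
console.log(attrs);
```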


1 Comment

I'd definitely use this instead of a regex if possible +1
3

Try this:

  <script type="text/javascript">
    // Capture the whole run of attributes, then split it on whitespace.
    var myregexp = /<node((\s+\w+="[^"]+")+)><\/node>/im;
    var match = myregexp.exec("<Node attribute=\"one\" attribute2=\"two\" n=\"nth\"></node>");
    if (match != null) {
      // Declare result locally instead of leaking an implicit global.
      var result = match[1].trim();
      var arrayAttrs = result.split(/\s+/);
      alert(arrayAttrs);
    }
  </script>

1 Comment

I got about this far as well. Unfortunately, a space in the attribute value breaks this. Perhaps I need to first replace spaces in between "" with an underscore, then after I split the array, change them back to spaces?
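One possible way around the space problem, without the underscore round trip, is to match each attribute directly with a global regex instead of splitting on whitespace. The sketch below assumes double-quoted values that contain no double quote themselves, and `extractAttributes` is a hypothetical name. It is still not a parser and will miss single-quoted or unquoted values.

```javascript
function extractAttributes(tag) {
  // [\w:.-]+ : attribute name (word characters, colon, period, hyphen)
  // "[^"]*"  : double-quoted value, which may contain spaces or be empty
  return tag.match(/[\w:.-]+="[^"]*"/g) || [];
}

var sample = '<Node attribute="one two" attribute2="two" n="nth"></node>';
var found = extractAttributes(sample);
console.log(found);
```

Because the value part is `[^"]*`, whitespace inside the quotes stays within a single match.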
0

I think you could get it using the following. You would want the second and third matching groups.

<[\w\d\-_]+\s+(([\w\d\-_]+)="(.*?)")*>

1 Comment

That won't work in a number of cases, such as if there's a namespace, e.g. <ns1:tagname .... >, or an attribute name contains a colon (:) or a period (.) character (not included in the appropriate part of the regular expression) or the value contains a double quote character.
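A further limitation is that, in JavaScript, a capture group inside a repeated group keeps only its last repetition, so one match of a pattern shaped like the one above cannot report every attribute. The sketch below illustrates this and then collects name and value pairs one attribute at a time. The character class that allows colons and periods in names is an assumption and not a full XML name rule, and values containing a double quote are still not handled.

```javascript
var tag = '<ns1:Node attribute="one" xml:lang="en" n="nth">';

// A quantified group only remembers its final repetition.
var repeated = /<[\w:.-]+(?:\s+([\w:.-]+)="([^"]*)")*>/.exec(tag);
var lastName = repeated[1];
var lastValue = repeated[2];
console.log(lastName, lastValue);

// Matching one attribute per iteration with the g flag visits every pair.
var pairRe = /([\w:.-]+)="([^"]*)"/g;
var pairs = [];
var m;
while ((m = pairRe.exec(tag)) !== null) {
  pairs.push({ name: m[1], value: m[2] });
}
console.log(pairs);
```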
0

The regex is /\w+=".+"/g (note the g of global).

You can try it right now in your Firebug or Chrome console:

var matches = '<Node attribute="one" attribute2="two" n="nth"></node>'.match(/\w+="\w+"/g)
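The pattern quoted in the text (`.+`) and the one in the console line (`\w+`) do not behave the same way, which may be what the comments below are pointing at. A small comparison on the question's example, with hypothetical variable names:

```javascript
var input = '<Node attribute="one" attribute2="two" n="nth"></node>';

// Greedy .+ runs to the last double quote, so everything lands in one match.
var greedy = input.match(/\w+=".+"/g);

// \w+ stops at the closing quote, but only while values are word characters.
var wordOnly = input.match(/\w+="\w+"/g);

console.log(greedy);
console.log(wordOnly);
```

On this input the greedy version returns a single string spanning all three attributes, while the `\w+` version returns three separate matches.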

6 Comments

And if an attribute value has a space it fails. See the link in the first comment.
No, it doesn't. The question was "a good regex expression ... that extracts each attribute from an xml node", not one for the very limited example.
@Pablo, maybe you should try it before saying that you fixed anything. ;-)
@Qtax, oops, left the escape there. Thanks for the correction man :)
@Pablo, you should still try it, even on your limited example. ;-)