
I'm not the best at regular expressions and need some help.

I have strings like this: data-some-thing="5 10 red". The prefix 'data-some' is constant, while 'thing' varies and may also contain dashes. The value in double quotes contains only alphanumeric characters and spaces.

Is it possible to extract 'thing' and the value in double quotes using only a regex? If so, what expression should I use? I tried using lookarounds but didn't have much success.

  • Obviously (?) this is a data attribute on an HTML element. Why are you trying to do anything with regexp on HTML? Instead, search through the attributes on the HTML element(s) (or on elt.dataset) for those of the right form, then you can retrieve the value of the attribute directly. Commented May 6, 2016 at 13:04
  • I would do that, but I'm parsing a string using node, not a document in a browser. Sorry for not being completely clear on that :) Commented May 6, 2016 at 13:10
  • This does not change the fact that you should not parse HTML with JS. If necessary, use a DOM package for node. Commented May 6, 2016 at 13:11
  • If I understand correctly I should always use a DOM parser for parsing HTML and regex is complete evil in this context? Commented May 6, 2016 at 13:24
  • Indeed that states it well. Commented May 6, 2016 at 13:40
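For reference, the `elt.dataset` route from the first comment does not expose attributes under their original names. Per the HTML spec, the key is derived by dropping the `data-` prefix and, for each dash followed by a lowercase ASCII letter, removing the dash and upper-casing the letter. Here is a minimal sketch of that rule as a plain function, written so it runs in Node without a DOM. The name `toDatasetKey` is hypothetical and not part of any library.

```javascript
// Hypothetical helper: mirrors the HTML spec's rule for turning a data-* attribute
// name into the key used by elt.dataset. A browser does this internally.
// Assumes attrName starts with "data-".
function toDatasetKey(attrName) {
  // Drop the leading "data-" ...
  const rest = attrName.slice("data-".length);
  // ... then turn every "-x" (x = lowercase ASCII letter) into "X".
  return rest.replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

console.log(toDatasetKey("data-some-thing"));      // someThing
console.log(toDatasetKey("data-some-thing-else")); // someThingElse
```

So in a browser, `elt.dataset.someThing` would hold `"5 10 red"`. Because the dashes inside 'thing' get folded into the camel case, iterating `elt.attributes` and checking `name.startsWith("data-some-")` may be the more direct way to recover 'thing' in its original dashed form.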

1 Answer


You could use:

var result = data.match(/data-some-(.*?)="(.*?)"/);

The result array will have three elements:

  • 0: the complete match (not needed here)
  • 1: the variable part before the equal sign
  • 2: the value between quotes.

Demo:

var data = 'data-some-thing="5 10 red"';
var result = data.match(/data-some-(.*?)="(.*?)"/);

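// match() returns null when there is no match, so check before indexing
if (result) {
  console.log(result[1]); // thing
  console.log(result[2]); // 5 10 red
}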
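The question states that 'thing' contains only letters, digits and dashes, and that the value contains only letters, digits and spaces. Those constraints can also be written directly into the pattern. The sketch below assumes "alphanumeric" means ASCII only, and `extractDataSome` is a hypothetical name. It also collects every occurrence in the string by combining the `g` flag with `String.prototype.matchAll`:

```javascript
// Hypothetical helper. The character classes encode the constraints from the question
// (ASCII assumed): 'thing' = letters, digits, dashes; value = letters, digits, spaces.
function extractDataSome(str) {
  const pattern = /data-some-([A-Za-z0-9-]+)="([A-Za-z0-9 ]*)"/g;
  // matchAll yields one match array per occurrence: [1] is 'thing', [2] is the value
  return Array.from(str.matchAll(pattern), (m) => [m[1], m[2]]);
}

// One pair: "thing" and "5 10 red"
console.log(extractDataSome('data-some-thing="5 10 red"'));
// Two pairs: "foo-bar"/"a b" and "x"/"1"
console.log(extractDataSome('data-some-foo-bar="a b" data-some-x="1"'));
```

With these classes the lazy `?` is no longer needed, because `[A-Za-z0-9-]` cannot consume `=` and `[A-Za-z0-9 ]` cannot consume `"`. Note that `matchAll` throws a `TypeError` if the regex lacks the `g` flag, and it requires Node 12 or later.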

Disclaimer:

Please note that if you are doing this in the context of larger HTML parsing (it is not mentioned in the question), you should not use regular expressions. Instead you should load the HTML string into a DOM, and use DOM methods to find the attribute name and value pairs you are interested in.

For node.js you can use the npm modules jsdom and htmlparser to do this.


4 Comments

I understand the dot and star but could you explain to me how the question mark works here? Thank you for the answer. Gonna mark it as soon as I can :)
The question mark affects the preceding star: it turns a greedy star into a lazy one. A greedy star matches as much as it can and only gives characters back when the rest of the pattern would otherwise fail. A lazy star matches as little as it can and stops as soon as the rest of the pattern can match. Without the question mark, the following data would be split the wrong way: data-some-thing="5 10 red"; some other stuff="hallo".
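Running both versions of the pattern on the input from the comment above shows the difference. This is a minimal sketch, and the variable names are illustrative:

```javascript
// Input from the comment: two attribute-like pairs on one line
const input = 'data-some-thing="5 10 red"; some other stuff="hallo"';

// Greedy .* runs to the end, then backtracks only to the LAST =" and the LAST "
const greedy = input.match(/data-some-(.*)="(.*)"/);
// Lazy .*? stops at the FIRST =" and the FIRST " that let the pattern continue
const lazy = input.match(/data-some-(.*?)="(.*?)"/);

console.log(greedy[1]); // thing="5 10 red"; some other stuff
console.log(greedy[2]); // hallo
console.log(lazy[1]);   // thing
console.log(lazy[2]);   // 5 10 red
```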
As a trivial example of the futility of trying to parse HTML with regexp, this will fail if the attribute value is single-quoted. It will fail if there are spaces on either side of the equal sign. It will fail with input of the form xxx-data-some-thing. Etc. etc. Do not parse HTML with regexp.
@torazaburo, you are absolutely right: if the OP wants to use this in the context of larger HTML parsing (it is not mentioned in the question), it would be a bad idea, and loading it into a DOM would be the way to go. But the OP spoke of some strings...
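The failure cases listed in the exchange above can be reproduced directly against the answer's pattern. This is a minimal sketch, and the sample inputs are made up for illustration:

```javascript
// The pattern from the answer
const pattern = /data-some-(.*?)="(.*?)"/;

// Single-quoted value: there is no " at all, so no match
const singleQuoted = "data-some-thing='5 10 red'".match(pattern);
// Spaces around the equal sign: the literal =" never occurs, so no match
const spaced = 'data-some-thing = "5 10 red"'.match(pattern);
// A different attribute that merely contains data-some-: matches anyway (false positive)
const prefixed = 'xxx-data-some-thing="5 10 red"'.match(pattern);

console.log(singleQuoted); // null
console.log(spaced);       // null
console.log(prefixed[1]);  // thing
```

The first two return `null`. The third matches even though `xxx-data-some-thing` is not a `data-some-*` attribute, which is the kind of edge case a DOM parser avoids because it reports each attribute's full name.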
