
I am trying to code some stuff in HTML, CSS and JavaScript, and I have some problems with regex.

Let me take a simple example to explain my problem because I can't find the solution.

<script>
var str = "I am <b>a tennis player</b> but  I like also playing <i>football</i> and <i>rugby</i>, I am  <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";

var result = str.match(/<b>(.*?)<\/b>/g).map(function(val){
   return val.replace(/<\/?b>/g,'');
});

alert(result)

</script>

As you may have guessed, I am looking for a way to select all the text between the tags <b></b>, <i></i> and <u></u>. To be clearer, I want to be able to select "a tennis player", "football", "rugby", "34", "cooking", etc.

For the moment, I have only managed to handle one tag; when I try with several, I fail. I have no experience with regex (I didn't study or work in this field), and the courses I found on the internet didn't answer my question. I don't think it is difficult to combine three regexes, but I get lost with classes, AND, OR, etc. :/
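For reference, "OR" in a regex is written with the | (alternation) operator, while a character class such as [biu] means "exactly one character that is b, i or u". Below is a minimal sketch that glues the three single-tag patterns together with |, assuming an engine with Array.prototype.find (ES2015); the name extractAll is only illustrative. Each alternative gets its own capture group, and only the group of the branch that matched is defined.

```javascript
// The three single-tag patterns joined with | ("OR"). At each position the
// engine tries <b>...</b>, then <i>...</i>, then <u>...</u>.
var combined = /<b>(.*?)<\/b>|<i>(.*?)<\/i>|<u>(.*?)<\/u>/g;

function extractAll(str) {
  var result = [];
  var match;
  combined.lastIndex = 0; // a /g regex remembers where it stopped; start fresh
  while ((match = combined.exec(str)) !== null) {
    // Groups 1, 2 and 3 belong to the b, i and u alternatives respectively;
    // the groups of the alternatives that did not match are undefined.
    var text = [match[1], match[2], match[3]].find(function (g) {
      return g !== undefined;
    });
    result.push(text);
  }
  return result;
}

console.log(extractAll("I am <b>a tennis player</b>, I like <i>rugby</i> and <u>cooking</u>."));
```

The single-pattern versions with a backreference are shorter, but this form shows directly how "OR" combines three separate regexes.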

  • When you parse HTML, use an HTML parser. Commented Jan 11, 2016 at 14:46
  • @Tushar Not yet :s I need to learn what jQuery is exactly (I have heard something about it, but I don't have an accurate picture of it) Commented Jan 11, 2016 at 14:51
  • @stribizhev Oh, I didn't know about this; I will google it :D Commented Jan 11, 2016 at 14:51

3 Answers


Getting all the text from the u, b and i tags can easily be done with plain JavaScript, letting the browser's own HTML parser do the work:

function getTagTexts(str, tag) {
  var el = document.createElement('html'); // create an empty, detached element
  el.innerHTML = '<faketag>' + str + '</faketag>'; // let the browser parse the fragment
  var arr = []; // declare the array for the results
  // iterate through the elements with the requested tag name
  [].forEach.call(el.getElementsByTagName(tag), function (v) {
    arr.push(v.textContent); // add the element's text (without markup) to the array
  });
  return arr;
}

var txt = "I am <b>a tennis player</b> but  I like also playing <i>football</i> and <i>rugby</i>, I am  <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";

var arrayI = getTagTexts(txt, "i");
var arrayU = getTagTexts(txt, "u");
var arrayB = getTagTexts(txt, "b");
document.body.innerHTML += JSON.stringify(arrayI, null, 4) + "<br/>"; // => ["football", "rugby", "tennis", "football", "rugby"]
document.body.innerHTML += JSON.stringify(arrayU, null, 4) + "<br/>"; // => ["cooking"]
document.body.innerHTML += JSON.stringify(arrayB, null, 4); // => ["a tennis player", "34"]

Note that the faketag is necessary if you need to parse an HTML fragment without html/body tags.


2 Comments

Basically, this is same as my answer using jQuery. +1 for plain JS solution.
The funniest part is that I didn't even see your answer when I wrote mine and that comment.

You can use the following regex to extract the inner text of the elements:

/<([biu])>(.*?)<\/\1>/gi

Explanation:

  1. <([biu])>: Matches < followed by one of b, i or u, and then >. It could also be written as <(b|i|u)>; either way, the tag name is stored in the first capturing group.
  2. (.*?): Non-greedy (lazy) match: matches as few characters as possible while still letting the rest of the pattern match, so each match stops at the first matching closing tag. Note that . does not match line breaks.
  3. <\/\1>: Matches </ followed by the text of the first capturing group (see #1 above) and then >, i.e. the matching closing tag.
  4. gi: g: Global flag to match all possible results. i: Case-insensitive match.

var str = "I am <b>a tennis player</b> but  I like also playing <i>football</i> and <i>rugby</i>, I am  <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";

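var regex = /<([biu])>(.*?)<\/\1>/gi,
    result = [],
    match; // declared explicitly, otherwise it becomes an implicit global

// With the g flag, each exec() call returns the next match, and null when none are left
while ((match = regex.exec(str)) !== null) {
    result.push(match[2]); // group 2 holds the text between the tags
}

console.log(result);
document.body.innerHTML = '<pre>' + JSON.stringify(result, null, 4) + '</pre>';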
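On engines with ES2020 support (current browsers, Node.js 12+), String.prototype.matchAll can replace the manual exec() loop. A minimal sketch, assuming such an engine; the function name extractTagged and the { tag, text } result shape are only illustrative. It also keeps track of which tag each piece of text came from:

```javascript
// Collect the inner text of <b>, <i> and <u> tags together with the tag name.
// matchAll requires the g flag and yields one match object per tag.
function extractTagged(str) {
  var regex = /<([biu])>(.*?)<\/\1>/gi;
  // Array.from's second argument maps each match object as it is collected.
  return Array.from(str.matchAll(regex), function (m) {
    // m[1] is the tag name, m[2] the text between the tags; the name is
    // lower-cased because the i flag also accepts <B>...</B>.
    return { tag: m[1].toLowerCase(), text: m[2] };
  });
}

var sample = "I am <b>a tennis player</b> but I like <i>rugby</i> and <u>cooking</u>.";
console.log(extractTagged(sample));
```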


You can also use jQuery.

var str = "I am <b>a tennis player</b> but  I like also playing <i>football</i> and <i>rugby</i>, I am  <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";

var result = [];

$('<div/>').html(str).find('b, i, u').each(function (i, e) {
    result.push(e.textContent); // the element's text, without any markup
});
console.log(result);
$('body').html('<pre>' + JSON.stringify(result, null, 4) + '</pre>');
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min.js"></script>

2 Comments

You always have great demos :)
@stribizhev Thanks! It was your comment on the other answer that made me undelete this answer :)

See the code below:

var str = "I am <b>a tennis player</b> but  I like also playing <i>football</i> and <i>rugby</i>, I am  <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";

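// match() returns null when nothing matches, so fall back to an empty array
var result = (str.match(/<(b|i|u)>(.*?)<\/\1>/g) || []).map(function (val) {
   return val.replace(/<\/?[biu]>/g, ''); // strip the opening and closing tags
});

alert(result);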
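A character class such as [biu] matches exactly one character, which is why an alternation group is needed once tag names are longer than a single letter. The capture group also already holds the inner text, so the replace() step can be dropped. A small sketch of both points; the names innerTexts, classRegex and altRegex and the em/strong tags are only for illustration:

```javascript
// [biu] is a character class: it matches exactly ONE character,
// so it can never match a tag name like "em" or "strong".
var classRegex = /<([biu])>(.*?)<\/\1>/g;
// An alternation group can list whole tag names; \1 still requires the
// closing tag to repeat whichever name was captured.
var altRegex = /<(b|i|u|em|strong)>(.*?)<\/\1>/g;

// Collect capture group 2 (the text between the tags) for every match.
function innerTexts(str, regex) {
  var result = [];
  var match;
  regex.lastIndex = 0; // start from the beginning on every call
  while ((match = regex.exec(str)) !== null) {
    result.push(match[2]); // no replace() needed: the group holds the text
  }
  return result;
}

var sample2 = "<b>bold</b>, <em>emphasis</em> and <strong>strong</strong>";
console.log(JSON.stringify(innerTexts(sample2, classRegex))); // ["bold"]
console.log(JSON.stringify(innerTexts(sample2, altRegex)));   // ["bold","emphasis","strong"]
```

Because of the backreference, a mismatched pair such as <em>x</b> is not matched at all.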

8 Comments

I think you'd better write it as <([biu])>(.*?)<\/\1>. It's safer. Or, if you want to keep the structure for multi-character tags, use <(b|i|u)>(.*?)<\/\1>. Anyway, why replace inside a match at all? Just use RegExp#exec, which keeps all the submatches.
@stribizhev Thanks for your suggestions. The first part is clear to me and I've updated the code... but I'm not sure I understand the second suggestion about RegExp#exec :(
Tushar has undeleted his answer, have a look what I meant.
@AndriyIvaneyko Hi :D it's me again! It helped me a lot to automate some tasks I do every day, but I think it only works for strings: when I try to apply the same function to a textarea (for instance, a small article I wrote and to which I added some tags), nothing happens... Do you have any tips to solve it? :D (By text I mean an article I wrote and then copied and pasted into a textarea of a small, simple editor I coded.)
@AndriyIvaneyko Hmm, in fact I have the feeling that it works when the whole text is on one line... so maybe I just need to make the whitespace inactive?
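One likely explanation for the textarea behaviour: text pasted into a textarea contains line breaks (\n), and . in JavaScript regexes does not match line terminators, so any tag whose content spans a line break is silently skipped. A sketch of two common fixes, [\s\S]*? (works in every engine) or the s (dotAll) flag from ES2018; the collect helper and the sample text are hypothetical:

```javascript
// Hypothetical text as it might come from a textarea: the <b> content spans two lines.
var text = "I am <b>a tennis\nplayer</b> and I like <i>rugby</i>.";

// . does not match \n, so the multi-line <b> element is skipped.
var dotRegex = /<([biu])>(.*?)<\/\1>/gi;
// [\s\S] means "any whitespace or non-whitespace character", i.e. anything, including \n.
var anyCharRegex = /<([biu])>([\s\S]*?)<\/\1>/gi;
// ES2018+: the s (dotAll) flag lets . match line terminators too.
var dotAllRegex = /<([biu])>(.*?)<\/\1>/gis;

// Collect capture group 2 (the text between the tags) for every match.
function collect(str, regex) {
  var result = [];
  var match;
  regex.lastIndex = 0;
  while ((match = regex.exec(str)) !== null) {
    result.push(match[2]);
  }
  return result;
}

console.log(JSON.stringify(collect(text, dotRegex)));     // ["rugby"]
console.log(JSON.stringify(collect(text, anyCharRegex))); // ["a tennis\nplayer","rugby"]
console.log(JSON.stringify(collect(text, dotAllRegex)));  // ["a tennis\nplayer","rugby"]
```

Removing the line breaks first would also work, but it changes the text you get back; adjusting the regex leaves the input untouched.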