How to remove HTML tags from a string in JavaScript without using regexp?

Question

I am new to programming and I was solving this exercise. I have tried 3 loops with string.slice() but for some reason it prints an empty string.

Would you please explain what happens inside my code, why it prints the wrong output, and how I can correct it, rather than giving me your version of the correct answer? That way I can learn from my mistakes.

The test input is

<p><strong><em>PHP Exercises</em></strong></p>

and the output should be PHP Exercises

P.S. This is not a PHP exercise, I'm not confused.

Here is my code:

function remove(answer){ 

    var sen = answer.split("");
    var arr = [];
    for (var i = 0; i<answer.length; i++){
        if (answer[i] == "<"){
            for (var j = i; j<answer.length; j++){
                if (answer[j] == ">"){
                    for (var k = j; k<answer.length; k++){
                        if (answer[k] == "<"){
                            return answer.slice(j+1, k);                
                        }
                    }
                }
            }
        }
    }
}

Well, this is not going to be easy and very likely very messy and buggy in the end. Why no regex? It would make your life so much easier...
– Lucero, Commented Mar 17, 2016 at 2:12
@Lucero No, regexp would not make his/her life easier. Parsing HTML with regexp is broken.
– Mulan, Commented Mar 17, 2016 at 2:12
I haven't learnt regex yet, scheduled for next week, I just thought I'd challenge myself (enthusiasm of newbies :D)
– Mohamed Hegazy, Commented Mar 17, 2016 at 2:13
It's looking for a <, then a >, then another <, which is satisfied by the first four characters of your string ("<p><"). It then returns everything between the > and the 2nd <, which is a zero-length string.
– Tibrogargan, Commented Mar 17, 2016 at 2:14
@naomik Parsing with regex only is broken, removing tags is not. Using regex to identify single tags is completely fine, just not matching tag pairs in one regex, due to nesting.
– Lucero, Commented Mar 17, 2016 at 2:17

num8er · Accepted Answer · 2016-03-17 02:35:17Z

5

Try this:

function stripTags(data)
{
   var tmpElement = document.createElement("div");
   tmpElement.innerHTML = data;
   return tmpElement.textContent || tmpElement.innerText || "";
}

var something = '<p><strong><em>PHP Exercises</em></strong></p>';
alert(stripTags(something));

Or you can use string.js:

var S = window.S;
var something = '<p><strong><em>PHP Exercises</em></strong></p>';
something = S(something).stripTags().s;
alert(something);

<script src="https://raw.githubusercontent.com/jprichardson/string.js/master/dist/string.min.js"></script>

If you're using Node.js:

var S = require('string');
var something = '<p><strong><em>PHP Exercises</em></strong></p>';
something = S(something).stripTags().s;
console.log(something);

edited Mar 17, 2016 at 2:35

answered Mar 17, 2016 at 2:14

11 Comments

Tibrogargan Over a year ago

The assumption that it's running in a DOM Document may be flawed, but it's good anyway :)

Mulan Over a year ago

A+ nice answer. I was unaware that .textContent would work even for nested children. I was in the middle of writing up an answer that recursively reduced children using .textContent on the entire tree. This is obviously much simpler.

EasyBB Over a year ago

It's the only way to do it properly. This is the correct answer so long as the html is correctly formatted.

Tibrogargan Over a year ago

@num8er he never mentions anything client-side, just JavaScript. He could be working in any number of JavaScript engines (but almost certainly isn't, lol)

Lucero Over a year ago

@MohamedHegazy This solution basically leverages the (assumed) browser's functionality for converting a string into a document object model (DOM), which has properties (like innerText) that expose the data in an easy-to-use format. It doesn't teach you anything about parsing text, though it would be a safe and clean solution for a real-world problem.

T. Silver · Accepted Answer · 2016-03-17 02:21:37Z

1

As to why the provided code isn't working, the code returns when j = 2 and k = 3. I discovered this by writing console.log(j, k); immediately before the return. This insight made it clear that the code is identifying the first set of open tags, when actually you seem to want to identify the open and closed "em" tags. The answers provided by others are more robust, but a quick fix to your code is:

change

if (answer[i] == "<"){

to

if (answer.slice(i, i+3) == "<em"){
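
Note that this quick fix only works when the text sits inside an "em" tag. A more general approach, staying with a plain loop and no regexp, is to walk the string once and remember whether the current character is inside a tag. The following is only a minimal sketch under the assumption that every "<" starts a tag and every ">" ends one; the name stripTagsByScanning is made up for illustration, and it will mishandle text or attribute values that contain a literal "<" or ">".

```javascript
// Hypothetical helper (not from the original question): keeps only the
// characters that appear outside of <...> pairs.
function stripTagsByScanning(input) {
    var result = "";
    var insideTag = false; // true while we are between "<" and ">"

    for (var i = 0; i < input.length; i++) {
        var ch = input[i];
        if (ch === "<") {
            insideTag = true;   // a tag starts, stop copying
        } else if (ch === ">") {
            insideTag = false;  // the tag ended, resume copying
        } else if (!insideTag) {
            result += ch;       // ordinary text, keep it
        }
    }
    return result;
}

console.log(stripTagsByScanning("<p><strong><em>PHP Exercises</em></strong></p>"));
// prints "PHP Exercises"
```

Unlike the original three nested loops, this never returns early, so empty gaps such as the one between "<p>" and "<strong>" simply contribute nothing to the result.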

Hope this helps!

answered Mar 17, 2016 at 2:21

Dexter · Accepted Answer · 2016-03-17 02:34:08Z

-2

Your code does not account for ... nothing. It simply stops at the first encounter of what's between ">" and "<", which, in the first case, is nothing! You should check whether a character is present, and move on if not.

Honestly, this is one of those useless exercises that text books use to try to get you to think outside the box. But you will never want to loop through a string to find text between tags. There are so many methods built in to JavaScript, it's literally reinventing the wheel to do this... that is if a wheel were really a for-loop.

If you really want to avoid Regex and other built in functions so that you can learn to problem solve the long way, well try slicing by brackets first!
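
One possible reading of "slicing by brackets" is sketched below: split the string on "<", then drop everything in each piece up to and including the first ">". This is an illustration only, not necessarily what was meant above; the function name stripTagsBySplitting is invented here, and the sketch assumes well-formed input in which "<" and ">" appear only as tag delimiters.

```javascript
// Hypothetical helper: removes tags by splitting on "<" and discarding
// the tag part of each piece (everything up to the first ">").
function stripTagsBySplitting(input) {
    var pieces = input.split("<");
    var result = pieces[0]; // text before the first "<" contains no tag

    for (var i = 1; i < pieces.length; i++) {
        var close = pieces[i].indexOf(">");
        if (close !== -1) {
            // keep only what follows the closing bracket; this may be
            // an empty string, which is fine and adds nothing
            result += pieces[i].slice(close + 1);
        }
        // if there is no ">", the piece is an unclosed tag and is dropped
    }
    return result;
}

console.log(stripTagsBySplitting("<p><strong><em>PHP Exercises</em></strong></p>"));
// prints "PHP Exercises"
```

The empty pieces between adjacent tags are handled by concatenating them rather than returning as soon as one is found, which is the step the original code was missing.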

answered Mar 17, 2016 at 2:34

2 Comments

Mohamed Hegazy Over a year ago

I figure you're one of them anti-textbook programmers lol, nothing wrong with that though, I was actually solving w3resource exercise as I'm also anti-textbook exercises :) thanks for your advice mate

Dexter Over a year ago

I'm definitely not anti-textbook haha I'm just anti-useless-workarounds... Programming should be learned just like math-- learning simpler techniques and building on those to more complex ones. You should never have to learn to do something in a way that is impractical if you will soon replace it with a more practical way anyway. It's like learning to change a tire by using a stick and a log because you haven't learned to use a jack.
