0

I'm working on a script and need to split strings that contain both HTML tags and text. I'm trying to isolate the text and eliminate the tags.

For example, I want this:

string = '<p><span style="color:#ff3366;">A</span></p><p><span style="color:#ff3366;text-decoration:underline;">B</span></p><p><span style="color:#ff3366;text-decoration:underline;"><em>C</em></span></p>';

to be split like this:

separation = string.split(/some RegExp/);

and become:

separation[0] = '<span style="color:#ff3366;">A</span>';
separation[1] = '<span style="color:#ff3366;text-decoration:underline;">B</span>';
separation[2] = '<span style="color:#ff3366;text-decoration:underline;"><em>C</em></span>';

After that, I would like to split each separation string like this:

stringNew = '<span style="color:#ff3366;">A</span>';

extendedSeperation = stringNew.split(/some RegExp/);

extendedSeperation[0] = "A";
extendedSeperation[1] = 'style="color:#ff3366;"';
  • Why not just use the parser that you have in the browser? Everything would be trivial and correct. Commented Jun 3, 2015 at 7:58
  • Well, even I call it HTML parsing. Do not use any regex; check "Parse a HTML String with JS". Commented Jun 3, 2015 at 7:59
  • Don't use regex parsing for HTML, it is messy. Commented Jun 3, 2015 at 8:00
  • stackoverflow.com/a/1732454/2331182 There's already an answer for this. Commented Jun 3, 2015 at 8:05
  • @BurningCrystals: Don't close as a duplicate of that question. That question doesn't contain any solution for the problem. Commented Jun 3, 2015 at 8:50

2 Answers

1

Don't use RegEx, for the reasons explained in the comments.

Instead, do this:

Create an invisible node:

var node = $("<div>").css("display", "none");

Attach it to the body:

$("body").append(node);

Now inject your HTML into the node:

node.html(myHTMLString);

Now you can traverse the DOM tree and extract/render it as you like, much like this:

var ptags = node.find("p"); // returns all <p> tags

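To get the content of a tag, select it with .eq() so that it stays a jQuery object (ptags[0] is a plain DOM element and has no .html() method):

ptags.eq(0).html()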

Finally, to clear the node do:

node.html("");

This should be enough to get you going.

This way you leverage the internal parser of the browser, as suggested in the comments.
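
For comparison, here is a minimal sketch of the same idea without jQuery and without attaching anything to the page. It assumes a browser that provides DOMParser; the function names describeElement, splitParagraphs and splitFirstElement are made up for this illustration and do not come from the question or from any library.

```javascript
// Pure helper: works on anything that looks like a DOM Element,
// since it only reads `textContent` and calls `getAttribute`.
function describeElement(element) {
    var style = element.getAttribute('style');
    var parts = [element.textContent];

    // Only report the style when the attribute is actually present.
    if (style !== null) {
        parts.push('style="' + style + '"');
    }

    return parts;
}

// Browser-only: parse the markup into a detached document and
// return the inner markup of every <p> element.
function splitParagraphs(html) {
    var doc = new DOMParser().parseFromString(html, 'text/html');
    var paragraphs = doc.querySelectorAll('p');

    return Array.prototype.map.call(paragraphs, function (p) {
        return p.innerHTML;
    });
}

// Browser-only: parse a fragment such as one entry of the array above
// and describe its first element (text first, then the style, if any).
function splitFirstElement(html) {
    var doc = new DOMParser().parseFromString(html, 'text/html');
    var first = doc.body.firstElementChild;

    return first ? describeElement(first) : [];
}
```

In a browser, splitParagraphs(string) should give one entry per <p>, and splitFirstElement(separation[0]) should give the text followed by the style attribute. Keeping describeElement free of parsing makes it easy to reuse on elements you already hold.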

0

Your exact expectations are a little unclear, but based only on the information given, here is an example that may give you ideas.

  • Does not use RegExp
  • Does not use jQuery or any other library
  • Does not append and remove elements from the DOM
  • Is well supported across browsers

function walkTheDOM(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        walkTheDOM(node, func);
        node = node.nextSibling;
    }
}

function textContent(node) {
    if (typeof node.textContent !== "undefined" && node.textContent !== null) {
        return node.textContent;
    }

    var text = "";

    walkTheDOM(node, function (current) {
        if (current.nodeType === 3) {
            text += current.nodeValue;
        }
    });

    return text;
}

function dominate(text) {
    var container = document.createElement('div');

    container.innerHTML = text;

    return container;
}

function toSeparation(htmlText) {
    var spans = dominate(htmlText).getElementsByTagName('span'),
        length = spans.length,
        result = [],
        index;

    for (index = 0; index < length; index += 1) {
        result.push(spans[index].outerHTML);
    }

    return result;
}

function toExtendedSeperation(htmlText) {
    var child = dominate(htmlText).firstChild,
        attributes = child.attributes,
        length = attributes.length,
        text = textContent(child),
        result = [],
        index,
        attr;

    if (text) {
        result.push(text);
    }

    for (index = 0; index < length; index += 1) {
        attr = attributes[index];
        if (attr.name === 'style') {
            result.push(attr.name + '=' + attr.value);

            break;
        }
    }

    return result;
}

var strHTML = '<p><span style="color:#ff3366;">A</span></p><p><span style="color:#ff3366;text-decoration:underline;">B</span></p><p><span style="color:#ff3366;text-decoration:underline;"><em>C</em></span></p>',
    separation = toSeparation(strHTML),
    extendedSeperation = toExtendedSeperation(separation[0]),
    pre = document.getElementById('out');

pre.appendChild(document.createTextNode(JSON.stringify(separation, null, 2)));
pre.appendChild(document.createTextNode('\n\n'));
pre.appendChild(document.createTextNode(JSON.stringify(extendedSeperation, null, 2)));
<pre id="out"></pre>

Of course you will need to make modifications to suit your exact needs.
