
I'm trying to split a string

<div id = 'tostart'><button>todo </button>hometown todo </div>

with "to" as a keyword.

The problem is that I must not split inside the tags, only outside them, so after splitting I should get a result like

    arr = ["<div id = 'tostart'><button>","do","</button>home","wn ","do </div>"]

Is there a regex that can achieve this?

Thanks in advance.

  • Regex is not going to be your friend here. You will need to parse it yourself and decide on your own rules as to what qualifies as being within a tag or not. Commented May 22, 2014 at 9:42
  • Whatever you're trying to do, dismiss the idea that you should solve it with regex. Now... Commented May 22, 2014 at 9:42
  • I achieved this with a basic regex that splits on spaces while excluding the tags: var splitText = text.match(/[\<].+?[\>]+|[^\s]+/g); but this splits on spaces, and I need to split on a word. Commented May 22, 2014 at 9:56

3 Answers


Use this:

var str = "<div id = 'tostart'><button>todo </button>hometown todo </div>";
var res = str.replace(/to/g, '|').replace(/(.*?)(<.*?)\|(.*?>)/g, '$1$2to$3');
console.log(res.split("|"));

Output:

["<div id = 'tostart'><button>", "do </button>home", "wn ", "do </div>"]

@musefan:

This is actually an improvised approach.

First I replaced every "to" with "|". Then I selected the pipes that were inside < and > and replaced them with "to" again. Finally I could split on the "|" characters left over by the previous replace.

Regex: (.*?)(<.*?)\|(.*?>)

This selects the | characters that are inside < and >.
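One caveat with that second regex: its (<.*?) group can run past a closing >, so it relies on every tag before a split point containing a | of its own (which happens to hold for the sample, since "button" contains "to"). On "<b>todo</b>", for example, it puts the | after <b> back and the result is ["<b>todo</b>"], unsplit. Below is a minimal sketch of a tag-scoped variant of the same replace, restore, split idea. The name splitWithPlaceholder is hypothetical, and the sketch assumes the input never contains U+0000 (used as the placeholder instead of |) and that neither the word nor any attribute value contains < or >.

```javascript
// Hypothetical helper (not from the answer above): a tag-scoped version of
// "replace, restore inside tags, split". Assumes the input never contains
// U+0000 and that neither `word` nor any attribute value contains "<" or ">".
function splitWithPlaceholder(str, word) {
  var PLACEHOLDER = '\u0000';
  // Escape regex metacharacters so `word` is matched literally.
  var wordRe = new RegExp(word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g');
  var marked = str
    .replace(wordRe, PLACEHOLDER) // 1. mark every occurrence of the word
    .replace(/<[^>]*>/g, function (tag) {
      // 2. inside each complete tag, put the word back
      return tag.split(PLACEHOLDER).join(word);
    });
  return marked.split(PLACEHOLDER); // 3. split on the marks left outside tags
}

console.log(JSON.stringify(splitWithPlaceholder('<b>todo</b>', 'to')));
// ["<b>","do</b>"]
```

Because each replacement callback sees one whole tag, every placeholder inside it is restored, however many there are, and text between tags is never touched.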


1 Comment

Very nice! Wish I understood what that regex is doing though!

I am relying on your HTML using &lt; and &gt; to escape any stray < and >, which browsers tolerate unescaped but validators don't!

str.split(/to(?=[^>]*(?=<|$))/g);

As others have said, regex isn't going to work for really messy HTML (e.g. inline script elements).
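The lookahead above hard-codes "to". A minimal sketch of building the same pattern for an arbitrary word follows; splitOutsideTags is a hypothetical name, and like the answer it assumes every literal < and > in text content is escaped. The inner (?=<|$) is written as the equivalent non-capturing group (?:<|$), since it already sits inside a lookahead, and the g flag is dropped because split always splits at every match.

```javascript
// Hypothetical helper generalizing the lookahead answer to any word.
// Assumes literal "<" and ">" in text content are escaped as &lt; / &gt;.
function splitOutsideTags(str, word) {
  // Escape regex metacharacters so `word` is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Split on `word` only if, scanning forward, a "<" (or the end of the
  // string) is reached before any ">", i.e. we are not inside a tag.
  // (?:...) keeps the group non-capturing; a capturing group would make
  // split() insert the captured text into the result.
  return str.split(new RegExp(escaped + '(?=[^>]*(?:<|$))'));
}

console.log(JSON.stringify(splitOutsideTags('a.b<x.y>c.d', '.')));
// ["a","b<x.y>c","d"]
```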


This is a very quick and dirty function that will do what you want. Note that there is probably a more efficient way to do this, and also that it does not cater for any > characters that might be part of an attribute value. However, it does work for your example input:

function splitNonTag(input, splitText) {
    var inTag = false;//flag to check if we are in a tag or not
    var result = [];//array for storing results
    var temp = "";//string to store current result

    for (var i = 0; i < input.length; i++) {
        var c = input[i];//get the current character to process

        //check if we are not in a tag and have found a split match
        if (!inTag && input.substring(i).indexOf(splitText) == 0) {
            result.push(temp);//add the split data to the results
            temp = "";//clear the buffer ready for next set of split data
            i += splitText.length - 1;//skip to the end of the split delimiter as we don't keep this data
            continue;//continue directly to next iteration of for loop
        }

        temp += c;//append current character to buffer as this is part of the split data

        //check if we are entering, or exiting a tag and set the flag as needed
        if (c == '<') inTag = true;
        else if (c == '>') inTag = false;
    }
    //if we have any left over buffer data then this should become the last split result item
    if (temp) 
        result.push(temp);

    return result;
}
var input = "<div id = 'tostart'><button>todo </button>hometown todo </div>";
var result = splitNonTag(input, 'to');
console.log(result);
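As the answer notes, splitNonTag treats the first > it meets as the end of a tag, so on '<a title="x>to">to</a>' it returns ['<a title="x>', '">', '</a>'], splitting inside the attribute value. As a sketch, assuming attribute values are always wrapped in matching single or double quotes, the same loop can also remember whether it is inside a quoted value. splitNonTagQuoted is a hypothetical name, and it uses String.prototype.startsWith (ES2015) to test for the delimiter.

```javascript
// Hypothetical variant of splitNonTag that also tracks quoted attribute values,
// so a ">" inside title="x>to" does not end the tag early.
// Assumes attribute values are wrapped in matching ' or " quotes.
function splitNonTagQuoted(input, splitText) {
  var result = [];
  var buffer = '';
  var inTag = false; // between "<" and its closing ">"
  var quote = null;  // quote character currently open inside a tag, if any

  for (var i = 0; i < input.length; i++) {
    // Split only when we are outside every tag.
    if (!inTag && input.startsWith(splitText, i)) {
      result.push(buffer);
      buffer = '';
      i += splitText.length - 1; // skip the rest of the delimiter
      continue;
    }
    var c = input[i];
    buffer += c;
    if (!inTag) {
      if (c === '<') inTag = true;      // entering a tag
    } else if (quote) {
      if (c === quote) quote = null;    // closing quote of an attribute value
    } else if (c === '"' || c === "'") {
      quote = c;                        // opening quote of an attribute value
    } else if (c === '>') {
      inTag = false;                    // real end of the tag
    }
  }
  result.push(buffer); // always keep the last piece, like String.prototype.split
  return result;
}

console.log(JSON.stringify(splitNonTagQuoted('<a title="x>to">to</a>', 'to')));
// ["<a title=\"x>to\">","</a>"]
```

Unlike the original, this always pushes the final piece, so input ending in the delimiter yields an empty last element, matching what "xto".split("to") returns.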


2 Comments

This leaves out the ending div from the result, but it's a really good function.
@BackStabber: Sorry, bit of a typo, fixed now
