
I am officially declaring myself dumb! I'm usually quite good with regex, but JavaScript regex is getting on my nerves.

I have the following HTML string:

htmlString = '<div class="aa">TextOne</div><ul><li>one</li></ul>';

I need to get everything inside the UL element, based on the text inside the div with class aa.

I tried the following:

textItem = 'TextOne';

ulRegex = new RegExp('<div class="aa">'+textItem+'</div><ul>(.*)</ul>', "igm");
ul = ulRegex.exec(htmlString);

While writing this question I discovered an error (one tiny extra character) in my regex that kept it from matching. For anyone searching for something specific (JavaScript / regular expression / HTML string / HTML substring): the code above works fine.

Edited

I'm thankful for all the additions to this, but there is one more reason I'm using regex: the text item I match against comes from a variable, which I first have to insert into the regex pattern.
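
For anyone building the pattern from a variable as described here, the following is only a sketch, not the asker's final code. It assumes the variable should be matched literally, so it escapes regex metacharacters first. The helper names escapeRegExp and ulContentAfter are hypothetical. It also uses a lazy quantifier and allows optional whitespace between the div and the ul, as suggested in the comments.

```javascript
// Escape characters that have special meaning in a regular expression,
// so the variable text is matched literally (hypothetical helper name).
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns the inner HTML of the <ul> that directly follows
// the div containing textItem, or null when nothing matches.
function ulContentAfter(htmlString, textItem) {
    var pattern = '<div class="aa">' + escapeRegExp(textItem) +
                  '</div>\\s*<ul>([\\s\\S]*?)</ul>';
    var match = new RegExp(pattern, 'i').exec(htmlString);
    return match ? match[1] : null;
}

var htmlString = '<div class="aa">TextOne</div><ul><li>one</li></ul>';
var ulContent = ulContentAfter(htmlString, 'TextOne'); // "<li>one</li>"
```

Because the quantifier is lazy, the match ends at the first closing ul tag, so this sketch is only suitable when the list contains no nested lists.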

Solution

Having received a few hints and suggestions, I came up with the following, which may help someone else as well:

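var htmlString = '<div class="aa">TextOne</div><ul><li>one</li></ul>';

var textItem = 'TextOne';

// Load the string into a detached element so it can be queried like the DOM.
var tempdiv = $('<div/>');
tempdiv.html(htmlString);
// Find the div containing the text, then take the UL that immediately follows it.
var ul = tempdiv.find('div.aa:contains(' + textItem + ')').next('ul');

$('#res').append(ul);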

http://jsfiddle.net/sdXpJ/

The next('ul') call is important because it solves the problem with nested ULs: none of the regex-based solutions I tried could match a first-level UL that contains one or more inner ULs.
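
The nested-UL problem mentioned above can be illustrated with a small sketch. The sample string with an inner list and a second div is made up for this illustration and is not from the original question. It shows that a lazy pattern stops too early and a greedy pattern runs too far.

```javascript
// Made-up sample: the first list contains a nested list, and a second
// div/list pair follows it.
var nested = '<div class="aa">TextOne</div><ul><li>one<ul><li>inner</li></ul></li></ul>' +
             '<div class="aa">TextTwo</div><ul><li>two</li></ul>';

// Lazy: stops at the first </ul>, which closes the inner list.
var lazy = /<div class="aa">TextOne<\/div><ul>(.*?)<\/ul>/i.exec(nested)[1];

// Greedy: runs to the last </ul>, swallowing the second div and its list.
var greedy = /<div class="aa">TextOne<\/div><ul>(.*)<\/ul>/i.exec(nested)[1];

console.log(lazy);   // <li>one<ul><li>inner</li>
console.log(greedy.indexOf('TextTwo') > -1); // true
```

Neither capture is the complete first-level list, which is why a DOM-based lookup is the safer route here.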

Comments

  • Don't parse HTML with Regex, you might be able to make it work but don't go that route. Use an XML/HTML parser instead. Commented Oct 28, 2013 at 14:09
  • I'm confused a little bit. It seems you have a "working" regex. One small detail, use .*? instead of .* to match ungreedy. There is no need to use the m modifier. Also don't forget to add some \s* for sake of completeness Commented Oct 28, 2013 at 14:13
  • @iambriansreed You mean $('div.aa').next('ul').html() Commented Oct 28, 2013 at 14:15
  • Let's listen to the smart guys here codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html and not use regex to parse HTML. Use something smarter, like your browser, which was built to parse HTML - i.e. use DocumentFragment. Or hell, jQuery would do a great job too. Commented Oct 28, 2013 at 14:29
  • @owsata if you have nested ul's then just forget about using regex, you'll need recursive patterns which is only available in PHP, Perl, .NET and maybe other few languages I don't know of but they are very few. Check this awesome answer. Commented Oct 28, 2013 at 15:19

2 Answers



You can use a simple indexOf method for this:

function str_between(str, searchStart, searchEnd, caseSensitive, offset) {
    var fullString = str;

    caseSensitive = caseSensitive || false;
    offset = offset || 0;

    // For a case-insensitive search, compare lowercased copies
    // but slice the result out of the original string.
    if (!caseSensitive) {
        fullString = fullString.toLowerCase();
        searchStart = searchStart.toLowerCase();
        searchEnd = searchEnd.toLowerCase();
    }

    var startPosition = fullString.indexOf(searchStart, offset);
    if (startPosition > -1) {
        // The content begins right after the start marker.
        var contentStart = startPosition + searchStart.length;
        var endPosition = fullString.indexOf(searchEnd, contentStart);
        if (endPosition > -1) {
            // Measure from the end of the start marker, so markers of any length work.
            return str.substring(contentStart, endPosition);
        }
    }
    return false;
}

> var htmlString = '<div class="aa">TextOne</div><ul><li>one</li></ul>';

> str_between(htmlString, '<ul>', '</ul>');
"<li>one</li>"

> str_between(htmlString, '<UL>', '</UL>');
"<li>one</li>"

> str_between(htmlString, '<UL>', '</UL>', true);
false
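
The examples above take the first ul in the string and do not use the text inside the aa div. As a rough sketch of how the same indexOf idea could pick the list that follows a given text item, the hypothetical helper below (ulAfterText is not part of the answer) first locates the div and then searches from there. It is case-sensitive, assumes the div markup is exactly as in the question, and like the regex approaches it stops at the first closing ul tag, so it does not handle nested lists.

```javascript
// Hypothetical helper: finds the div for textItem, then returns what lies
// between the next '<ul>' and the first '</ul>' after it, or false.
function ulAfterText(html, textItem) {
    var marker = '<div class="aa">' + textItem + '</div>';
    var markerPos = html.indexOf(marker);
    if (markerPos === -1) return false;

    var open = '<ul>', close = '</ul>';
    var openPos = html.indexOf(open, markerPos + marker.length);
    if (openPos === -1) return false;

    var contentStart = openPos + open.length;
    var closePos = html.indexOf(close, contentStart);
    if (closePos === -1) return false;

    return html.substring(contentStart, closePos);
}

// Made-up sample with two div/list pairs.
var html = '<div class="aa">TextOne</div><ul><li>one</li></ul>' +
           '<div class="aa">TextTwo</div><ul><li>two</li></ul>';
var result = ulAfterText(html, 'TextTwo'); // "<li>two</li>"
```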
