I receive an HTML document and need to parse it so that I can read the text under a particular heading. More specifically, there is a div that contains several h2 elements, and I need to read only the text between the third and fourth h2 headings, i.e. the Summary section.

<div>
    <h2>Risks</h2>
    <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
    <p>Donec fermentum orci nec felis.</p>
    <h2>Affected Systems</h2>
    <p>Sed sollicitudin diam id sapien.</p>
    <p>Ut libero.</p>
    <h2>Summary</h2>
    <!-- from here -->
    <p>Vestibulum quam libero, malesuada et, ornare id, aliquet id, tellus.</p>
    <p>Nullam dapibus viverra quam.</p>
    <p>Vestibulum sit amet nunc vel justo dictum pharetra.</p>
    <!-- through here -->
    <h2>Avoidance</h2>
    <p>Proin eleifend mi eget massa.</p>
    <p>Pellentesque feugiat sapien a ante.</p>
</div>

2 Answers


A recursive function works well for this. It takes a start node (the third h2) and an end node (the fourth h2), walks the tree in document order, and collects every non-empty text node found between the two. The snippet below logs the result to the console, but you could just as well concatenate it into a string.

function getTextFromTo(rootNode, startNode, endNode) {
    let pastStartNode = false;
    let reachedEndNode = false;
    const textNodes = [];

    // Depth-first walk in document order. Collection starts once startNode
    // has been visited and stops as soon as endNode is reached.
    function getTextNodes(node) {
        if (node === startNode) {
            // Not descending here means the start heading's own text is skipped.
            pastStartNode = true;
        } else if (node === endNode) {
            reachedEndNode = true;
        } else if (node.nodeType === Node.TEXT_NODE) {
            // Keep only text nodes that contain more than whitespace.
            if (pastStartNode && !reachedEndNode && !/^\s*$/.test(node.nodeValue)) {
                textNodes.push(node);
            }
        } else {
            for (let i = 0, len = node.childNodes.length; !reachedEndNode && i < len; ++i) {
                getTextNodes(node.childNodes[i]);
            }
        }
    }

    getTextNodes(rootNode);
    return textNodes;
}

const root = document.querySelector('div');
const from = document.querySelector('div > :nth-child(5)'); // start: the "Resume" heading
const to = document.querySelector('div > :nth-child(11)');  // end: the next heading

const textNodes = getTextFromTo(root, from, to);

for (const textNode of textNodes) {
    console.log(textNode.data);
}

HTML used by the snippet:

<div class="col-md-12">
  <h2>title 1</h2>
  <ul><li></li></ul>
    
  <h2>title 2</h2>
  <ul><li></li></ul>
  <h2>Resume</h2>
  <p>text 1</p>
  <p>text 2</p>
  <p>text 3 this one</p>
  <p>text 4</p>
  <p>text 5 this one</p>
  <h2>next title</h2>
</div>

Credit for this function goes to @TimDown; I only adapted it from his answer to "How can I find all text nodes between two element nodes with JavaScript/jQuery?"
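
If the wanted paragraphs are direct siblings of the headings, as they are in the question's markup, a plain sibling walk may be enough and avoids the recursion. The following is only a minimal sketch under that assumption; `textBetweenHeadings` is a hypothetical helper name, not part of any library, and it relies only on the standard `nextElementSibling`, `tagName` and `textContent` properties.

```javascript
// Hypothetical helper: collects the text of every element sibling that follows
// `heading`, stopping at the next sibling with the same tag name (or at the
// end of the parent). Assumes the section content sits next to the heading
// rather than being nested somewhere else.
function textBetweenHeadings(heading) {
    const parts = [];
    for (
        let el = heading.nextElementSibling;
        el && el.tagName !== heading.tagName;
        el = el.nextElementSibling
    ) {
        const text = el.textContent.trim();
        if (text) {
            parts.push(text); // skip elements that hold only whitespace
        }
    }
    return parts.join('\n');
}
```

In a browser, calling it with the third heading, for example `textBetweenHeadings(document.querySelectorAll('div > h2')[2])`, should return the paragraphs of that section, one per line. HTML comments are ignored because `nextElementSibling` only visits elements.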


Alternatively, you can use a regular expression:

/(?<=<h2>Résumé<\/h2>)(.|\n)*?(?=<h2>)/g

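This matches everything after <h2>Résumé</h2> up to the next <h2> tag. Replace Résumé with the heading text used in your document (Summary in the question's markup). Note that the match is raw HTML, so it still contains the <p> tags.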
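
As a rough illustration of how such a pattern could be applied, the sketch below runs it against a shortened copy of the question's markup held in a string. It makes a few assumptions: the heading text is `Summary`, the HTML is available as a string, and `[\s\S]` is swapped in for `(.|\n)` as an equivalent way to match across lines. The variable names are made up for the example, and the tag stripping is deliberately naive.

```javascript
// Shortened copy of the question's markup, held as a string (assumption).
const html = `<div>
    <h2>Affected Systems</h2>
    <p>Ut libero.</p>
    <h2>Summary</h2>
    <p>Nullam dapibus viverra quam.</p>
    <p>Vestibulum sit amet nunc vel justo dictum pharetra.</p>
    <h2>Avoidance</h2>
    <p>Proin eleifend mi eget massa.</p>
</div>`;

// The lookbehind anchors the match right after the Summary heading; the lazy
// quantifier stops at the first following <h2>.
const sectionPattern = /(?<=<h2>Summary<\/h2>)[\s\S]*?(?=<h2>)/;

const match = html.match(sectionPattern);
const rawSection = match ? match[0] : '';

// Naive clean-up: replace tags with spaces, then collapse runs of whitespace.
const summaryText = rawSection
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();

console.log(summaryText);
```

Regular expressions are fragile against real-world HTML (attributes on the h2, nested tags, different casing), so this is best suited to markup whose shape you control.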
