I have a Node.js script that reads HTML from a file as a string. I would like to extract some data from it. My string (it is a string, not HTML) is as follows:

<tr><td style="text-align: center;">Initial Filing</td></tr>
                                        
<tr><td>Debtor</td></tr>

    <tr><td class="dName">PO</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>

<tr><td>Secured Party</td></tr>

    <tr><td class="spName">AS</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
    
<tr><td>Debtor</td></tr>
    <tr><td class="dName">ONE</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>

<tr><td>Secured Party</td></tr>

    <tr><td class="spName">ANY</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>

The JavaScript code I'm using is:

fs.readFile('file.txt', 'utf8', function (err, data) {
        if (err) {
            console.log("Error reading file.txt", err);
            process.exit(1);
        }
        var cleanedHtml = /<tr><td>Debtor<\/td><\/tr>(.*?)<tr><td>Secured Party<\/td><\/tr>/g.exec(html);
        console.log(cleanedHtml[1]);
    });

It fails with this error:

 return cleanedHtml[1];
                      ^
TypeError: Cannot read property '1' of null

Is there any issue with my regex? Also, how can I have an end result like this:

PO
CLACKAMAS OR 97015

AS
SPRINGFIELD IL 62708
    
ONE
CLACKAMAS OR 97015

ANY
SPRINGFIELD IL 62708

Thanks.

  • You probably need an HTML parser rather than using regex. Commented Oct 31, 2021 at 9:39
  • I'd suggest two things to check: 1) in your regex instead of (.*?) try (.*)? 2) in the exec method, pass in data and not html (where is html defined?!). Commented Oct 31, 2021 at 10:23
  • Using html was just a mistake I made while simplifying the code to post it here; in the real script I'm passing the correct variable. Commented Oct 31, 2021 at 10:46

2 Answers

If you make sure that the tr elements are inside <table></table> then you can parse the string using DOMParser() after reading the file:

Demo:

var strHtml = `
  <table>
    <tr><td style="text-align: center;">Initial Filing</td></tr>

    <tr><td>Debtor</td></tr>

    <tr><td class="dName">PO</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>

    <tr><td>Secured Party</td></tr>

    <tr><td class="spName">AS</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>

    <tr><td>Debtor</td></tr>
    <tr><td class="dName">ONE</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>

    <tr><td>Secured Party</td></tr>

    <tr><td class="spName">ANY</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
  </table>
  `

var doc = new DOMParser().parseFromString(strHtml, 'text/html');
var els = doc.querySelectorAll('.dName,.spName,.dAddress,.spAddress');
els.forEach((el) => {
  console.log(el.textContent);
});

2 Comments

This script runs under Node.js. It fails on the line var doc = new DOMParser().parseFromString("<table>" + strHtml + "</table>", 'text/html'); with ReferenceError: DOMParser is not defined
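
DOMParser is a browser API and is not a Node.js global, which is why the ReferenceError above appears. One option is to install an HTML parsing package (jsdom, for example) and query the document the same way. If the markup is as regular as the sample in the question, a dependency-free alternative might be to match the four class names directly. The following is only a minimal sketch under that assumption: extractFields is a hypothetical helper name, the inline html string stands in for the contents of file.txt, and the pattern assumes each cell holds plain text with no nested tags.

```javascript
// Sample input; in the real script this would be the string read from file.txt.
const html = `
<tr><td>Debtor</td></tr>
    <tr><td class="dName">PO</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>
<tr><td>Secured Party</td></tr>
    <tr><td class="spName">AS</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
`;

// Hypothetical helper: collects the text of every <td> whose class is one
// of dName, dAddress, spName or spAddress, in document order.
function extractFields(source) {
  // [^<]* assumes the cell contains plain text only (no nested tags).
  const cellPattern = /<td class="(dName|dAddress|spName|spAddress)">([^<]*)<\/td>/g;
  const lines = [];
  for (const match of source.matchAll(cellPattern)) {
    const [, className, text] = match;
    // Insert a blank line before each new name so that every
    // name/address pair forms its own group in the output.
    if (className.endsWith('Name') && lines.length > 0) {
      lines.push('');
    }
    lines.push(text.trim());
  }
  return lines.join('\n');
}

const extracted = extractFields(html);
console.log(extracted);
```

Because the pattern never uses ".", line breaks between rows do not affect it. For markup that is less uniform (extra attributes, nested tags, different quoting), a real HTML parser is the more robust choice.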

Should there not be brackets after console.log? Is cleanedHtml a list with more than one element? Otherwise there is no cleanedHtml[1].

1 Comment

Sorry, that is just a typo; the real code is return cleanedHtml[1]; and I updated the code above.
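
On the TypeError itself: exec returns null when the pattern does not match, and reading [1] from null then throws. A likely reason for the failed match is that "." in a JavaScript regex does not match line breaks unless the s flag is set, while the text between the Debtor row and the Secured Party row spans several lines. The sketch below illustrates this on a shortened copy of the question's input; the variable names are illustrative only, and it assumes the correct string is being passed to exec.

```javascript
// Shortened copy of the input from the question, joined with newlines.
const data = [
  '<tr><td>Debtor</td></tr>',
  '',
  '    <tr><td class="dName">PO</td></tr>',
  '    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>',
  '',
  '<tr><td>Secured Party</td></tr>',
].join('\n');

// Pattern from the question: "." stops at line breaks, so there is no match.
const singleLine = /<tr><td>Debtor<\/td><\/tr>(.*?)<tr><td>Secured Party<\/td><\/tr>/;
const failed = singleLine.exec(data);
console.log(failed); // null

// [\s\S] matches any character, including line breaks.
const multiLine = /<tr><td>Debtor<\/td><\/tr>([\s\S]*?)<tr><td>Secured Party<\/td><\/tr>/;
const found = multiLine.exec(data);

// Guard against null before indexing, so a failed match cannot throw.
const section = found ? found[1].trim() : '';
console.log(section);
```

Note that this only captures the rows between the first Debtor and Secured Party headers. Collecting every name and address still requires looping over all matches or using a parser.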
