For a project to make communications on a website clearer, I have to pull the messages out with a regex. (Why? Because the message is commented out, so I can't reach it with the normal document.getElement methods, but I can with the regex below.)

I am trying to get a value using this expression:

\s*<td width="61%"class="valorCampoSinTamFijoPeque">(.|\n)*?<\/td>

How I use this expression:

var pulledmessage = /\s*<td width="61%"class="valorCampoSinTamFijoPeque">(.|\n)*?<\/td>/.exec(htmlDoc);

The expression above gives me null when I console.log() the result. My guess is that the htmlDoc I pass to the regex is in the wrong format, but I have no idea how to get the value pulled out.

What I use to parse the HTML:

var html1 = httpGet(messages);

parser = new DOMParser();

htmlDoc = parser.parseFromString(html1,"text/html");

The result I want to get:

<td width="61%"class="valorCampoSinTamFijoPeque"><b>D.</b> De: 
Information, Information. 
Information, Information
Para: Information
CC: Information
Alot of text here ............
</td>

I edited the above value to remove personal information.

html1 contains a full HTML page with the information required.
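
A likely reason for the null, offered as a sketch rather than a confirmed diagnosis: exec() converts its argument to a string, and a parsed document object does not stringify to its markup (in browsers it becomes something like "[object HTMLDocument]"). Running the same kind of pattern against the raw string html1 avoids that. The sample markup and the fakeDocument stand-in below are made up for illustration, and [\s\S]*? is swapped in for (.|\n)*? so the cell content lands in one capture group.

```javascript
// Hypothetical sample standing in for the page returned by httpGet(messages).
const html1 = [
  '<table><tr>',
  '<!--<td width="61%"class="valorCampoSinTamFijoPeque"><b>D.</b> De:',
  'Information, Information.',
  '</td>-->',
  '</tr></table>',
].join('\n');

// Same idea as the original pattern; the capture group holds the cell content.
const pattern = /<td width="61%"class="valorCampoSinTamFijoPeque">([\s\S]*?)<\/td>/;

// Stand-in for a parsed document (assumption: not a real DOM object). Any
// non-string argument is converted to a string before matching, so the
// regex never sees the markup.
const fakeDocument = { toString: () => '[object HTMLDocument]' };

const fromDocument = pattern.exec(fakeDocument); // no match
const fromString = pattern.exec(html1);          // match array

console.log(fromDocument);
console.log(fromString && fromString[1]);
```

If this is what is happening, passing html1 instead of htmlDoc to exec() should be enough, and the DOMParser step is not needed for the regex approach.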


2 Answers


New attempt. Since the td you need is commented out, remove all HTML comment delimiters from the loaded HTML string before parsing it. The td then becomes a real element in the parsed document, and you can read the message content from its innerHTML.

const 
  documentString = `
  <!doctype html>
    <html>
    <body>
      <div class="valorCampoSinTamFijoPeque">1</div>
      <div class="valorCampoSinTamFijoPeque">2</div>
      <div class="valorCampoSinTamFijoPeque">3</div>
      <div class="valorCampoSinTamFijoPeque">4</div>
      <div class="valorCampoSinTamFijoPeque">5</div>
      <div class="valorCampoSinTamFijoPeque">6</div>
      <!--<div class="valorCampoSinTamFijoPeque"><b>D.</b> De: Information, Information. Information, Information Para: Information CC: Information Alot of text here ............</div>-->
      <div class="valorCampoSinTamFijoPeque">8</div>
      </body>
    </html>`,
  outputElement = document.getElementById('output');

const
  // Remove all comment delimiters from the input string.
  cleanupDocString = documentString.replace(/<!--|-->/g, ''),
  // Create a parser and construct a document based on the string. It should
  // contain 8 divs.
  parser = new DOMParser(),
  htmlDoc = parser.parseFromString(cleanupDocString, "text/html");

const
  // Get the 7th div with the class name from the parsed document.
  element = htmlDoc.getElementsByClassName('valorCampoSinTamFijoPeque')[6];

// Log the element found in the parsed document.
console.log(element);
// Log the content from the element.
console.log(element.innerHTML);
<div id="output"></div>
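
Stripping every comment delimiter also exposes any other commented-out markup on the page. A narrower variant, sketched here under the assumption that the hidden cell can be recognised by its class name, unwraps only the comments that mention a marker string. The helper name unwrapCommentsContaining and the sample string are hypothetical.

```javascript
// Unwrap only the HTML comments whose body mentions the given marker text,
// leaving every other comment in place.
function unwrapCommentsContaining(html, marker) {
  return html.replace(/<!--([\s\S]*?)-->/g, (whole, body) =>
    body.includes(marker) ? body : whole
  );
}

// Made-up input: one unrelated comment and one commented-out element.
const sample =
  '<!-- layout note --><div class="valorCampoSinTamFijoPeque">1</div>' +
  '<!--<div class="valorCampoSinTamFijoPeque">hidden message</div>-->';

const unwrapped = unwrapCommentsContaining(sample, 'valorCampoSinTamFijoPeque');
console.log(unwrapped);
```

The returned string can then be handed to parseFromString in place of cleanupDocString.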


11 Comments

I believe I implemented it correctly. I'm getting data from element now, but it's not the text. This is my code now: var html1 = httpGet(messages); const // Remove all comment delimiters from the input string. cleanupDocString = html1.replace(/(?:<!--|-->)/gm, ''); parser = new DOMParser(); htmlDoc = parser.parseFromString(html1,"text/html");
const // Get the 7th div with the class name from the parsed document. element = htmlDoc.getElementsByClassName('valorCampoSinTamFijoPeque')[7]; // Log the element found in the parsed document. console.log(element); // Log the content from the element. console.log(element.innerHTML);
Say this is the 7th element: <td width="61%"class="valorCampoSinTamFijoPeque"><b>D.</b> De: Information, Information. Information, Information Para: Information CC: Information Alot of text here ............ </td>. What do you need from this element?
<b>D.</b> De: Information, Information. Information, Information Para: Information CC: Information Alot of text here ............
You need the 7th element but you're getting the element at index 7 instead of 6. That strikes me as odd. When you say you're getting data from the element but not the text, what do you mean exactly? I now know what you're expecting (the value of element.innerHTML) but I don't know what the current output is.

There is no need for a regex, native JS has your back!

const 
  documentString = '<!doctype html><html><body><div class="valorCampoSinTamFijoPeque">1</div><div class="valorCampoSinTamFijoPeque">2</div><div class="valorCampoSinTamFijoPeque">3</div><div class="valorCampoSinTamFijoPeque">4</div><div class="valorCampoSinTamFijoPeque">5</div><div class="valorCampoSinTamFijoPeque">6</div><div class="valorCampoSinTamFijoPeque">7<!--<b>D.</b> De: Information, Information. Information, Information Para: Information CC: Information Alot of text here ............--></div><div class="valorCampoSinTamFijoPeque">8</div></body></html>',
  outputElement = document.getElementById('output');
  

function getCommentText(element) {
  for (var index=0; index<element.childNodes.length;index++){
    const
      node = element.childNodes[index];
    if (node.nodeType === Node.COMMENT_NODE) {
      return node.data;
    }
  }
}

// Create a parser and construct a document based on the string. It should
// contain 8 divs.
const parser = new DOMParser();
const htmlDoc = parser.parseFromString(documentString, "text/html");

const
  // Get the 7th div with the class name from the parsed document.
  element = htmlDoc.getElementsByClassName('valorCampoSinTamFijoPeque')[6];

// Replace the HTML of the element with the content of the comment.
element.innerHTML = getCommentText(element);

// Take the inner HTML of the parsed document's body and place it inside the output
// element in the page that is visible in the user agent. The 7th div should not
// contain a number but the text that was originally in the comment.
outputElement.innerHTML = htmlDoc.body.innerHTML;
<div id="output"></div>
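
If the element lookup returns undefined, getCommentText throws as soon as it touches childNodes, and if the element has no comment child the function returns undefined, which innerHTML would render as the text "undefined". A guarded variant might look like the sketch below. The name getCommentTextSafe is hypothetical, and plain objects stand in for DOM nodes so the example runs outside a browser; 8 is the numeric value of Node.COMMENT_NODE.

```javascript
const COMMENT_NODE = 8; // Same value as Node.COMMENT_NODE in browsers.

// Return the text of the first comment child, or null when the element is
// missing or has no comment child.
function getCommentTextSafe(element) {
  if (!element) return null;
  for (const node of Array.from(element.childNodes)) {
    if (node.nodeType === COMMENT_NODE) return node.data;
  }
  return null;
}

// Plain objects standing in for DOM nodes (assumption: illustration only).
const fakeElement = {
  childNodes: [
    { nodeType: 3, data: '7' }, // text node
    { nodeType: COMMENT_NODE, data: '<b>D.</b> De: Information' },
  ],
};

console.log(getCommentTextSafe(fakeElement));
console.log(getCommentTextSafe(undefined));
```

Checking the return value for null before assigning to innerHTML makes a wrong index or a missing comment show up as an explicit case instead of an exception.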

17 Comments

When I implement this I get this error: Cannot read property 'childNodes' of undefined. I changed const element = document.getElementById('element'); to const element = document.getElementsByClassName("valorCampoSinTamFijoPeque")[7]; the 7 is there because this is the seventh div that uses that class (I know it's stupid to use the same class for multiple divs, but that's how this website was already designed).
When it is the 7th div, shouldn't you use getElementsByClassName("valorCampoSinTamFijoPeque")[6]? I am assuming it is zero based like all other methods.
Yes, it starts at 0, but there are 8 in total; this one is the 7.
In that case I would have to see your code and HTML in order to say why it doesn't work. It has to do with retrieving the element from the DOM.
It's hard for me to explain my situation since I didn't fully develop this plugin. I'm not fully sure how to implement the code; maybe I'm doing something wrong.
