
I am extracting data from some websites and would like to remove some unnecessary text from it.

So I wrote some parsers that clean up the parsed content before presenting it to the users.

Here is my test code.

// tried using this but it still did not work
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

var div = document.getElementById("content");
var txArray = ["If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.", "KobatoChanDaiSuki"]
txArray.forEach(x => {
  var reg = new RegExp(escapeRegex(x), "gi");
  div.innerHTML = div.innerHTML.replace(reg, "");
});
<div id="content">
Hyung : Big/older brother. Kind of an equivalent to the japanese “onii-san” but only used between male (male to male).
Translator :Pumba
TL Check : KobatoChanDaiSuki
 If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.
</div>

As you can see above, it is not removing all of the content. Why is that?

Maybe I need to break up the long string and then try to clean it? I really do not know. What do you think?

1 Answer

The characters (, ), and . have special meanings in JavaScript regular expressions, so they must be escaped to match literally. An additional problem is that < and > are written as &lt; and &gt; respectively in innerHTML, so a pattern containing a literal < or > never matches that string. innerText avoids this problem. (I figured this out by adding console.log(div.innerHTML) to look at the contents; see the snippet below.)
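To illustrate the innerHTML issue without a browser, here is a minimal sketch. The two strings are hypothetical stand-ins for what div.innerHTML and div.innerText would return for the same text, and needle is a shortened version of the phrase from the question; none of these names come from the original code except escapeRegex.

```javascript
// Hypothetical stand-ins for what the browser would return for the same text node.
const asInnerHTML = "Please let us know &lt; report chapter &gt; so we can fix it.";
const asInnerText = "Please let us know < report chapter > so we can fix it.";

// Shortened phrase to search for (assumption: same shape as the one in the question).
const needle = "let us know < report chapter >";

// Same escaping helper as in the question.
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

const reg = new RegExp(escapeRegex(needle), "gi");

const htmlMatches = reg.test(asInnerHTML);
reg.lastIndex = 0; // a "g" regex remembers its position between test() calls
const textMatches = reg.test(asInnerText);

console.log(htmlMatches); // false: the string contains &lt; and &gt;, not < and >
console.log(textMatches); // true
```

The escaping is correct in both cases; only the entity-encoded string fails to match.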

Try this:

var txArray = ["If you find any errors \\( broken links, non-standard content, etc\\.\\. \\), Please let us know < report chapter > so we can fix it as soon as possible\\.", "KobatoChanDaiSuki"]
txArray.forEach(x => {
  var reg = new RegExp(x, "gi");
  div.innerText = div.innerText.replace(reg, "");
});

Or you can write code to escape your regular expressions, as in the following:

var reg = new RegExp(x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), "gi");

var div = document.getElementById("content");
var txArray = ["If you find any errors \\( broken links, non-standard content, etc\\.\\. \\), Please let us know < report chapter > so we can fix it as soon as possible\\.", "KobatoChanDaiSuki"];

txArray.forEach(x => {
  var reg = new RegExp(x, "gi");
  console.log(div.innerHTML);
  div.innerText = div.innerText.replace(reg, "");
});
<div id="content">
Hyung : Big/older brother. Kind of an equivalent to the japanese “onii-san” but only used between male (male to male).
Translator :Pumba
TL Check : KobatoChanDaiSuki
 If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.
</div>
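If you would rather not escape the strings by hand, the escapeRegex function from the question can be combined with innerText. The following is only a sketch: removePhrases is a hypothetical helper name, and sample stands in for div.innerText so the code can run outside a browser.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: remove every phrase from a plain-text string, case-insensitively.
function removePhrases(text, phrases) {
  return phrases.reduce(
    (result, phrase) => result.replace(new RegExp(escapeRegex(phrase), "gi"), ""),
    text
  );
}

// Phrases written as plain text, with no manual backslashes.
const txArray = [
  "If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.",
  "KobatoChanDaiSuki"
];

// Stand-in for div.innerText (assumption: the text content of the div in the question).
const sample =
  "Translator :Pumba\n" +
  "TL Check : KobatoChanDaiSuki\n" +
  " If you find any errors ( broken links, non-standard content, etc.. ), Please let us know < report chapter > so we can fix it as soon as possible.";

const cleaned = removePhrases(sample, txArray);
console.log(JSON.stringify(cleaned));
```

In the browser, the equivalent call would be div.innerText = removePhrases(div.innerText, txArray);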


5 Comments

Not working. I tried what you wrote and it's not working. Please try to create a simple runnable snippet here on Stack Overflow.
See above: I updated the code as you said and it is still not working.
Ah, < and > are causing trouble too. I've updated my answer.
I do not want to write \ manually. Could you fix the escapeRegex function I wrote instead, so I can replace those unwanted characters in txArray?
That code should work if you switch from innerHTML to innerText.
