Decode JavaScript encoded content

Question

I'm writing a web crawler to collect email addresses. After downloading the HTML content and parsing it with DomCrawler, I get this node value:

<!--
document.write("<a rel='nofollow' href='mailto:&#104;&#105;&#101;&#117;&#98;&#100;&#115;&#104;&#97;&#112;&#112;&#121;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;'>&#104;&#105;&#101;&#117;&#98;&#100;&#115;&#104;&#97;&#112;&#112;&#121;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;");
//-->This email address has been protected. You need to enable JavaScript to view the content.

How could I decode it?

Alexander Higgins · Accepted Answer · 2017-07-16 05:47:32Z


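The value is simply HTML-encoded. Each character of the original string has been replaced by its decimal numeric character reference (for example, &#104; is "h" and &#64; is "@"), so in PHP you can use html_entity_decode() to get the original text back.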
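To see why html_entity_decode() reverses this, here is a minimal sketch of how such an obfuscated string can be produced. encodeAsEntities() is a hypothetical helper, not a PHP built-in, and it assumes ASCII input because it encodes bytes rather than Unicode code points.

```php
<?php
// Hypothetical helper: replace every byte of an ASCII string with its
// decimal numeric character reference, e.g. "h" (code 104) -> "&#104;".
// Assumes ASCII input; multibyte UTF-8 text would need code points, not bytes.
function encodeAsEntities(string $text): string
{
    $out = '';
    foreach (str_split($text) as $char) {
        $out .= '&#' . ord($char) . ';';
    }
    return $out;
}

$encoded = encodeAsEntities('hi@x.com');
echo $encoded, PHP_EOL;                     // &#104;&#105;&#64;&#120;&#46;&#99;&#111;&#109;
echo html_entity_decode($encoded), PHP_EOL; // hi@x.com
```

Because the encoding is a plain one-to-one substitution, decoding needs no key or JavaScript engine; any HTML entity decoder restores the text.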

// Decode the numeric character references in the href value back to plain text
$returnValue = html_entity_decode('mailto:&#104;&#105;&#101;&#117;&#98;&#100;&#115;&#104;&#97;&#112;&#112;&#121;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;', ENT_COMPAT);
// $returnValue === 'mailto:hieubdshappy@gmail.com'

See: https://www.functions-online.com/html_entity_decode.html
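For a crawler, the encoded address first has to be pulled out of the raw node text. The following is a sketch under stated assumptions: extractEmails() is a hypothetical helper name, and the regular expression assumes the address sits in an href='mailto:...' attribute (single or double quotes) whose value contains no quote characters.

```php
<?php
// Sketch: find obfuscated mailto: links in the raw node text returned by
// DomCrawler and decode their numeric character references.
// extractEmails() is hypothetical; the regex assumes href='mailto:...'.
function extractEmails(string $nodeText): array
{
    $emails = [];
    // Capture everything after "mailto:" up to the closing quote.
    if (preg_match_all('/href=[\'"]mailto:([^\'"]+)[\'"]/i', $nodeText, $matches)) {
        foreach ($matches[1] as $encoded) {
            // "&#104;&#105;..." -> "hi..."
            $emails[] = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        }
    }
    // The same address may appear in several links; keep each once.
    return array_values(array_unique($emails));
}

// Node value from the question (nowdoc, so nothing is interpolated).
$node = <<<'HTML'
<!--
document.write("<a rel='nofollow' href='mailto:&#104;&#105;&#101;&#117;&#98;&#100;&#115;&#104;&#97;&#112;&#112;&#121;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;'>&#104;&#105;&#101;&#117;&#98;&#100;&#115;&#104;&#97;&#112;&#112;&#121;&#64;&#103;&#109;&#97;&#105;&#108;&#46;&#99;&#111;&#109;");
//-->
HTML;

echo implode(PHP_EOL, extractEmails($node)), PHP_EOL; // hieubdshappy@gmail.com
```

Matching the quoted href before decoding keeps decoded characters (such as an encoded quote) from interfering with the pattern. Pages that also entity-encode the "mailto:" prefix itself would not match; for those you would need to decode the whole node first and then search it.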

edited Jul 16, 2017 at 5:47

answered Jul 16, 2017 at 5:31
