
I take HTML in as a string and then parse it to change all href links to something else. This works; however, when the HTML page contains JS script tags (i.e. <script>), they get removed! For example, this line:

<script type="text/javascript" src="/js/jquery.js"></script>

gets changed to:

[removed][removed] 

However, I would like to keep everything in. This is my function:

function parse_html_code($code, $code_id){
    libxml_use_internal_errors(true);

    $xml = new DOMDocument();
    $xml->loadHTML($code);

    foreach($xml->getElementsByTagName('a') as $link) {
        $link->setAttribute('href', CLK_BASE."clk.php?i=$code_id&j=" . $link->getAttribute('href'));
    }

    return $xml->saveHTML();
}

I appreciate any help on this.

  • DOM will not remove any tags on its own or insert [removed] markers anywhere. Please provide a reproducible example that illustrates the problem. Commented Mar 20, 2011 at 13:45
  • X-Ref: PHP Headless Browser? Commented Jun 25, 2013 at 15:39
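To make the commenter's point concrete, here is a minimal standalone check (not from the thread) that DOMDocument by itself keeps a <script> element like the one in the question. The surrounding markup and the example.com link are made up purely for the test; if this runs cleanly, the [removed] markers must be coming from somewhere before the string reaches DOMDocument.

```php
<?php
// Standalone check: does DOMDocument itself drop <script> tags?
libxml_use_internal_errors(true); // silence warnings about imperfect HTML

// Hypothetical input containing the script tag from the question.
$html = '<html><head><script type="text/javascript" src="/js/jquery.js"></script></head>'
      . '<body><a href="http://example.com/">link</a></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// The parsed tree should still contain exactly one <script> element.
$scripts = $doc->getElementsByTagName('script');
echo $scripts->length, "\n";
echo $scripts->item(0)->getAttribute('src'), "\n";

// Serialising back to HTML should not introduce any "[removed]" text.
$out = $doc->saveHTML();
echo strpos($out, '[removed]') === false ? "no [removed] markers\n" : "markers found\n";
```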

1 Answer


CodeIgniter's bogus anti-XSS ‘feature’ is mauling your script's input before DOMDocument gets a look at it. Script tags and various other strings are removed, replaced with “[removed]”, or otherwise messed about with for no good reason. See the system/libraries/Security.php module for the full embarrassing details.

To turn off this misguided feature, set $config['global_xss_filtering'] = FALSE. You'll have to make sure your script actually handles string escaping properly, of course (e.g. always HTML-escaping user input when including it in a page). But you have to do that anyway; anti-XSS doesn't fix your text-processing problems, it just obscures them.
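A minimal sketch of "escaping on output" in plain PHP, assuming the page is UTF-8. The variable $userInput and its value are hypothetical stand-ins for a form field; the point is that the text is escaped at the moment it is written into HTML, instead of being filtered globally on the way in.

```php
<?php
// Hypothetical untrusted value, e.g. taken from a form field.
$userInput = '<script>alert("hi")</script> & friends';

// Escape &, <, >, " and ' so the browser treats the value as text, not markup.
$safe = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

echo "<p>" . $safe . "</p>\n";
```

The raw value can still be stored and processed unchanged (for example fed to DOMDocument); only the copy that ends up in the HTML output is escaped.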

$link->setAttribute('href', CLK_BASE."clk.php?i=$code_id&j=" . $link->getAttribute('href'));

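You'll need to urlencode() that getAttribute('href') value (and potentially $code_id too, if it's not just numeric or something similar); otherwise characters such as &, ? and = in the original link will break the clk.php query string.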
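A sketch of the question's function with that change applied. CLK_BASE comes from the question but its real value is unknown, so the URL defined here is a hypothetical placeholder; everything else mirrors the original code except that both query-string values are passed through urlencode().

```php
<?php
// Hypothetical placeholder: the question defines CLK_BASE elsewhere.
define('CLK_BASE', 'http://tracker.example.com/');

function parse_html_code($code, $code_id) {
    libxml_use_internal_errors(true);

    $xml = new DOMDocument();
    $xml->loadHTML($code);

    foreach ($xml->getElementsByTagName('a') as $link) {
        // Percent-encode both values so characters such as & ? = # in the
        // original URL cannot be mistaken for parts of clk.php's query string.
        $target = urlencode($link->getAttribute('href'));
        $id     = urlencode((string) $code_id);
        $link->setAttribute('href', CLK_BASE . "clk.php?i=$id&j=$target");
    }

    return $xml->saveHTML();
}

$out = parse_html_code('<p><a href="http://example.com/page?a=1&amp;b=2">x</a></p>', 42);
echo $out;
```

On the clk.php side, PHP decodes query-string parameters when it populates $_GET, so $_GET['j'] should come back as the original link.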


1 Comment

Thank you for the explanation and the tip about URL-encoding! I feel like I should defend CI since I am using it, but I won't, as I don't know much about it... yet. I'm sure it has its pros.
