
I have a PHP application that sometimes fails, depending on which data I load, with errors like:

parser error : PCDATA invalid Char value 11
Warning: simplexml_load_file(): ath>/datadrivenbestpractices/Data-driven Best Practices in 
Warning: simplexml_load_file(): ^ in 

I am certain that some values are causing the problem, but I have no control over the data. I have tried the solutions from Error: "Input is not proper UTF-8, indicate encoding !" using PHP's simplexml_load_string; How to handle invalid unicode with simplexml; and How to skip invalid characters in XML file using PHP, but they have not helped.

The culprit strings are 'Data Driven - Best Practices' and 'Data-driven Best Practices to Recruit and Retain Underrepresented Graduate Students May 12, 2011 - 1:30-3:00 p.m., EST' (the problem may be the dashes or line-break characters).
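"Char value 11" in the parser error is a decimal code point: U+000B, the vertical tab. The XML 1.0 Char production only allows tab, LF, CR, and code points from U+0020 upward (excluding surrogates, U+FFFE and U+FFFF), so hyphens, en dashes and ordinary line breaks are legal and an invisible control character is the more likely culprit. A minimal diagnostic sketch, assuming the text is valid UTF-8; `find_invalid_xml_chars()` is a hypothetical helper, not part of PHP:

```php
<?php
/**
 * Hypothetical diagnostic helper: list every character in $string that is
 * outside the XML 1.0 Char production, as its UTF-8 bytes plus byte offset.
 * Assumes $string is valid UTF-8 (the /u modifier makes matching fail otherwise).
 */
function find_invalid_xml_chars(string $string): array
{
    $pattern = '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $found = [];
    if (preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$char, $offset]) {
            // bin2hex() shows the raw bytes, e.g. "0B" for a vertical tab.
            $found[] = sprintf('0x%s at byte %d', strtoupper(bin2hex($char)), $offset);
        }
    }
    return $found;
}

// Sample title with a vertical tab hidden after "Practices".
print_r(find_invalid_xml_chars("Data-driven Best Practices\x0B to Recruit"));
```

Running it over the titles pulled from the feed would show whether the dashes are involved at all.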

What can I do? Mine is a Windows php test environment but the live environment will be a LAMP one--can 't touch the .ini files.

Thanks.

  • I think you should show your XML source too. Commented Jan 22, 2013 at 17:40

2 Answers


Stripping the invalid characters before parsing would be the easiest fix:

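function utf8_for_xml($string)
{
    // Replace each run of characters outside the XML 1.0 Char production with a space.
    // Requires valid UTF-8 input: with the /u modifier, preg_replace() returns null otherwise.
    return preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u', ' ', $string);
}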

From: PHP generated XML shows invalid Char value 27 message
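A quick way to check the function before wiring it into the loader is to feed it an XML string containing a vertical tab (char value 11) and parse the result with simplexml_load_string(). A minimal sketch, assuming the input is already valid UTF-8; the sample document is made up, and the pattern includes the supplementary-plane range \x{10000}-\x{10FFFF}, which XML 1.0 also allows:

```php
<?php
// The answer's function, repeated so this sketch runs on its own.
function utf8_for_xml($string)
{
    return preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u', ' ', $string);
}

// Hypothetical sample document with a vertical tab inside a title.
$raw = "<items><item><title>Data Driven\x0B- Best Practices</title></item></items>";

$xml = simplexml_load_string(utf8_for_xml($raw));
if ($xml === false) {
    die("Still not well-formed\n");
}
echo $xml->item->title, "\n"; // prints: Data Driven - Best Practices
```

Replacing with a space rather than an empty string keeps words from running together when the stripped character was acting as a separator.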


4 Comments

I'm not sure how this fits into my code. Here is how I load the XML: $xml_apicheck = simplexml_load_file($serveraddress.$myparam)
It should work if you do something like: simplexml_load_string(utf8_for_xml(file_get_contents($serveraddress.$myparam)));
Okay, I now have $xml_apicheck = simplexml_load_file(utf8_for_xml(file_get_contents($serveraddress.$myparam))); but I am now getting the error: action.php on line 100 PHP Notice: Trying to get property of non-object in
Could the problem be that the result of file_get_contents() is no longer XML?
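The "Trying to get property of non-object" notice follows from the call itself: simplexml_load_file() expects a filename or URL, so handing it the cleaned XML text makes it try to open a file with that name, fail, and return false. The cleaned string needs simplexml_load_string(), as in the suggestion above. A sketch, assuming a temporary local file in place of the remote $serveraddress.$myparam, with libxml errors collected instead of printed as warnings:

```php
<?php
function utf8_for_xml($string)
{
    return preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u', ' ', $string);
}

// Assumption for the demo: a temporary local file stands in for $serveraddress.$myparam.
$source = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($source, "<feed><title>Data-driven\x0B Best Practices</title></feed>");

libxml_use_internal_errors(true);       // collect parser errors instead of emitting warnings

$contents = file_get_contents($source); // raw text of the document
$xml_apicheck = simplexml_load_string(utf8_for_xml($contents)); // _string, not _file

if ($xml_apicheck === false) {
    // Report why parsing failed instead of touching properties on false.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    libxml_clear_errors();
} else {
    echo $xml_apicheck->title, "\n";
}
unlink($source);
```

Checking for `=== false` before using the result turns the vague notice into the actual libxml error messages.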

Never mind, the answer to How to skip invalid characters in XML file using PHP did work. Here is my code:

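// Register the filter under the name "xmlutf8" so it can be used in php://filter URLs.
stream_filter_register('xmlutf8', 'ValidUTF8XMLFilter');

class ValidUTF8XMLFilter extends php_user_filter
{
    // Anything outside the XML 1.0 Char production (including the supplementary planes).
    protected static $pattern = '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

    public function filter($in, $out, &$consumed, $closing): int
    {
        // Strip invalid characters from each chunk as it is read.
        // Caveat: a multibyte UTF-8 character split across two chunks makes the /u
        // pattern fail, and preg_replace() then returns null for that chunk.
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = preg_replace(self::$pattern, '', $bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

// Read the remote document through the filter instead of directly.
$doc = simplexml_load_file("php://filter/read=xmlutf8/resource=".$serveraddress.$myparam);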
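To try the filter without a remote server, the same php://filter URL can point at a temporary local file (an assumption standing in for $serveraddress.$myparam). Unlike utf8_for_xml(), this filter deletes the invalid characters outright rather than replacing them with a space. A minimal sketch; it only holds for input that is valid UTF-8 in every chunk the stream delivers:

```php
<?php
// The filter class from the answer, repeated so the sketch runs on its own.
class ValidUTF8XMLFilter extends php_user_filter
{
    protected static $pattern = '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

    public function filter($in, $out, &$consumed, $closing): int
    {
        // Remove invalid characters from each bucket and pass it downstream.
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = preg_replace(self::$pattern, '', $bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}
stream_filter_register('xmlutf8', 'ValidUTF8XMLFilter');

// Hypothetical local file standing in for $serveraddress.$myparam.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, "<feed><title>Data Driven - Best\x0B Practices</title></feed>");

$doc = simplexml_load_file('php://filter/read=xmlutf8/resource=' . $path);
echo $doc->title, "\n"; // prints: Data Driven - Best Practices
unlink($path);
```

The stream approach cleans the data while libxml reads it, so the whole response never has to be held as a separate PHP string.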
