I've stumbled across this behavior on PHP 5.6 (it is also identical in PHP 5.4 up to 7.0).

$note = new SimpleXMLElement('<Note></Note>');
$note->addChild("string0", 'just a string');
$note->addChild("string1", "abc\n\n\n");
$note->addChild("string2", "\tdef");
$note->addChild("string3", "\n\n\n");
$note->addChild("string4", "\t\n");

$json = json_encode($note, JSON_PRETTY_PRINT);

print($json);

Outputs:

{
    "string0": "just a string",
    "string1": "abc\n\n\n",
    "string2": "\tdef",
    "string3": {
        "0": "\n\n\n"
    },
    "string4": {
        "0": "\t\n"
    }
}

There must be a reason behind this behavior, and I would like to understand it. Also, if you know of a way to force json_encode to treat whitespace-only strings the same way as strings of text, I would appreciate you sharing your ideas!

Edit: here's a snippet you can run: http://sandbox.onlinephpfunctions.com/code/d797623553c11b7a7648340880a92e98b19d1925

  • I can't reproduce this running PHP 5.5.9. For me, string3 and string4 are just blank whitespace. However, curiously enough, the whitespace characters are being taken literally, the same as in your example for string1 and string2. Commented Jul 20, 2016 at 17:05
  • Added the snippet to my question. Commented Jul 20, 2016 at 17:07
  • @JeffPuckettII you are right about 5.5, but most 5.6 versions produce the result above, and so do all versions of PHP 7 I could test. Commented Jul 20, 2016 at 17:08
  • I see, your question does say "also identical in PHP 5.4 up to 7.0" Commented Jul 20, 2016 at 17:12
  • What is the expected output? I reckon you're wondering where the "0" node comes from. Commented Jul 21, 2016 at 10:55

1 Answer

This comes from RFC 4627 (emphasis mine):

All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

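The newline character (\n) is U+000A, which falls inside that control-character range, so PHP dutifully escapes it as the two-character sequence \n in the JSON output (likewise, the tab U+0009 becomes \t).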

PHP uses this RFC for json_encode:

PHP implements a superset of JSON as specified in the original » RFC 4627 - it will also encode and decode scalar types and NULL.

As I noted in the comments, all versions of PHP, going back to 5.2, do it this way (Demo).
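A minimal illustration of the escaping described in the RFC quote: each control character in a PHP string is written as an escape sequence in the encoded JSON, and json_decode() turns it back into the original character. The first four samples are the values from the question; the "\x01" sample is an added assumption, included only to show the \uXXXX form used for control characters that have no short escape.

```php
<?php
// Strings containing control characters (U+0000..U+001F).
$samples = ["abc\n\n\n", "\tdef", "\n\n\n", "\t\n", "\x01"];

foreach ($samples as $raw) {
    // json_encode() writes each control character as an escape sequence,
    // e.g. a real newline becomes backslash + "n", U+0001 becomes \u0001.
    $encoded = json_encode($raw);
    // json_decode() reverses the escaping and returns the original string.
    $decoded = json_decode($encoded);
    printf("%s round-trips: %s\n", $encoded, $decoded === $raw ? 'yes' : 'no');
}
```

Note that this only explains the escaping of the characters themselves; it does not by itself produce the {"0": ...} wrapper seen in the question.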

6 Comments

Strange that this causes decoding problems for others when unescaped as \n instead of \\n.
I might not understand it, but how does this character encoding end up in {"0": "\n\n\n"} form instead of a plain "string"?
@Vallieres I think that's due to the SimpleXML conversion. If you put the values into an array, as I did, it doesn't do that. I shoved the SXML back in for kicks, reran it, and got all sorts of wackiness: 3v4l.org/kKfrL
Yes, exactly my problem. :( Your explanation is interesting, but I'm wondering about the reasoning behind the SimpleXML -> JSON conversion, and possibly a way to fix this (without preg_replace-ing, of course).
The root of that problem is probably that SXML is an object that contains other objects (and sometimes arrays). So if something comes along and tries to iterate it, you get some really weird results, and I'm not surprised it grabbed an internal array and added it in.
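As a sketch of the array approach mentioned in these comments (and without preg_replace), one option is to avoid passing the SimpleXMLElement to json_encode directly and instead copy each child's text into a plain PHP array first, since casting an element to string returns its text content, whitespace included. The helper name sxeChildrenToArray is hypothetical, and it assumes a flat document like the one in the question: unique child names, no attributes, and no nested elements that need preserving.

```php
<?php
// Rebuild the document from the question.
$note = new SimpleXMLElement('<Note></Note>');
$note->addChild("string0", 'just a string');
$note->addChild("string1", "abc\n\n\n");
$note->addChild("string2", "\tdef");
$note->addChild("string3", "\n\n\n");
$note->addChild("string4", "\t\n");

// Hypothetical helper: copy the direct children of $element into a
// name => text array. Assumes unique child names (a repeated name would
// overwrite the earlier entry) and ignores attributes and nested elements.
function sxeChildrenToArray(SimpleXMLElement $element)
{
    $result = [];
    foreach ($element->children() as $name => $child) {
        // The string cast returns the element's text content unchanged,
        // so whitespace-only values stay plain strings.
        $result[$name] = (string) $child;
    }
    return $result;
}

echo json_encode(sxeChildrenToArray($note), JSON_PRETTY_PRINT), PHP_EOL;
```

With the question's input, string3 and string4 should then be encoded as plain strings ("\n\n\n" and "\t\n") rather than as {"0": ...} objects. For deeper or attribute-bearing documents the helper would need to recurse, which is left out here to keep the sketch small.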