
I am trying to read the body of a POST request using the following line:

$data = file_get_contents('php://input');

The JSON string might look like this: {"test" : "test one \xe0 "}

The problem is that json_decode($data) returns null. When I var_dump() $data, I see sequences such as \xe0 and \xe7a.

The data is sent as UTF-8. I also tried utf8_decode($data), with no luck. Can someone explain what I am missing, or how to solve this?

I need to convert the invalid JSON from:

$data = '{"test" : "test one \xe0 "}';

to:

$data = '{"test" : "test one à "}';
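When json_decode() returns null here, it is reporting a parse error rather than an encoding problem: JSON only allows the escapes \", \\, \/, \b, \f, \n, \r, \t and \uXXXX, so \x is rejected. This is also why utf8_decode() cannot help, since the four characters \xe0 are plain ASCII. A minimal sketch for surfacing the parser's reason, assuming PHP 7.3+ (for JSON_THROW_ON_ERROR); $raw is a hypothetical stand-in for the request body:

```php
<?php
// Hypothetical stand-in for file_get_contents('php://input').
// In a single-quoted PHP string, \xe0 is four literal characters: \ x e 0.
$raw = '{"test" : "test one \xe0 "}';

$decoded = json_decode($raw);
if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
    // json_last_error_msg() describes why the most recent json_decode() failed.
    echo 'json_decode failed: ', json_last_error_msg(), "\n";
}

// Alternative (PHP 7.3+): make the failure throw instead of returning null.
try {
    json_decode($raw, false, 512, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    echo 'JsonException: ', $e->getMessage(), "\n";
}
```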

  • This is invalid code. Commented Dec 19, 2023 at 19:01
  • Anyway, the JSON is invalid: 3v4l.org/MGJvI Commented Dec 19, 2023 at 19:52
  • According to this, \x is not allowed in JSON. Commented Dec 19, 2023 at 19:59
  • If the JSON error is fairly consistent when json_decode fails, fall back to string manipulation to replace \xe0, for example, and then attempt to decode again. But it doesn't make sense to send JSON if you are going to send invalid JSON. Commented Dec 19, 2023 at 20:18
  • You might be able to use a third-party library to parse the incoming JSON differently than json_decode. Commented Dec 19, 2023 at 20:23

2 Answers


A way to fix your JSON is to replace the invalid \xNN sequences with valid \u00NN sequences:

$data = '{"test" : "test one \xe0 "}';
// '\x' and '\u00' are single-quoted, so they are literal backslash sequences: \xe0 becomes \u00e0
$val = json_decode(str_replace('\x', '\u00', $data));
echo $val->test;

Output:

test one à 

1 Comment

This does work, but be careful: it won't work when the original JSON contains a two- or three-byte hexadecimal code representing a multi-byte character.
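To illustrate the limitation described in this comment: a JSON \u escape always consumes exactly four hex digits, so prefixing a longer code with 00 shifts the digits. A small sketch, with a hypothetical payload using \x270B (U+270B, raised hand):

```php
<?php
// Hypothetical payload: the sender meant U+270B (raised hand).
$data = '{"test" : "hand \x270B"}';

$fixed = str_replace('\x', '\u00', $data);
echo $fixed, "\n"; // {"test" : "hand \u00270B"}

// \u0027 is parsed as an apostrophe and "0B" is left as plain text.
echo json_decode($fixed)->test, "\n"; // hand '0B
```

The decode succeeds, so nothing signals that the text was corrupted.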

Mutating a JSON string with string functions should always be done with caution, because a false-positive replacement can easily damage the payload. That said, here is a script that attempts to correct your invalid JSON string.

Code: (Demo)

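$json = '{"test" : "test one \xe0, \x270B"}';

// Turn each literal \x<hex> into \u<hex>, left-padding the hex digits with zeros to four
$json = preg_replace_callback(
    '/\\\\x([[:xdigit:]]+)/',
    fn($m) => sprintf('\u%04s', $m[1]),
    $json
);

// json_validate() requires PHP 8.3 or later
echo "\n" . var_export(json_validate($json), true);
echo "\n$json\n";
var_export(json_decode($json));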

Output:

true
{"test" : "test one \u00e0, \u270B"}
(object) array(
   'test' => 'test one à, ✋',
)

If this has known flaws, please leave a comment below and I'll endeavor to overcome the issue when I have time.

A related answer of mine: Replace all hex sequences with ascii characters
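One flaw shows up with the question's own sample \xe7a (presumably ç followed by a): [[:xdigit:]]+ is greedy, so the trailing a is read as part of the code. A short sketch reusing the answer's pattern unchanged:

```php
<?php
// The question's sample: presumably "ç" (\xe7) followed by the letter "a".
$json = '{"test" : "\xe7a"}';

// Same pattern and callback as the answer above (arrow functions need PHP 7.4+).
$fixed = preg_replace_callback(
    '/\\\\x([[:xdigit:]]+)/',
    fn($m) => sprintf('\u%04s', $m[1]),
    $json
);

echo $fixed, "\n"; // {"test" : "\u0e7a"}: the "a" was absorbed into the code point
```

Restricting the pattern to exactly two digits ({2}) keeps \xe7a intact, but then \x270B is split the same way as in the first answer, so the right rule depends on what the sender actually emits.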

2 Comments

This works fine as long as the escaped entity was meant to be the Unicode code-point. If it's meant to be some random undisclosed impossible-to-autodetect single-byte encoding, you'll need to add some mb_convert_encoding() to the mix and do a lot of guessing. Of course, the main point is clear: why use a standard in the first place if you aren't going to comply with it?
Is there a reason that this answer received a downvote without a comment and the other answer didn't? Is this just a user popularity contest?
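Following the suggestion in the first comment, here is a sketch for the case where each \xNN is a single byte in a legacy encoding rather than a Unicode code point. It assumes the sender used Windows-1252 (that has to be known from the sender, not guessed) and requires the mbstring extension; fixByteEscapes() is a hypothetical helper name. Unlike both answers, it reads exactly two hex digits per escape.

```php
<?php
// fixByteEscapes() is a hypothetical helper; requires ext-mbstring.
// ASSUMPTION: every \xNN is one byte in $fromEncoding (default Windows-1252).
function fixByteEscapes(string $json, string $fromEncoding = 'Windows-1252'): string
{
    return preg_replace_callback(
        '/\\\\x([[:xdigit:]]{2})/', // exactly two hex digits = one byte
        function (array $m) use ($fromEncoding): string {
            $utf8 = mb_convert_encoding(chr(hexdec($m[1])), 'UTF-8', $fromEncoding);
            // json_encode() returns a quoted, escaped JSON string; strip the quotes.
            return substr(json_encode($utf8), 1, -1);
        },
        $json
    );
}

$json = '{"test" : "\xe7a co\xfbte 5 \x80"}'; // bytes of "ça coûte 5 €" in Windows-1252
echo json_decode(fixByteEscapes($json))->test, "\n"; // ça coûte 5 €
```

Going through json_encode() for each character means bytes such as \x22 (a double quote) come out as valid JSON escapes instead of breaking the string.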
