5

From an external source I'm getting strings like

array(1,2,3)

but also larger arrays like

array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")

I need them to be an actual array in PHP. I know I could use eval, but since these strings come from untrusted sources I'd rather not do that. I also have no control over the external sources.

Should I use some regular expressions for this (if so, what) or is there some other way?

4
  • Your external source is giving you a string like this "array(1,2,3)" and you want to turn that text into a php array? Commented Jul 16, 2010 at 18:52
  • This is going to be tough... That's not a serialization format PHP recognizes. Commented Jul 16, 2010 at 18:53
  • Can you control the external source? Is it possible to ask them to generate JSON or XML instead? Commented Jul 16, 2010 at 18:54
  • @jonathan: Yes, I want that to be put in a PHP array (just like you would get with eval()), but for security reasons I don't want to use eval. @KennyTM: I don't have any control over the external source, so I have to work with this. Commented Jul 16, 2010 at 18:58

3 Answers 3

12

While writing a parser using the Tokenizer, which turned out to be harder than I expected, I came up with another idea: why not parse the array using eval, but first validate that it contains nothing harmful?

So, what the code does: It checks the tokens of the array against some allowed tokens and chars and then executes eval. I do hope I included all possible harmless tokens, if not, simply add them. (I intentionally didn't include HEREDOC and NOWDOC, because I think they are unlikely to be used.)

function parseArray($code) {
    $allowedTokens = array(
        T_ARRAY                    => true,
        T_CONSTANT_ENCAPSED_STRING => true,
        T_LNUMBER                  => true,
        T_DNUMBER                  => true,
        T_DOUBLE_ARROW             => true,
        T_WHITESPACE               => true,
    );
    $allowedChars = array(
        '('                        => true,
        ')'                        => true,
        ','                        => true,
    );

    $tokens = token_get_all('<?php '.$code);
    array_shift($tokens); // remove opening php tag

    foreach ($tokens as $token) {
        // char token
        if (is_string($token)) {
            if (!isset($allowedChars[$token])) {
                throw new Exception('Disallowed token \''.$token.'\' encountered.');
            }
            continue;
        }

        // array token

        // true, false and null are okay, too
        if ($token[0] == T_STRING && ($token[1] == 'true' || $token[1] == 'false' || $token[1] == 'null')) {
            continue;
        }

        if (!isset($allowedTokens[$token[0]])) {
            throw new Exception('Disallowed token \''.token_name($token[0]).'\' encountered.');
        }
    }

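    // capture anything eval() prints (e.g. PHP 5 parse error messages)
    ob_start();
    try {
        $result = eval('$returnArray = '.$code.';');
    }
    catch (ParseError $e) {
        // PHP 7+ throws instead of returning false on a syntax error
        ob_end_clean();
        throw new Exception('Array couldn\'t be eval()\'d: '.$e->getMessage());
    }
    if (false === $result) {
        throw new Exception('Array couldn\'t be eval()\'d: '.ob_get_clean());
    }
    ob_end_clean();
    return $returnArray;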
}

var_dump(parseArray('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));

I think this is a good compromise between security and convenience: there is no need to write the parser yourself.

For example

parseArray('exec("haha -i -thought -i -was -smart")');

would throw an exception:

Disallowed token 'T_STRING' encountered.
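
To see why the call is rejected, it can help to look at the token stream itself. The following is a minimal sketch, not part of the answer's code: `tokenNames` is a hypothetical helper that lists what `token_get_all` produces for a snippet. A function name such as `exec` comes back as `T_STRING`, which is not in the whitelist.

```php
<?php
// Hypothetical helper (not from the answer): list the token names for a snippet.
function tokenNames(string $code): array
{
    $names = array();
    // Skip the first token, which is the opening tag we prepend ourselves.
    foreach (array_slice(token_get_all('<?php ' . $code), 1) as $token) {
        // Single-character tokens are plain strings; all others are arrays.
        $names[] = is_string($token) ? $token : token_name($token[0]);
    }
    return $names;
}

print_r(tokenNames('exec("haha")'));
// T_STRING, (, T_CONSTANT_ENCAPSED_STRING, )
print_r(tokenNames('array(1,2)'));
// T_ARRAY, (, T_LNUMBER, ",", T_LNUMBER, )
```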

1 Comment

I was having the same thought :) I haven't given up on the idea of doing it entirely with the tokenizer though, but I'll explore your script first, thanks
6

You could do:

json_decode(str_replace(array('array(', ')'), array('[', ']'), $string));

Replace `array(` with `[` and `)` with `]`, then json_decode the result. If the string is just a multidimensional array with scalar values in it, the str_replace will not break anything and you can json_decode it. If it contains any code, the function-call parentheses get replaced as well, so the result is no longer valid JSON and NULL is returned.

Granted, that's a rather, umm, creative approach, but might work for you.

Edit: Also, see the comments for some further limitations pointed out by other users.
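
As a rough illustration of where this works and where it does not, here is a small sketch. The wrapper name `decodeArrayString` is made up for this example. It shows that double-quoted scalar lists decode as expected, that code and single-quoted strings yield NULL, and that a `)` inside a string value is silently rewritten.

```php
<?php
// Hypothetical wrapper around the str_replace + json_decode idea.
function decodeArrayString(string $string)
{
    $json = str_replace(array('array(', ')'), array('[', ']'), $string);
    return json_decode($json, true);
}

var_dump(decodeArrayString('array("a", array("1", "2"), "d")')); // nested PHP array
var_dump(decodeArrayString('exec("haha")'));     // NULL: 'exec("haha"]' is not JSON
var_dump(decodeArrayString("array('a', 'b')"));  // NULL: JSON needs double quotes
var_dump(decodeArrayString('array("a)", "b")')); // decodes, but "a)" has become "a]"
```

The last case is the one to watch: the input is accepted, yet the data is altered, so this approach only suits sources whose string values never contain `)` or `array(`.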

4 Comments

I can't test this right now, but that is an elegant solution if it returns properly. +1
@KennyTM yeah, that wouldn't work. I'll leave it up there nonetheless, so the OP can decide if it's of any use
And this is for arrays only. Won't work on associative arrays.
It's creative, but I like it and it might just do the trick. It will be pretty simple arrays anyway.
2

I think you should use the Tokenizer for this. Maybe I will write a script later on that actually does it.
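
For reference, a tokenizer-only parser could look roughly like the sketch below. This is not the script the answer refers to; the function names (`parseArrayLiteral`, `parseValue`, `parseItems`) are hypothetical, it assumes PHP 7 or later, and it deliberately handles only `array(...)`, integers, floats, quoted strings, `true`/`false`/`null` and `=>` keys. Negative numbers, short `[]` syntax and the full set of double-quote escapes are left out.

```php
<?php
// Sketch of an eval()-free parser; all function names here are hypothetical.
function parseArrayLiteral(string $code): array
{
    // Tokenize, then drop the prepended opening tag and all whitespace tokens.
    $tokens = array();
    foreach (array_slice(token_get_all('<?php ' . $code), 1) as $token) {
        if (is_string($token) || $token[0] !== T_WHITESPACE) {
            $tokens[] = $token;
        }
    }
    $pos = 0;
    $value = parseValue($tokens, $pos);
    if (!is_array($value) || $pos !== count($tokens)) {
        throw new Exception('Input is not a single array literal.');
    }
    return $value;
}

// Parse one value starting at $pos and advance $pos past it.
function parseValue(array $tokens, int &$pos)
{
    if (!isset($tokens[$pos])) {
        throw new Exception('Unexpected end of input.');
    }
    $token = $tokens[$pos++];
    if (is_string($token)) {
        throw new Exception("Unexpected character '$token'.");
    }
    list($id, $text) = $token;
    switch ($id) {
        case T_ARRAY:
            return parseItems($tokens, $pos);
        case T_LNUMBER:
            return intval($text, 0); // base 0 honours 0x / 0 / 0b prefixes
        case T_DNUMBER:
            return (float) $text;
        case T_CONSTANT_ENCAPSED_STRING:
            $inner = substr($text, 1, -1);
            return $text[0] === "'"
                ? strtr($inner, array('\\\\' => '\\', "\\'" => "'"))
                : stripcslashes($inner); // only approximates PHP's double-quote escapes
        case T_STRING:
            $constants = array('true' => true, 'false' => false, 'null' => null);
            $name = strtolower($text);
            if (array_key_exists($name, $constants)) {
                return $constants[$name];
            }
    }
    // Anything else (function names, variables, operators) is rejected.
    throw new Exception('Disallowed token ' . token_name($id) . '.');
}

// Parse "( item, item, ... )" following an array keyword.
function parseItems(array $tokens, int &$pos): array
{
    if (!isset($tokens[$pos]) || $tokens[$pos] !== '(') {
        throw new Exception("Expected '(' after array.");
    }
    $pos++;
    $result = array();
    while (isset($tokens[$pos])) {
        if ($tokens[$pos] === ')') {
            $pos++;
            return $result;
        }
        $value = parseValue($tokens, $pos);
        $next = isset($tokens[$pos]) ? $tokens[$pos] : null;
        if (is_array($next) && $next[0] === T_DOUBLE_ARROW) {
            if (!is_int($value) && !is_string($value)) {
                throw new Exception('Only integer and string keys are supported.');
            }
            $pos++;
            $result[$value] = parseValue($tokens, $pos);
            $next = isset($tokens[$pos]) ? $tokens[$pos] : null;
        } else {
            $result[] = $value;
        }
        if ($next === ',') {
            $pos++;
        } elseif ($next !== ')') {
            throw new Exception("Expected ',' or ')'.");
        }
    }
    throw new Exception('Unexpected end of input.');
}
```

Because nothing is ever executed, an input such as `exec("haha")` simply fails with a "Disallowed token" exception, at the cost of having to enumerate every construct you want to accept.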
