
(Sorry for my bad English)

I have a string that I want to split into an array. The square brackets denote nested arrays (possibly several levels deep). Escaped characters should be preserved.

This is a sample string:

$string = '[[["Hello, \"how\" are you?","Good!",,,123]],,"ok"]';

The result structure should look like this:

array (
  0 => 
  array (
    0 => 
    array (
      0 => 'Hello, \"how\" are you?',
      1 => 'Good!',
      2 => '',
      3 => '',
      4 => '123',
    ),
  ),
  1 => '',
  2 => 'ok',
)

I have tested it with:

$pattern = '/[^"\\]*(?:\\.[^"\\]*)*/s';
$return = preg_match_all($pattern, $string, $matches);

But this did not work properly. I don't understand this regex pattern (I found it in another example on this site), and I don't know whether preg_match_all is the right function to use.

I hope someone can help me.

Many thanks!

3 Answers


This is a tough one for a regex, but there is a hacky answer to your question (apologies in advance).

The string is almost a valid PHP array literal, except for the empty ,, slots. You can match each comma that is immediately followed by another comma and replace it with ,'' using the lookahead regex

/,(?=,)/

Then you can eval the resulting string to get the array you are looking for.

For example:

// input 
$str1 = '[[["Hello, \\"how\\" are you?","Good!",,,123]],,"ok"]';

// replace , followed by , with ,'' with a regex
$pattern = '/,(?=,)/';
$replace = ",''";
$str2 = preg_replace($pattern, $replace, $str1);

// eval updated string
$arr = eval("return $str2;");
var_dump($arr);

I get this:

array(3) {
  [0]=>
  array(1) {
    [0]=>
    array(5) {
      [0]=>
      string(21) "Hello, "how" are you?"
      [1]=>
      string(5) "Good!"
      [2]=>
      string(0) ""
      [3]=>
      string(0) ""
      [4]=>
      int(123)
    }
  }
  [1]=>
  string(0) ""
  [2]=>
  string(2) "ok"
}

Edit

Given the inherent dangers of eval, the better option is to use json_decode with the same preprocessing. JSON only allows double-quoted strings, so the empty slots are filled with "" instead of '':

// input 
$str1 = '[[["Hello, \\"how\\" are you?","Good!",,,123]],,"ok"]';

// replace each , that is followed by another , with ,"" using a regex
$pattern = '/,(?=,)/';
$replace = ',""';
$str2 = preg_replace($pattern, $replace, $str1);

// decode the updated string as JSON
$arr = json_decode($str2);
var_dump($arr);
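One caveat with the json_decode version: on invalid input it simply returns null, which is easy to miss. A minimal sketch of stricter error handling, assuming PHP 7.3 or later for the JSON_THROW_ON_ERROR flag; the helper name decode_sloppy_array is made up for illustration:

```php
<?php
// Hypothetical helper: fills empty ",," slots and decodes strictly.
// Returns null instead of silently producing garbage on invalid input.
function decode_sloppy_array(string $input): ?array
{
    // Same preprocessing as above: turn each ",," into ,""
    $json = preg_replace('/,(?=,)/', ',""', $input);

    try {
        // assoc = true, default depth 512; JSON_THROW_ON_ERROR needs PHP >= 7.3
        $result = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        return null;
    }
    // a valid JSON scalar such as "42" is not the array we expect
    return is_array($result) ? $result : null;
}

$str1 = '[[["Hello, \\"how\\" are you?","Good!",,,123]],,"ok"]';
var_dump(decode_sloppy_array($str1));   // the nested array shown above
var_dump(decode_sloppy_array('[1,]'));  // NULL: a trailing ",]" is not handled
```

The try/catch makes the limits of the simple lookahead visible: inputs it cannot repair are reported as null rather than passed on.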

4 Comments

Using eval is dangerous, since you don't know what the string contains. Also, replacing ,(?=,) with ,'' doesn't handle cases where ,, appears between quotes. I'm afraid the only ways to do this are to parse the string character by character or to transform it into a well-known format like JSON. That isn't an easy task, though it isn't impossible.
You are correct; I already apologised for the hack in my answer. I believe the same ,, issue exists for json_encode.
Sorry, I meant json_decode: it works if each empty ,, slot is filled with "", e.g. var_dump(json_decode('[[["Hello, \"how\" are you?","Good!","","",123]],"","ok"]'));
You can solve the ,, issue using: $string = preg_replace('~"[^"\\\\]*+(?s:\\\\.[^"\\\\]*)*+"?(*SKIP)(*F)|,\K(?=[],])|\[\K(?=,)~', '""', $string); (this also handles cases like [, or ,] and skips quoted strings).
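To illustrate the difference (a sketch; fill_empty_slots and EMPTY_SLOT_PATTERN are hypothetical names wrapping the pattern from the last comment): the simple /,(?=,)/ lookahead also rewrites a ,, that sits inside a quoted string, while the (*SKIP)(*F) version leaves quoted text alone and also fills [, and ,]:

```php
<?php
// Pattern copied from the comment above. The first branch matches a quoted
// string (with backslash escapes) and discards it via (*SKIP)(*F); the other
// branches match the empty position after "," (followed by "," or "]") or
// after "[" (followed by ","), thanks to \K.
const EMPTY_SLOT_PATTERN = '~"[^"\\\\]*+(?s:\\\\.[^"\\\\]*)*+"?(*SKIP)(*F)|,\K(?=[],])|\[\K(?=,)~';

// Hypothetical helper: insert "" into every empty slot
function fill_empty_slots(string $s): string
{
    return preg_replace(EMPTY_SLOT_PATTERN, '""', $s);
}

$tricky = '["a,,b",,"c"]';

// Naive lookahead: also rewrites the ,, inside "a,,b" (result is invalid JSON)
echo preg_replace('/,(?=,)/', ',""', $tricky), PHP_EOL;

// Skipping pattern: the quoted "a,,b" is left alone
echo fill_empty_slots($tricky), PHP_EOL;  // ["a,,b","","c"]
echo fill_empty_slots('[,1,]'), PHP_EOL;  // ["",1,""]

$string = '[[["Hello, \"how\" are you?","Good!",,,123]],,"ok"]';
var_dump(json_decode(fill_empty_slots($string), true));
```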

If you can edit the code that serializes the data, it's a better idea to let json_encode and json_decode handle the serialization. There is no need to reinvent the wheel on this one.
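A minimal sketch of that round trip, assuming the producing side can build the data as a plain PHP array: json_encode escapes the quotes itself, and json_decode gives back an identical array, so no regex preprocessing is needed:

```php
<?php
// The structure from the question, built as a PHP array on the producing side
$data = [[['Hello, "how" are you?', 'Good!', '', '', 123]], '', 'ok'];

$json = json_encode($data);
echo $json, PHP_EOL;
// [[["Hello, \"how\" are you?","Good!","","",123]],"","ok"]

// true = decode JSON objects as associative arrays (JSON arrays become PHP arrays anyway)
$back = json_decode($json, true);
var_dump($back === $data); // bool(true)
```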

Nice cat btw.


You might want to use a lexer in combination with a recursive function that actually builds the structure.

For your purpose, the following tokens are used:

\[           # opening bracket
\]           # closing bracket
".+?(?<!\\)" # " to ", making sure it's not escaped
,(?!,)       # a comma, not followed by a comma
\d+          # at least one digit
,(?=,)       # a comma followed by a comma

The rest is programming logic; see the demo on ideone.com. Inspired by this post.

<?php
class Lexer {
    protected static $_terminals = array(
        '~^(\[)~'               => "T_OPEN",
        '~^(\])~'               => "T_CLOSE",
        '~^(".+?(?<!\\\\)")~'   => "T_ITEM",
        '~^(,)(?!,)~'           => "T_SEPARATOR",
        '~^(\d+)~'              => "T_NUMBER",
        '~^(,)(?=,)~'           => "T_EMPTY"
    );

    public static function run($line) {
        $tokens = array();
        $offset = 0;
        while($offset < strlen($line)) {
            $result = static::_match($line, $offset);
            if($result === false) {
                throw new Exception("Unable to parse input at offset " . $offset . ".");
            }
            $tokens[] = $result;
            $offset += strlen($result['match']);
        }
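        // _generate() returns the list of top-level values; the input is a
        // single outer [...] array, so unwrap it
        $result = static::_generate($tokens);
        return $result[0] ?? array();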
    }

    protected static function _match($line, $offset) {
        $string = substr($line, $offset);

        foreach(static::$_terminals as $pattern => $name) {
            if(preg_match($pattern, $string, $matches)) {
                return array(
                    'match' => $matches[1],
                    'token' => $name
                );
            }
        }
        return false;
    }

    // a recursive function to actually build the structure
    protected static function _generate($arr=array(), $idx=0) {
        $output = array();
        $current = 0;
        for($i=$idx;$i<count($arr);$i++) {
            $type = $arr[$i]["token"];
            $element = $arr[$i]["match"];
            switch ($type) {
                case 'T_OPEN':
                    list($out, $index) = static::_generate($arr, $i+1);
                    $output[] = $out;
                    $i = $index;
                    break;
                case 'T_CLOSE':
                    return array($output, $i);
                    break;
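                case 'T_ITEM':
                    // strip the surrounding quotes but keep escapes such as \" as-is
                    $output[] = substr($element, 1, -1);
                    break;
                case 'T_NUMBER':
                    $output[] = $element;
                    break;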
                case 'T_EMPTY':
                    $output[] = "";
                    break;
            }
        }
        return $output;
    }    
}

$input  = '[[["Hello, \"how\" are you?","Good!",,,123]],,"ok"]';
$items = Lexer::run($input);
print_r($items);

?>
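A variation on the same idea, sketched under the assumption that only the tokens listed above can occur: all tokens are pulled out with a single preg_match_all call (so preg_match_all is usable here, but only for tokenizing), and the nesting is built with an explicit stack instead of recursion. parse_bracket_list is a hypothetical name; the string token is "(?:[^"\\]|\\.)*" so that empty strings "" also work; and unlike the Lexer class, it silently skips characters that match no token:

```php
<?php
// Hypothetical helper: tokenize with one preg_match_all call, then build the
// nesting with an explicit stack. Characters that match no token (e.g.
// whitespace) are silently skipped.
function parse_bracket_list(string $input): array
{
    // "[" | "]" | quoted string with backslash escapes | digits | ","
    preg_match_all('~\[|\]|"(?:[^"\\\\]|\\\\.)*"|\d+|,~', $input, $m);

    $stack = [[]];  // $stack[0] collects the top-level values
    $prev = null;   // previous token, used to detect empty slots
    foreach ($m[0] as $tok) {
        $top = count($stack) - 1;
        if ($tok === '[') {
            $stack[] = [];                 // start a new nested array
        } elseif ($tok === ',') {
            if ($prev === '[' || $prev === ',') {
                $stack[$top][] = '';       // "[," or ",,": empty slot
            }
        } elseif ($tok === ']') {
            if ($top === 0) {
                throw new InvalidArgumentException('Unbalanced "]"');
            }
            if ($prev === ',') {
                $stack[$top][] = '';       // ",]": empty slot
            }
            $done = array_pop($stack);     // close the current array...
            $stack[$top - 1][] = $done;    // ...and append it to its parent
        } elseif ($tok[0] === '"') {
            // strip the surrounding quotes, keep escapes such as \" as-is
            $stack[$top][] = substr($tok, 1, -1);
        } else {
            $stack[$top][] = $tok;         // digits, kept as a string
        }
        $prev = $tok;
    }
    if (count($stack) !== 1) {
        throw new InvalidArgumentException('Unbalanced "["');
    }
    $outer = $stack[0][0] ?? [];
    if (!is_array($outer)) {
        throw new InvalidArgumentException('Expected an outer [...] array');
    }
    return $outer;
}

$string = '[[["Hello, \"how\" are you?","Good!",,,123]],,"ok"]';
var_export(parse_bracket_list($string));
```

On the question's string this yields the requested structure: quoted items keep their escapes (Hello, \"how\" are you?) and 123 stays the string '123'. The recursion in _generate() would work just as well; the stack version simply avoids passing token indices around.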

