
This may be possible with a regular expression, but I have no idea. What I'm trying to do is parse a string on a given delimiter, but have it parse differently when it encounters a set of brackets. As I'm a visual learner, let me show you an example of what I'm trying to achieve. (P.S. This string is parsed from a URL.)

Given the string input:

String1,String2(data1,data2,data3),String3,String4

How can I "transform" this string into this array:

{
    "String1": "String1",
    "String2": [
        "data1",
        "data2",
        "data3"
    ],
    "String3": "String3",
    "String4": "String4"
}

The format doesn't have to be this strict. I'm just trying to build a simple API for my project.

Obviously, something like

array explode ( string $delimiter , string $string [, int $limit = PHP_INT_MAX ] )

wouldn't work, because there are commas inside the brackets as well. I've tried parsing it manually, one character at a time, but I'm worried about performance, and it doesn't actually work anyway. Here is the gist of my attempt:
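To make the problem concrete, here is a small sketch of what `explode()` actually returns for the sample input. The bracketed group gets torn apart at its inner commas:

```php
<?php
// A plain explode() splits on every comma, including the ones
// inside the brackets, so "String2(...)" ends up in three pieces.
$input = 'String1,String2(data1,data2,data3),String3,String4';
$parts = explode(',', $input);

var_export($parts);
```

The result has six elements, with `'String2(data1'` and `'data3)'` as separate entries.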

https://gist.github.com/Fudge0952/24cb4e6a4ec288a4c492

4 Comments
  • Can you post your input and the desired output so we can better understand what you're trying to achieve? Commented Mar 7, 2016 at 1:13
  • I posted the desired output for the given input. Commented Mar 7, 2016 at 1:25
  • Can you use pipes/different delimiters for different levels of string children? e.g. String1,String2(data1|data2|data3),String3,String4 Commented Mar 7, 2016 at 1:33
  • I can, as it is a personal project, but it would be nice to keep the same delimiter for everything. That way, cascaded objects, i.e. string(object1,object2(subobject),object3), would still work without needing a new delimiter for each level, which would limit how deeply things could be nested. Commented Mar 7, 2016 at 1:40

3 Answers


While you could try to split your initial string on commas and ignore anything inside parentheses for the first split, this necessarily makes assumptions about what those string values can contain (possibly requiring escaping/unescaping, depending on which characters those strings need to hold).
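The comma-splitting idea can be sketched with a depth counter. This is a single left-to-right pass, linear in the string length, that treats a comma as a separator only when it is not inside any parentheses. It is a minimal sketch, assuming balanced brackets and values that never contain `,`, `(` or `)` themselves (exactly the kind of assumption described above). `splitTopLevel` is a hypothetical helper name:

```php
<?php
/**
 * Split $s on $delim, ignoring delimiters nested inside parentheses.
 * Sketch only: assumes balanced brackets and no escaping of
 * ',', '(' or ')' inside values. splitTopLevel is a hypothetical name.
 */
function splitTopLevel(string $s, string $delim = ','): array
{
    $parts = [];
    $depth = 0;          // current nesting level of '(' ... ')'
    $start = 0;          // offset where the current part begins
    $len   = strlen($s);

    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        if ($c === '(') {
            $depth++;
        } elseif ($c === ')') {
            $depth--;
        } elseif ($c === $delim && $depth === 0) {
            // Only a delimiter at depth 0 ends a top-level part.
            $parts[] = substr($s, $start, $i - $start);
            $start = $i + 1;
        }
    }
    $parts[] = substr($s, $start); // last part has no trailing delimiter
    return $parts;
}

var_export(splitTopLevel('String1,String2(data1,data2,data3),String3,String4'));
```

Each returned piece can then be checked for a trailing `(...)` and split again the same way, which extends the idea to any nesting depth.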

If you have control over the data format, though, it would be far better to just start with JSON. It's well-defined and well-supported.
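For comparison, here is a minimal sketch of the JSON route. `json_decode()` with `true` as its second argument returns plain associative arrays, and nesting needs no extra work. The JSON text is a hypothetical encoding of the question's example, not something the question's API defines:

```php
<?php
// Hypothetical JSON encoding of the question's example structure.
$json = '{"String1":"String1","String2":["data1","data2","data3"],"String3":"String3","String4":"String4"}';

// true => decode JSON objects as associative arrays instead of stdClass.
// JSON_THROW_ON_ERROR (PHP 7.3+) raises a JsonException on malformed input.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

var_export($data);
```

Without `JSON_THROW_ON_ERROR`, `json_decode()` returns `null` on malformed input, which is easy to miss.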


5 Comments

I did ponder the thought of JSON as the API is only outputting in JSON anyway.
But why do you want this made-up syntax instead of JSON?
Basically, the URL was formed like this: http://example.com/api/contact?_fields=id,staff(first,last),salary and the API returned a JSON response. I think I'll go with JSON, as it seems the most well-formed option, even though I really did want a simple syntax for URL construction.
Accepted as it was the earliest answer and they're all good answers!
Thanks. I think you'll end up happier and more productive with an established format that doesn't require you to roll your own parser (with all the risk and overhead that comes with it).
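Whichever format is chosen, the raw value first has to come out of the query string. In a real endpoint that is simply `$_GET['_fields']`. The sketch below reproduces it from the example URL in the comments with `parse_url()` and `parse_str()`, so it can run stand-alone:

```php
<?php
// Example URL from the comment thread above.
$url = 'http://example.com/api/contact?_fields=id,staff(first,last),salary';

$query = parse_url($url, PHP_URL_QUERY); // just the part after '?'
parse_str($query, $params);              // decode the query string into an array

$fields = $params['_fields'] ?? '';      // raw value to hand to a parser
echo $fields, PHP_EOL;
```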

You can either build an ad-hoc parser like this (mostly untested):

<?php
// Tokenizer; the 'x' modifier allows whitespace and # comments in the pattern.
$p = '!
    [^,\(\)]+  # token: String
    |,         # token: comma
    |\(        # token: open
    |\)        # token: close
!x';
$input = 'String1,String2(data1,data2,data3,data4(a,b,c)),String3,String4';

preg_match_all($p, $input, $m);
// using a NoRewindIterator, so we can use nested foreach-loops on the same iterator
$it = new NoRewindIterator(
    new ArrayIterator($m[0])
);

var_export( foo( $it ) );

function foo($tokens, $level=0) {
    $result = [];
    $current = null;
    foreach( $tokens as $t ) {
        switch($t) {
            case ')':
                // end of this nesting level: "break 2" leaves the foreach loop
                // (a plain "break" would only leave the switch)
                break 2;
            case '(':
                if ( is_null($current) ) {
                    throw new Exception('"(" without a preceding name');
                }
                $tokens->next(); // skip '(' so the nested call starts at the first child
                $result[$current] = foo($tokens, $level+1);
                $current = null;
                break;
            case ',':
                if ( !is_null($current) ) {
                    $result[] = $current;
                    $current = null;
                }
                break;
            default:
                $current = $t;
                break;
        }
    }
    if ( !is_null($current) ) {
        $result[] = $current;
    }
    return $result;
}

This prints:

array (
  0 => 'String1',
  'String2' => 
  array (
    0 => 'data1',
    1 => 'data2',
    2 => 'data3',
    'data4' => 
    array (
      0 => 'a',
      1 => 'b',
      2 => 'c',
    ),
  ),
  1 => 'String3',
  2 => 'String4',
)

(but it will almost certainly fail horribly on malformed strings)

Or take a look at a lexer/parser generator such as PHP_LexerGenerator and PHP_ParserGenerator.
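A hand-written recursive-descent parser is a middle ground. It stays close to the token approach above but reads the string directly and rejects malformed input instead of mis-parsing it. This is a sketch, not the answerer's code. `parseList`/`parseFields` are hypothetical names, the output shape mirrors the one printed above (plain names as list entries, `name(...)` as a keyed sub-array), and names are assumed never to contain `,`, `(` or `)`:

```php
<?php
/**
 * Recursive-descent sketch for input like String1,String2(a,b(c)),String3.
 * Plain names become list entries; name(...) becomes 'name' => [children].
 * Hypothetical helper names; throws InvalidArgumentException on bad input.
 */
function parseList(string $s, int &$pos, int $depth): array
{
    $result = [];
    $len = strlen($s);
    while (true) {
        // A name runs up to the next ',', '(' or ')'.
        $nameLen = strcspn($s, ',()', $pos);
        if ($nameLen === 0) {
            throw new InvalidArgumentException("Expected a name at offset $pos");
        }
        $name = substr($s, $pos, $nameLen);
        $pos += $nameLen;

        if ($pos < $len && $s[$pos] === '(') {
            $pos++;                                      // consume '('
            $result[$name] = parseList($s, $pos, $depth + 1);
            if ($pos >= $len || $s[$pos] !== ')') {
                throw new InvalidArgumentException("Missing ')' at offset $pos");
            }
            $pos++;                                      // consume ')'
        } else {
            $result[] = $name;
        }

        if ($pos < $len && $s[$pos] === ',') {
            $pos++;                                      // another item follows
            continue;
        }
        break;                                           // end of this list
    }
    // A ')' here belongs to the caller; at depth 0 nothing may be left over.
    if ($depth === 0 && $pos < $len) {
        throw new InvalidArgumentException("Unexpected '{$s[$pos]}' at offset $pos");
    }
    return $result;
}

function parseFields(string $s): array
{
    $pos = 0;
    return parseList($s, $pos, 0);
}

var_export(parseFields('String1,String2(data1,data2,data3,data4(a,b,c)),String3,String4'));
```

Inputs such as `a(b`, `a)b` or `a,,b` raise an `InvalidArgumentException` with the offending offset, rather than producing a quietly wrong array.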


This is a solution with preg_match_all():

$string = 'String1,String2(data1,data2,data3),String3,String4,String5(data4,data5,data6)';

$pattern = '/([^,(]+)(\(([^)]+)\))?/';

preg_match_all( $pattern, $string, $matches );

$result = array();
foreach( $matches[1] as $key => $val )
{
    if( $matches[3][$key] )
    { $add = explode( ',', $matches[3][$key] ); }
    else
    { $add = $val; }
    $result[$val] = $add;
}

$json = json_encode( $result );

3v4l.org demo

Pattern explanation:

([^,(]+)        group 1: one or more chars other than ‘,’ and ‘(’
(\(([^)]+)\))?  group 2 (optional): a ‘(’, then group 3, then a ‘)’
   ([^)]+)      group 3 (inside group 2): one or more chars other than ‘)’
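To see what the three groups actually hold, here is a short sketch that runs the same pattern on a two-item string and inspects `$matches`:

```php
<?php
// Run the answer's pattern on a short string and look at each capture group.
$pattern = '/([^,(]+)(\(([^)]+)\))?/';
preg_match_all($pattern, 'String1,String2(data1,data2,data3)', $matches);

// $matches[0]: full matches, [1]: names,
// [2]: bracketed part including '(' and ')', [3]: text inside the brackets.
// The same index refers to the same match in every group array.
var_export($matches);
```

For `String1`, which has no brackets, groups 2 and 3 come back as empty strings at that index, which is why the answer's `if( $matches[3][$key] )` check works.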

3 Comments

This is exactly what I was originally trying to achieve! One thing I noticed is that it only supports one level of brackets (see eval.in/531501). Is there any way to support an arbitrary number of levels?
Hmm... I'm not 100% sure, but I don't think it's possible. Then again, I'm no regex expert; maybe somebody else can provide a better answer. In any case, if you can get the same data in JSON format, I strongly suggest using it.
Yeah, I doubt I'd ever have multiple layers of brackets anyway, and if it didn't make the URL look hideous I'd go for JSON any day, lol. How I wish I could accept two answers!
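On the question of arbitrary depth: PCRE, which PHP's `preg_*` functions use, supports recursive subpatterns, so a regex can at least match a balanced bracket group. Building the nested array still takes a recursive PHP function. In the sketch below, `(?2)` re-enters group 2, so group 2 matches a whole balanced `( ... )` block. `parseNested` is a hypothetical name. Names are assumed to contain no `,`, `(` or `)`, every entry is keyed `name => name` like the answer's top level (so inner entries are keyed too rather than a plain list), and unbalanced brackets are silently mis-parsed, as in the original:

```php
<?php
/**
 * Variant of the preg_match_all() answer that handles any nesting depth.
 * Group 2 uses the recursive call (?2) to match balanced brackets.
 * parseNested is a hypothetical name; see the assumptions above.
 */
function parseNested(string $s): array
{
    $pattern = '/([^,()]+)(\((?:[^()]++|(?2))*\))?/';
    preg_match_all($pattern, $s, $m, PREG_SET_ORDER);

    $result = [];
    foreach ($m as $match) {
        $name = $match[1];
        if (isset($match[2]) && $match[2] !== '') {
            // Strip the outer brackets and parse the inside the same way.
            $result[$name] = parseNested(substr($match[2], 1, -1));
        } else {
            $result[$name] = $name;
        }
    }
    return $result;
}

echo json_encode(parseNested('String1,String2(data1,data2(a,b)),String3'), JSON_PRETTY_PRINT);
```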
