
Given a string with attribute/value pairs such as

attr1="some text" attr2 = "some other text" attr3= "some weird !@'#$\"=+ text"

the goal is to parse it and output an associative array, in this case:

array('attr1' => 'some text',
      'attr2' => 'some other text',
      'attr3' => 'some weird !@\'#$\"=+ text')

Note the inconsistent spacing around the equal signs, the escaped double quote in the input, and the escaped single quote in the output.

  • You're not parsing a markup language, right? Commented Oct 22, 2009 at 7:50
  • Good to ask that! No, just making up my own syntax to be easy to type on a command line. Commented Oct 22, 2009 at 7:57
  • "to be easy to type on a command line", then you might be interested in docs.php.net/getopt Commented Oct 22, 2009 at 9:31

2 Answers


Try something like this:

$text = "attr1=\"some text\" attr2 = \"some other text\" attr3= \"some weird !@'#$\\\"=+ text\"";
echo $text, "\n";
// In a single-quoted PHP string, \\\\ reaches PCRE as \\, i.e. one literal backslash.
preg_match_all('/(\S+)\s*=\s*"((?:\\\\.|[^\\\\"])*)"/', $text, $matches, PREG_SET_ORDER);
print_r($matches);

which produces:

attr1="some text" attr2 = "some other text" attr3= "some weird !@'#$\"=+ text"

Array
(
    [0] => Array
        (
            [0] => attr1="some text"
            [1] => attr1
            [2] => some text
        )

    [1] => Array
        (
            [0] => attr2 = "some other text"
            [1] => attr2
            [2] => some other text
        )

    [2] => Array
        (
            [0] => attr3= "some weird !@'#$\"=+ text"
            [1] => attr3
            [2] => some weird !@'#$\"=+ text
        )

)

And a short explanation (the pattern is shown as it is written inside the PHP string):

(\S+)                 // match one or more non-whitespace characters
                      // > and store them in group 1
\s*=\s*               // match a '=' surrounded by zero or more whitespace characters
"                     // match a double quote
(                     // open group 2
  (?:\\\\.|[^\\\\"])* //   match zero or more substrings that are either a backslash
                      //   > followed by any character, or any single character other
                      //   > than a backslash or a double quote
)                     // close group 2
"                     // match a double quote
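The answer stops at print_r($matches), while the question asks for an associative array. A minimal sketch of that last step, assuming values are returned raw (escapes such as \" kept, as in the question's expected output) and that a repeated name overwrites the earlier one; parse_attributes is a hypothetical helper name. The pattern uses PCRE's x (extended) modifier so the explanation can live inside it; in that mode the comments must avoid the / delimiter, backslashes, and single quotes (the latter would end the PHP string).

```php
<?php
// Hypothetical helper: parse name="value" pairs into an associative array.
// Values are returned raw, so escape sequences such as \" are kept as-is.
function parse_attributes(string $text): array
{
    $pattern = '/
        (\S+)                    # group 1: attribute name, no whitespace
        \s*=\s*                  # equals sign, optional whitespace on both sides
        "                        # opening double quote
        ((?:\\\\.|[^\\\\"])*)    # group 2: escape pair, or any char except backslash and double quote
        "                        # closing double quote
    /x';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[$m[1]] = $m[2]; // a later duplicate name overwrites an earlier one
    }
    return $result;
}

$text = "attr1=\"some text\" attr2 = \"some other text\" attr3= \"some weird !@'#$\\\"=+ text\"";
var_export(parse_attributes($text));
```

Building the array from PREG_SET_ORDER keeps each name next to its value; using PREG_PATTERN_ORDER and array_combine($matches[1], $matches[2]) would be an alternative way to get the same mapping.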

Comments

What about the third example?
Yes, I forgot to double-escape the backslash (and to double-check the output). I'm afraid I am sometimes too confident in myself. Thanks.
Is there any difference between the way PHP and ActionScript (which is ECMAScript/JS, by the way) handle regexes? This regex gave only the first two attributes in ActionScript.
Next to no experience in ECMA-ish regex flavours, but you might want to try var regex = /(\S+)\s*=\s*"((?:\\.|[^\\"])*)"/g;, or even var regex = /(\S+)\s*=\s*"((?:\\.|[^\"])*)"/g; (not tested!).
Both work well with all three cases given by the OP, but not with a trailing backslash :(
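The trailing-backslash case raised in these comments comes down to the character class. A sketch in PHP (the question's language, not ActionScript/JS), assuming a backslash escapes whatever single character follows it; ATTR_PATTERN and unescape_value are hypothetical names. Because the class excludes the backslash, every backslash must go through the escape branch, so a final \\ is consumed as one pair and the quote after it closes the value.

```php
<?php
// Escape-aware pattern: the class excludes BOTH the backslash and the double quote,
// so a backslash can only be matched together with the character after it.
const ATTR_PATTERN = '/(\S+)\s*=\s*"((?:\\\\.|[^\\\\"])*)"/';

// Hypothetical helper: drop the backslash from each escape pair, so \" becomes "
// and \\ becomes \ (the same "backslash plus any character" rule as the pattern).
function unescape_value(string $value): string
{
    return preg_replace('/\\\\(.)/s', '$1', $value);
}

// attr4 ends in an escaped backslash and attr5 follows it on the same line.
$text = 'attr3= "some weird !@\'#$\\"=+ text" attr4="something\\\\" attr5="x"';

preg_match_all(ATTR_PATTERN, $text, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    echo $m[1], ' => ', unescape_value($m[2]), "\n";
}
```

For this input it should print three pairs: attr3 with the double quote unescaped, attr4 as something\ (one backslash), and attr5 as x. A dedicated preg_replace is used instead of stripslashes because stripslashes also turns \0 into a NUL byte.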

EDIT: This regex fails if the value ends in a backslash like attr4="something\\"

I don't know PHP, but since the regex would be essentially the same in any language, this is how I did it in ActionScript:

var text:String = "attr1=\"some text\" attr2 = \"some other text\" attr3= \"some weird !@'#$\\\"=+ text\"";

var regex:RegExp = /\s*(\w+)\s*=\s*(?:"(.*?)(?<!\\)")\s*/g;

var result:Object;
while(result = regex.exec(text))
    trace(result[1] + " is " + result[2]);

And I got the following output:

attr1 is some text
attr2 is some other text
attr3 is some weird !@'#$\"=+ text

Comments

Just a small nitpick: if the value contains a backslash itself, like attr3 = "\\" (which will likely need escaping too), it won't work with a negative lookbehind. Of course, that might never happen; the OP didn't mention such corner cases.
Yeah, you are right. And that's not a nitpick: apparently this fails if the string ends with a backslash, like attr4="something\\"
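To make the failure concrete, here is a sketch in PHP (PCRE supports lookbehind, so the ActionScript pattern transcribes directly). The escape-aware pattern follows the same idea as the first answer, with a class that excludes the backslash; the variable names are illustrative only.

```php
<?php
// The ActionScript answer's pattern, transcribed to a PHP string.
$lookbehind = '/\s*(\w+)\s*=\s*(?:"(.*?)(?<!\\\\)")\s*/';
// Escape-aware alternative: a value is a run of escape pairs or ordinary characters.
$escapeAware = '/\s*(\w+)\s*=\s*"((?:\\\\.|[^\\\\"])*)"\s*/';

// The raw text is attr4="something\\" attr5="x" (the value ends in an escaped backslash).
$text = 'attr4="something\\\\" attr5="x"';

foreach (['lookbehind' => $lookbehind, 'escape-aware' => $escapeAware] as $name => $re) {
    preg_match_all($re, $text, $m, PREG_SET_ORDER);
    echo $name, ":\n";
    foreach ($m as $match) {
        echo '  ', $match[1], ' is ', $match[2], "\n";
    }
}
```

With the lookbehind, the quote after something\\ is rejected because a backslash precedes it, so the lazy .*? runs on to the next quote and attr4 swallows ` attr5=`, leaving attr5 unmatched. The escape-aware pattern consumes \\ as one pair and closes on that quote, yielding both attributes.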
