
Possible Duplicate:
PHP explode the string, but treat words in quotes as a single word.

I have a string that contains quoted text. Can anyone give me the regex to split it up?

this has a \\\'quoted sentence\\\' inside

The quotes may also be single quotes. I'm using preg_match_all.

Right now, this call:

preg_match_all('/\\\\"(?:\\\\.|[^\\\\"])*\\\\"|\S+/', $search_terms, $search_term_set);

produces:

Array
(
    [0] => Array
        (
            [0] => this
            [1] => has
            [2] => a
            [3] => \\\"quoted
            [4] => sentence\\\"
            [5] => inside
        )

)

I would like this output instead:

Array
(
    [0] => Array
        (
            [0] => this
            [1] => has
            [2] => a
            [3] => \\\"quoted sentence\\\"
            [4] => inside
        )

)

This is NOT a duplicate of the question "PHP explode the string, but treat words in quotes as a single word".

UPDATE:

I've removed the mysql_real_escape_string call. What regex do I need now that I'm only using magic quotes?
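For the update's case, where magic quotes has put a single backslash in front of each quote, a pattern along these lines might work. This is a sketch, not something from the thread: it assumes a phrase opens and closes with the same quote character, and `split_magic_quoted` is a hypothetical helper name. The quoted phrase comes back as one token with its backslashes still attached; run stripslashes() on the tokens afterwards if you want them removed.

```php
<?php
// Sketch: tokenize a magic-quoted search string, where each quote
// character has been escaped with ONE backslash (\' or \").
// Assumption: a phrase closes with the same quote type that opened it.
// split_magic_quoted() is a hypothetical helper name.
function split_magic_quoted(string $input): array
{
    // \\(['"])  a backslash followed by a quote; the quote is captured
    // .*?       the phrase itself (lazy, so it stops at the first closer)
    // \\\1      a backslash followed by the SAME quote character
    // |\S+      otherwise, any run of non-whitespace is one token
    $pattern = '/\\\\([\'"]).*?\\\\\1|\S+/';
    preg_match_all($pattern, $input, $matches);
    return $matches[0];
}

// The PHP literal below holds: this has a \'quoted sentence\' inside
$search_terms = "this has a \\'quoted sentence\\' inside";
print_r(split_magic_quoted($search_terms));
```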

  • You should run the regex on the string before using mysql_real_escape_string. Commented Jun 10, 2011 at 20:36
  • Yeah, I thought about doing that, running it on each array value. But I thought it would be better to do it just once, before the regex. I will keep that as Plan B. Commented Jun 10, 2011 at 20:40
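The first comment's suggestion (split the raw string first, then escape each piece) could look roughly like this. `tokenize_then_escape` and its `$escape` parameter are hypothetical names; addslashes() is passed only so the sketch runs without a database connection and is not meant as the real escaping step. In actual code you would pass a callable that wraps your database's escaping function, for example mysqli_real_escape_string() with your open connection.

```php
<?php
// Sketch of "run the regex first, escape afterwards": split the raw,
// unescaped search string, then escape every token on its own.
// $escape is a hypothetical parameter: any callable that takes a string
// and returns the escaped string.
function tokenize_then_escape(string $raw, callable $escape): array
{
    // a "..." phrase, a '...' phrase, or any run of non-whitespace
    preg_match_all('/"[^"]*"|\'[^\']*\'|\S+/', $raw, $matches);
    return array_map($escape, $matches[0]);
}

// addslashes() is used here only so the example runs without a database.
$tokens = tokenize_then_escape("this has a 'quoted sentence' inside", 'addslashes');
print_r($tokens);
```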

3 Answers


I think you might want to use strpos and substr in this case.

This is very sloppy, but hopefully you get the general idea at least.

$string = "This has a 'quoted sentence' in it";

// Return the position of every occurrence of $needle in $haystack.
// strpos() takes ($haystack, $needle, $offset), and its result must be
// compared with !== false because a match at position 0 is falsy.
function char_positions($haystack, $needle) {
    $positions = array();
    $offset = 0;
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1;  // keep searching after this match
    }
    return $positions;
}

// get the string position of every ', " and space
$single_pos_arr = char_positions($string, "'");
$double_pos_arr = char_positions($string, '"');
$space_pos_arr  = char_positions($string, ' ');

Once you have the positions, you can write a simple algorithm to finish the job.
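One possible way to finish the job is sketched below. It is a swapped-in single-pass approach rather than the answerer's exact algorithm: instead of using the position arrays, it scans the string once and tracks whether it is inside a quoted phrase. `split_quoted` is a hypothetical name, and the sketch assumes quotes are not escaped, so an apostrophe inside a word such as don't would wrongly open a phrase.

```php
<?php
// Single left-to-right scan: split on spaces, except inside a '...' or
// "..." phrase. Assumptions: quotes are unescaped, and a phrase closes
// with the same quote character that opened it.
function split_quoted(string $string): array
{
    $tokens  = [];
    $current = '';
    $quote   = null;  // the quote character we are inside, or null

    for ($i = 0, $len = strlen($string); $i < $len; $i++) {
        $ch = $string[$i];
        if ($quote !== null) {             // inside a quoted phrase
            $current .= $ch;
            if ($ch === $quote) {
                $quote = null;             // closing quote found
            }
        } elseif ($ch === "'" || $ch === '"') {
            $quote = $ch;                  // opening quote
            $current .= $ch;
        } elseif ($ch === ' ') {
            if ($current !== '') {         // a space outside quotes ends a token
                $tokens[] = $current;
                $current = '';
            }
        } else {
            $current .= $ch;
        }
    }
    if ($current !== '') {
        $tokens[] = $current;              // flush the last token
    }
    return $tokens;
}

print_r(split_quoted("This has a 'quoted sentence' in it"));
```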


1 Comment

Very nice parser, aside from the atrocious coding standard :) - I was going to recommend writing a parser but they're a bit verbose and I remembered I had a regex that actually did this.

Why are there slashes in your input string?

Use stripslashes to get rid of them.

Then either write your own tokenizer or use this regex:

preg_match_all("/(\"[^\"]+\")|([^\s]+)/", $input, $matches);
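Putting this answer's two steps together might look like the following sketch. It assumes the input was escaped only once (magic quotes alone, as in the question's update); text escaped twice would need stripslashes() applied twice. `split_search_terms` is a hypothetical name, and the single-quote alternative is an addition to the regex above so that the question's sample input also splits correctly.

```php
<?php
// Sketch of this answer's approach: undo the escaping, then tokenize.
// Assumption: the string was escaped exactly once (e.g. by magic quotes).
function split_search_terms(string $input): array
{
    $plain = stripslashes($input);
    // a "..." phrase, a '...' phrase (the single-quote branch is an
    // addition to the answer's regex), or any run of non-whitespace
    preg_match_all('/"[^"]+"|\'[^\']+\'|\S+/', $plain, $matches);
    return $matches[0];
}

// The PHP literal below holds: this has a \'quoted sentence\' inside
print_r(split_search_terms("this has a \\'quoted sentence\\' inside"));
```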

6 Comments

They are the output of mysql_real_escape_string(), used to prevent SQL injection.
@madphp: You should run the regex on the string before using mysql_real_escape_string.
This is true; mysql_real_escape_string should be the last thing you do.
Just want to point out that the string has been escaped TWICE, either with mysql_real_escape_string or another function like addslashes, or maybe you are using magic quotes in your version of PHP. In any case, this is redundant escaping; just something to keep in mind when you are debugging.
OK, thanks for pointing that out to me. However, by using magic quotes along with mysql_real_escape_string, I'm trying to prevent multi-byte character encoding attacks. See shiflett.org/blog/2006/jan/… If I'm wrong to do this, please let me know. I assumed magic quotes is similar to addslashes, in that it isn't aware of multi-byte character encodings.
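A quick way to see the double escaping described in these comments: each escaping pass also escapes the backslashes added by the previous pass, so a quote escaped twice ends up with three backslashes in front of it, which matches the question's input. This sketch uses addslashes() for both passes as a stand-in; for quotes and backslashes, magic quotes and mysql_real_escape_string() add the same backslashes.

```php
<?php
// Each escaping pass escapes the backslashes left by the previous pass.
// addslashes() stands in for both passes (magic quotes, then
// mysql_real_escape_string()), which escape ' and \ the same way.
$original = "'";
$once     = addslashes($original); // one backslash before the quote
$twice    = addslashes($once);     // three backslashes before the quote

echo $once, "\n";   // prints \'
echo $twice, "\n";  // prints \\\'

// Undoing it takes one stripslashes() call per escaping pass.
echo stripslashes(stripslashes($twice)), "\n"; // prints '
```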

Too long for a comment, even though it's actually a comment.

I don't understand how it's not a duplicate. Using the principle from that link and replacing the quotes with triple-backslashed quotes:

$text = "this has a \\\\\'quoted sentence\\\\\' inside and then \\\\\'some more\\\\\' stuff";
print $text; //check input
$pattern = "/\\\{3}'(?:[^\'])*\\\{3}'|\S+/";
preg_match_all($pattern, $text, $matches);
print_r($matches);

and you get what you need. It's essentially a copy of the answer at the link you posted; the only change is exactly what its author suggested doing if you wanted to change the delimiters.

Edit: Here's my output:

Array
(
    [0] => Array
        (
            [0] => this
            [1] => has
            [2] => a
            [3] => \\\'quoted sentence\\\'
            [4] => inside
            [5] => and
            [6] => then
            [7] => \\\'some more\\\'
            [8] => stuff
        )

)

Edit2: Are you matching single or double quotes after three backslashes (your input and output arrays don't match if all you're doing is matching), or are you changing single quotes after three backslashes in the input into triple-backslashed double quotes in the output? If all you're doing is matching, just change the two single quotes in the pattern to escaped double quotes, or wrap the pattern in single quotes so you don't have to escape the double quotes.
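Following Edit2, a sketch of the double-quote variant, using the single-quoted pattern the answerer gives in the comments. It assumes each quote is preceded by exactly three backslashes; `split_triple_escaped` is a hypothetical name. Inside a single-quoted PHP string, `\\\\\\"` is how three literal backslashes followed by a double quote are written.

```php
<?php
// Match a phrase wrapped in \\\" ... \\\" (three backslashes, then a
// double quote), or else any run of non-whitespace. The pattern is in
// single quotes, so the double quotes need no escaping.
function split_triple_escaped(string $text): array
{
    $pattern = '/\\\{3}"(?:[^\"])*\\\{3}"|\S+/';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// The PHP literal below holds: this has a \\\"quoted sentence\\\" inside
$text = 'this has a \\\\\\"quoted sentence\\\\\\" inside';
print_r(split_triple_escaped($text));
```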

3 Comments

I'm not saying it isn't a duplicate. I have magic quotes switched on, but in this script I want to double up with mysql_real_escape_string as added protection. Again, if I'm wrong to do this, please let me know.
You have 'NOT a duplicate' with 'NOT' in all caps :p But anyway, I don't have magic quotes on, so I'm not sure what that does to your delimited single quotes. All I know is that if the input (as echoed) looks like this has a \\\"quoted sentence\\\" inside, then the pattern '/\\\{3}"(?:[^\"])*\\\{3}"|\S+/' will get you what you want. Note that this pattern is different from the one in my main post; it's for a double quote after three backslashes.
I've decided to stop using mysql_real_escape_string for now. What is the regex for single backslashes?
