Regex - isolate sections of strings with PHP

Question

I have strings of data: number, space(s), then a word that can contain letters, numbers and special characters as well as spaces. I need to isolate the first number only, and then also the words only so I can re-render the data into a table.

1 foo
2   ba_r
3  foo bar
4   fo-o

EDIT: I was attempting this with "^[0-9]+[" "]"; however, that doesn't work.

Can you show us the regex that you are using so far? Stack Overflow is not a community that serves you finished code, but a community that helps you debug and improve your own.
– Zim84, Commented Jun 12, 2013 at 15:00

nickb · Accepted Answer · 2013-06-12 15:05:46Z

3

You can use this regex to capture each line:

/^(\d+)\s+(.*)$/m

With the m modifier, ^ and $ match at the start and end of each line. The regex captures one or more digits, then matches one or more whitespace characters, then captures everything up to the end of the line.

Then, with preg_match_all(), you can get the data you want:

preg_match_all( '/^(\d+)\s+(.*)$/m', $input, $matches, PREG_SET_ORDER);

Then, you can just parse out the data from the $matches array, like this:

$data = array();
foreach( $matches as $match) {
    list( , $num, $word) = $match;
    $data[] = array( $num, $word);
    // Or: $data[$num] = $word;
}

A print_r( $data); will print:

Array
(
    [0] => Array
        (
            [0] => 1
            [1] => foo
        )

    [1] => Array
        (
            [0] => 2
            [1] => ba_r
        )

    [2] => Array
        (
            [0] => 3
            [1] => foo bar
        )

    [3] => Array
        (
            [0] => 4
            [1] => fo-o
        )

)
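Since the question mentions re-rendering the data into a table, here is a minimal sketch of that last step. It assumes an HTML table is wanted, which the question does not state, and the variable names $rows and $table are illustrative only:

```php
<?php
// Sample input taken from the question.
$input = "1 foo\n2   ba_r\n3  foo bar\n4   fo-o";

// Same pattern as above: number, whitespace, then the rest of the line.
preg_match_all('/^(\d+)\s+(.*)$/m', $input, $matches, PREG_SET_ORDER);

// Build one table row per match, escaping each value for HTML output.
$rows = '';
foreach ($matches as $match) {
    list(, $num, $word) = $match;
    $rows .= sprintf(
        "  <tr><td>%s</td><td>%s</td></tr>\n",
        htmlspecialchars($num),
        htmlspecialchars($word)
    );
}

$table = "<table>\n" . $rows . "</table>\n";
echo $table;
```

The htmlspecialchars() calls matter here because the words may contain special characters such as < or &.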

edited Jun 12, 2013 at 15:05

answered Jun 12, 2013 at 15:00

nickb


5 Comments

nickb Over a year ago

@Downvoter - Any comment? I'd like to improve my answer if possible.

The Surrican Over a year ago

I did not downvote, however I may have suggestions. I do not see the point in ^, $ and the m modifier. The m modifier here is only necessary to have matches with ^ and $. However, since .* does not match newlines without the s modifier, and the pattern must therefore be matched within a single line anyway, this is not really necessary. The only thing it does is prevent a match in lines that have non-digit characters before the leading number, and I don't know why one would want that. A simpler solution for the same thing would be .* in the beginning. Also, the loop appears unnecessary.

The Surrican Over a year ago

Max characters of the response ran out... so in short: you have code in there that appears unnecessary and unnecessarily complicated (I bet a lot of programmers do not even know by heart what m does; however, a simple .* in the beginning is clear).

nickb Over a year ago

@TheSurrican While you raise some interesting points, I would have to disagree. The PCRE regex modifiers are quite ubiquitous (IMO), and for your explanation, you needed to clarify that .* does not match newlines, which is something somebody can easily forget. But, anchoring the regex at the start/end of line not only distinctly and clearly defines that the match we are looking for spans one complete line, it also prevents errors where another regex could match within a line, which would be incorrect. For example: foo 1 bar 2 baz 3. Clearly this is erroneous input, and should be ignored.

The Surrican Over a year ago

I think that depends on the scenario where the regex is employed. In the context of this question, I understand that the text syntax can be relied upon, and the greediness of the asterisk quantifier takes care that the whole line is matched. Probably, in the end, it's a question of style...
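The difference debated in these comments can be checked directly. The sketch below runs the anchored pattern from the first answer and a simplified unanchored pattern (without named groups) over a hypothetical input whose second line is the malformed example "foo 1 bar 2 baz 3":

```php
<?php
// Hypothetical input: the second line is the malformed example from the comments.
$input = "1 foo\nfoo 1 bar 2 baz 3\n2   ba_r";

// Anchored pattern from the first answer: a line must start with the number.
preg_match_all('/^(\d+)\s+(.*)$/m', $input, $anchored, PREG_SET_ORDER);

// Unanchored pattern: the number may appear anywhere in the line.
preg_match_all('/(\d+) +(.+)/', $input, $unanchored, PREG_SET_ORDER);

foreach ($anchored as $m) {
    echo "anchored:   {$m[1]} => {$m[2]}\n";
}
foreach ($unanchored as $m) {
    echo "unanchored: {$m[1]} => {$m[2]}\n";
}
```

The anchored pattern skips the malformed line and yields two matches, while the unanchored one yields three, reading that line as 1 => "bar 2 baz 3". Which behavior is preferable depends on whether the input format can be trusted.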

The Surrican · Answer · 2013-06-12 15:01:28Z

2

$str = <<<body
1 foo
2   ba_r
3  foo bar
4   fo-o
body;

preg_match_all('/(?P<numbers>\d+) +(?P<words>.+)/', $str, $matches);
print_r(array_combine($matches['numbers'],$matches['words']));

outputs

Array
(
    [1] => foo
    [2] => ba_r
    [3] => foo bar
    [4] => fo-o
)
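One caveat with keying the result by number: array_combine() keeps only the last value for a repeated key, so this form suits input where each number is unique. The sketch below uses a hypothetical input with a repeated number and shows array_map() with a null callback as one way to keep every line as a pair instead:

```php
<?php
// Hypothetical input in which the number 1 appears on two lines.
$str = "1 foo\n1 bar\n2 baz";

preg_match_all('/(?P<numbers>\d+) +(?P<words>.+)/', $str, $matches);

// Keyed by number: the second "1" overwrites the first.
$byNumber = array_combine($matches['numbers'], $matches['words']);

// One [number, word] pair per line: every line is kept.
$pairs = array_map(null, $matches['numbers'], $matches['words']);

print_r($byNumber);
print_r($pairs);
```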

answered Jun 12, 2013 at 15:01

The Surrican
