PHP/Regex: Check the format of an input

Question

I have an input field into which a user can write some links. After submission, I want to check this input for the correct structure.

The allowed structure:

Google: http://google.com
YouTube: http://youtube.com
Stackoverflow: http://stackoverflow.com/

My regex doesn't work as I expected:

(.*)\:(\s?)(.*)\n

The regex will be used with preg_match.

Edit (moved from a comment):

My Code:

$input = 'Google: http://google.com
YouTube: http://youtube.com
wrong
Stackoverflow: http://stackoverflow.com/';
if (preg_match_all('/(.*?)\:\s?(.*?)$/m', $input))
{
    echo 'ok';
}
else
{
    echo 'no';
}

I get 'ok', but because the line 'wrong' does not follow the pattern, I expect 'no'.

The only thing I see off is that you are making \n required; really, you should use $ with the m modifier. And you want to make your first (.*) non-greedy, or it will match up to the : in the URL.
– Jonathan Kuhn, Commented Dec 18, 2015 at 22:04
Oh, and use preg_match_all instead of preg_match, or else you will match the first one and nothing else. regex101.com/r/oQ1dL8/2
– Jonathan Kuhn, Commented Dec 18, 2015 at 22:06
Precise URL matching is complex: stackoverflow.com/questions/161738/…
– trincot, Commented Dec 18, 2015 at 22:07

Community · Accepted Answer · 2017-05-23 11:52:18Z

There are a few things to correct:

The asterisk operator is greedy. In your case you want it to be lazy, so add a question mark after it in both instances;
you are probably not interested in retaining the separating space in the middle, so don't put parentheses around it;
if you want all lines to be processed, you need to use preg_match_all instead of preg_match;
unless you are certain that your last line ends with a newline, you need to test for the end of the string with the dollar sign;
as that last test will need parentheses, use ?: to make the group non-capturing, since you are not interested in retaining the newline character;
some systems put \r before every \n, so you should account for that, otherwise it ends up in one of your capture groups. Alternatively, use the m modifier in combination with $ (end of line) and forget about matching newlines;
as the colon also appears in a URL, you should at least test for that one; otherwise, if the first colon (after the site name) is missing, the "http" will become part of the site name.
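To see the first and last points of that list in isolation, here is a small sketch on single lines. The variable names are only illustrative.

```php
<?php
$line = 'Google: http://google.com';

// Greedy: (.*) runs to the end of the line and backtracks to the LAST colon,
// so the "site name" swallows "Google: http".
preg_match('/(.*):\s?(.*)$/', $line, $greedy);

// Lazy: (.*?) stops at the FIRST colon, so the site name is just "Google".
preg_match('/(.*?):\s?(.*)$/', $line, $lazy);

// Without a name, the URL's own colon is taken as the separator,
// so "http" is captured as the site name.
preg_match('/(.*?):\s?(.*)$/', 'http://google.com', $noName);

echo $greedy[1], "\n"; // Google: http
echo $lazy[1], "\n";   // Google
echo $noName[1], "\n"; // http
```

These are the two effects the combined pattern has to account for.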

This leads to the following:

$input =
"Google: http://google.com
YouTube: http://youtube.com
Stackoverflow: https://stackoverflow.com/";

$result = preg_match_all("/(.*?)\:\s?(\w+\:.*?)$/m", $input, $matches);
echo $result ? "matched!\n" : "no match\n";
print_r($matches);

Output:

matched!
Array
(
    [0] => Array
        (
            [0] => Google: http://google.com
            [1] => YouTube: http://youtube.com
            [2] => Stackoverflow: https://stackoverflow.com/
        )

    [1] => Array
        (
            [0] => Google
            [1] => YouTube
            [2] => Stackoverflow
        )

    [2] => Array
        (
            [0] => http://google.com
            [1] => http://youtube.com
            [2] => https://stackoverflow.com/
        )
)

The first element holds the complete matches (the lines), the second the matches of the first capturing group, and the third the matches of the second capturing group.
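If you would rather iterate line by line, preg_match_all also accepts the PREG_SET_ORDER flag, which groups the results per match instead of per capture group. A minimal sketch, assuming a pattern of the form (.*?):\s?(\w+:.*?)$ with the m modifier; the $links name-to-URL map is just an illustration:

```php
<?php
$input = "Google: http://google.com\nYouTube: http://youtube.com\nStackoverflow: https://stackoverflow.com/";

// PREG_SET_ORDER: $sets[i] holds all groups of the i-th matched line.
preg_match_all('/(.*?):\s?(\w+:.*?)$/m', $input, $sets, PREG_SET_ORDER);

$links = [];
foreach ($sets as $set) {
    // $set[0] is the whole line, $set[1] the name, $set[2] the URL.
    $links[$set[1]] = $set[2];
}
print_r($links);
```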

Note that the above does not validate URLs; that is a subject of its own. Have a look at this.

EDIT

If you want to decide whether the whole input is correctly formatted, you can use the above expression, but with preg_replace instead: replace all the good lines with empty strings, trim the newlines from the result, and test whether anything is left over:

$result =  trim(preg_replace("/(.*?)\:\s?(\w*?):(.*?)$/m", "", $input));
if ($result == "") {
    echo "It matches the pattern";
} else {
    echo "It does not match the pattern. Offending lines:
         " . $result;
}

The above would allow empty lines to occur in your input.
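As a variation on the same idea, you could split the input into lines and test each one, which also tells you where the offending lines are. This is a different technique from the preg_replace trick, and findOffendingLines is a hypothetical helper; the per-line pattern is an assumption that is slightly stricter (it requires a non-empty name and no spaces in the URL):

```php
<?php
// Hypothetical helper: returns the lines that do not follow "Name: scheme:rest",
// keyed by their 1-based line number.
function findOffendingLines(string $input): array
{
    $bad = [];
    // \R matches \n, \r\n or \r, so Windows line endings are handled too.
    foreach (preg_split('/\R/', $input) as $index => $line) {
        if (trim($line) === '') {
            continue; // empty lines are tolerated, as in the preg_replace version
        }
        if (!preg_match('/^[^:]+:\s?\w+:\S+$/', $line)) {
            $bad[$index + 1] = $line;
        }
    }
    return $bad;
}

$input = "Google: http://google.com\r\nYouTube: http://youtube.com\r\nwrong\r\nStackoverflow: https://stackoverflow.com/";
$bad = findOffendingLines($input);
echo $bad === [] ? "It matches the pattern\n" : "Offending lines: " . implode(', ', $bad) . "\n";
```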

I think you misunderstood me. I only want to check whether the structure has been complied with. I explained my problem further in this comment.
Your 'Edit' solved my problem. Sorry for the confusion. Thanks!

Jan · Accepted Answer · 2015-12-18 22:27:35Z


Your question is somewhat vague. To match a URL, you could simply do something like:

^[^:]+:\s*https?:\/\/[^\s]+$
# match one or more characters other than a colon, followed by a colon
# followed by optional whitespace
# match http or https, a colon and two forward slashes literally
# afterwards, match one or more non-whitespace characters
# anchor it to start (^) and end ($) (as requested in the comment)

See a working demo here.
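To check a whole multi-line input with this anchored pattern, one option is to apply it to each line separately. With the m modifier on the whole string, [^:] and \s can also match newline characters, so a single match could span two lines. The sketch below (splitting with preg_split and filtering with preg_grep, which is my choice, not part of the answer) keeps only the lines that do not match:

```php
<?php
$input = 'Google: http://google.com
YouTube: http://youtube.com
wrong
Stackoverflow: http://stackoverflow.com/';

// The anchored pattern from above, applied to one line at a time.
$pattern = '/^[^:]+:\s*https?:\/\/[^\s]+$/';

// Split on any newline sequence, then keep only the lines that do NOT match.
$lines = preg_split('/\R/', $input);
$offending = preg_grep($pattern, $lines, PREG_GREP_INVERT);

echo $offending === [] ? 'ok' : 'no';
```

preg_grep preserves the array keys, so $offending also tells you the 0-based index of each bad line.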

edited Dec 18, 2015 at 22:27

answered Dec 18, 2015 at 22:06

Jan


2 Comments

Sven Eberth Over a year ago

I don't want to extract the URL or anything else from the string. I want to check whether the structure has been complied with.

Jan Over a year ago

@Xübecks: You need to assure anchor points then, see my updated answer.

Wiktor Stribiżew · Accepted Answer · 2015-12-18 23:06:24Z


You can achieve that by finding a line that does not meet your requirement.

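Use '~^(?!(.*?):\s?(.*)$)~m' with !preg_match: the negative lookahead makes the pattern match only at the start of a line that does not have the required format. See this demo printing "no":

$input = 'Google: http://google.com
YouTube: http://youtube.com
wrong
Stackoverflow: http://stackoverflow.com/';
if (!preg_match('~^(?!(.*?):\s?(.*)$)~m', $input)) {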
    echo 'ok';
}
else {
    echo 'no';
}

Note that you do not need to escape the : symbol. Also, I suggest switching to greedy dot matching at the end, since this forces the engine to grab the rest of the line at once and only then check for the end of the line, which makes the regex more efficient. You could also try replacing the first .*? with [^:]* for efficiency's sake.
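A minimal sketch of the "find a bad line" approach wrapped in a function, assuming the [^:]* suggestion is applied. One caveat: a bare [^:]* also matches \n, so on a line without a colon it could run into the next line and borrow that line's colon; excluding \n keeps it on the current line. isWellFormed is a hypothetical name:

```php
<?php
// Hypothetical helper: true if every line contains the "name:" separator.
function isWellFormed(string $input): bool
{
    // With /m, ^ is tried at the start of every line. The negative lookahead
    // succeeds only on a line that has no colon, i.e. a malformed line.
    // [^:\n]* stops at the end of the current line.
    return !preg_match('~^(?![^:\n]*:\s?.*$)~m', $input);
}

$good = "Google: http://google.com\nYouTube: http://youtube.com";
$bad  = "Google: http://google.com\nwrong\nYouTube: http://youtube.com";
echo isWellFormed($good) ? 'ok' : 'no', "\n"; // ok
echo isWellFormed($bad) ? 'ok' : 'no', "\n";  // no
```

Like the answer's pattern, this only checks for the separator; it does not validate the URL part.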

answered Dec 18, 2015 at 23:06

Wiktor Stribiżew

