
I have an input field into which a user can enter some links. After submission, I want to check that the input has the correct structure.

The allowed structure:

Google: http://google.com
YouTube: http://youtube.com
Stackoverflow: http://stackoverflow.com/

My regex doesn't work as I expected:

(.*)\:(\s?)(.*)\n

The regex is meant to be used with preg_match.


Edit (moved from a comment):

My code:

$input = 'Google: http://google.com
YouTube: http://youtube.com
wrong
Stackoverflow: http://stackoverflow.com/';
if (preg_match_all('/(.*?)\:\s?(.*?)$/m', $input))
{
    echo 'ok';
}
else
{
    echo 'no';
}

I get 'ok', but because the line 'wrong' does not follow the pattern, I expect 'no'.
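The 'ok' comes from what preg_match_all() returns: the number of matches it found, not whether every line conforms. A line that doesn't match, such as 'wrong', is simply skipped. A minimal sketch that compares the match count with the line count for the same input and pattern:

```php
<?php
$input = "Google: http://google.com\nYouTube: http://youtube.com\nwrong\nStackoverflow: http://stackoverflow.com/";

// preg_match_all() returns how many matches it found,
// not whether the whole subject conforms to the pattern.
$count = preg_match_all('/(.*?)\:\s?(.*?)$/m', $input);

// Count the lines in the input for comparison.
$lines = count(explode("\n", $input));

echo "matches: $count, lines: $lines\n"; // matches: 3, lines: 4
```

Since 3 is truthy, the if branch prints 'ok'. The mismatch between the two numbers is what reveals the stray line.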

  • So no www and always .com in the end? Commented Dec 18, 2015 at 22:02
  • No, it should be variable. Commented Dec 18, 2015 at 22:04
  • The only thing I see off is that you are making \n required. Really, you should use $ with the m modifier. And you want to make your first (.*) non-greedy, or it will match up to the : in the URL. Commented Dec 18, 2015 at 22:04
  • Oh, and use preg_match_all instead of preg_match or else you will match the first one and nothing else. regex101.com/r/oQ1dL8/2 Commented Dec 18, 2015 at 22:06
  • Precise URL matching is complex: stackoverflow.com/questions/161738/… Commented Dec 18, 2015 at 22:07

3 Answers


There are a few things to correct:

  • The asterisk operator is greedy. In your case you want it to be lazy, so add a question mark after it in both places;
  • You are probably not interested in retaining the separating space in the middle, so don't put parentheses around it;
  • If you want all lines to be processed, you need to use preg_match_all instead of preg_match;
  • Unless you are certain that your last line ends with a newline, you need to test for the end of the string with the dollar sign;
  • As that last test will need parentheses, use ?: to make the group non-capturing, since you are not interested in retaining the newline character;
  • Some systems put \r before every \n, so you should allow for it, otherwise it ends up in one of your capture groups. Alternatively, use the m modifier in combination with $ (end of line) so you don't have to match the \n yourself; you still need \r? before the $ to keep a \r out of the last group;
  • As the colon also appears in a URL, you should at least test for that one; otherwise, when the first colon (after the site name) is missing, the "http" becomes part of the site name.

This leads to the following:

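$input =
"Google: http://google.com
YouTube: http://youtube.com
Stackoverflow: https://stackoverflow.com/";

// preg_match_all() collects every matching line; preg_match() would stop at the first.
// Group 1: site name; group 2: the URL, which must contain a scheme followed by a colon.
$result = preg_match_all("/(.*?):\s?(\w*:.*?)$/m", $input, $matches);
// $result holds the number of matching lines (3 here).
print_r($matches);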

Outputs:

Array
(
    [0] => Array
        (
            [0] => Google: http://google.com
            [1] => YouTube: http://youtube.com
            [2] => Stackoverflow: https://stackoverflow.com/
        )

    [1] => Array
        (
            [0] => Google
            [1] => YouTube
            [2] => Stackoverflow
        )

    [2] => Array
        (
            [0] => http://google.com
            [1] => http://youtube.com
            [2] => https://stackoverflow.com/
        )
)

The first element holds the complete matches (the lines), the second the contents of the first capturing group, and the third the contents of the second capturing group.
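If you want to use the pairs afterwards, the two capture groups can be zipped into a name => URL map, since they are index-aligned. A minimal sketch, assuming the two-group pattern /(.*?):\s?(\w*:.*?)$/m (site name, then URL including its scheme); note that array_combine() keeps only the last URL when a site name repeats:

```php
<?php
$input = "Google: http://google.com\nYouTube: http://youtube.com\nStackoverflow: https://stackoverflow.com/";

// Group 1: site name; group 2: URL (scheme included).
preg_match_all('/(.*?):\s?(\w*:.*?)$/m', $input, $matches);

// $matches[1] and $matches[2] line up index by index.
$links = array_combine($matches[1], $matches[2]);

print_r($links);
```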

Note that the above does not validate URLs. That is a subject of its own. Have a look at this.
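The \r point from the list above can be checked directly. A minimal sketch, assuming input with Windows-style "\r\n" line endings; on a PCRE build using the usual LF newline convention, "." also matches "\r", so without \r? the carriage return lands at the end of the URL group:

```php
<?php
// Same lines as above, but with "\r\n" line endings.
$input = "Google: http://google.com\r\nYouTube: http://youtube.com\r\nStackoverflow: https://stackoverflow.com/";

// Without \r?: with an LF newline convention, the "\r" is captured in group 2.
preg_match_all('/(.*?):\s?(\w*:.*?)$/m', $input, $plain);

// With \r? in front of $, the carriage return is matched outside the group.
preg_match_all('/(.*?):\s?(\w*:.*?)\r?$/m', $input, $crlf);

var_dump($plain[2][0]);
var_dump($crlf[2][0]); // string(17) "http://google.com"
```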

EDIT

If you want to decide whether the whole input is correctly formatted, you can use the above expression with preg_replace instead: replace all the good lines with empty strings, trim the leftover newlines from the result, and test whether anything remains:

// Blank out every well-formed line, then strip the leftover newlines.
$result = trim(preg_replace("/(.*?)\:\s?(\w*?):(.*?)$/m", "", $input));
if ($result === "") {
    echo "It matches the pattern";
} else {
    echo "It does not match the pattern. Offending lines:\n" . $result;
}

Note that the above allows empty lines in your input.
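The same preg_replace() check can be wrapped so that it returns the offending lines as a list. A minimal sketch; the helper name offendingLines is hypothetical, and like the code above it tolerates empty lines:

```php
<?php
// Hypothetical helper wrapping the preg_replace() check above.
// Returns the offending lines (an empty array when every line is well formed).
function offendingLines(string $input): array
{
    // Blank out every "Name: scheme:rest" line; whatever survives is malformed.
    $left = trim(preg_replace('/(.*?):\s?(\w*?):(.*?)$/m', '', $input));
    if ($left === '') {
        return [];
    }
    // Split the leftovers back into lines and drop the blank ones.
    $lines = array_map('trim', explode("\n", $left));
    return array_values(array_filter($lines, function (string $line): bool {
        return $line !== '';
    }));
}

$input = "Google: http://google.com\nYouTube: http://youtube.com\nwrong\nStackoverflow: http://stackoverflow.com/";
$bad = offendingLines($input);
echo $bad === [] ? "It matches the pattern\n" : "Offending lines: " . implode(', ', $bad) . "\n";
```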


2 Comments

I think you misunderstood me. I only want to check whether the input complies with the structure. In this comment I explained my problem in more detail.
Your 'Edit' solved my problem. Sorry for all the confusion. Thanks!

Your question is somewhat vague. To match a URL, you could simply do something like:

^[^:]+:\s*https?:\/\/[^\s]+$
# one or more characters other than a colon, followed by a colon
# then optional whitespace
# then http or https, a colon and two literal forward slashes
# then one or more non-whitespace characters
# anchored to the start (^) and end ($) (as requested in the comment)

See a working demo here.
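Without the m modifier, ^ and $ anchor to the whole subject, so preg_match() with this pattern fails on a multi-line input even when every line is valid. Adding m is risky too, because [^:]+ and \s* can both match a newline, letting a bad line such as 'wrong' be absorbed into the next line's match. A minimal sketch that splits the input into lines first and tests each one; the helper name allLinesMatch is hypothetical:

```php
<?php
// Hypothetical helper: apply the anchored pattern to one line at a time.
function allLinesMatch(string $input): bool
{
    $pattern = '/^[^:]+:\s*https?:\/\/[^\s]+$/';

    // \R matches any line break (\n, \r\n or \r), so CRLF input works too.
    foreach (preg_split('/\R/', $input) as $line) {
        if (preg_match($pattern, $line) !== 1) {
            return false; // first line that breaks the structure
        }
    }
    return true;
}

$input = "Google: http://google.com\nYouTube: http://youtube.com\nwrong\nStackoverflow: http://stackoverflow.com/";
echo allLinesMatch($input) ? 'ok' : 'no'; // no
```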

2 Comments

I don't want to extract the URL or anything else from the string. I want to check whether the input complies with the structure.
@Xübecks: Then you need anchors; see my updated answer.

You can achieve that by finding a line that does not meet your requirement.

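Use '~^(?!(.*?):\s?(.*)$)~m' with !preg_match. The negative lookahead, anchored at each line start by ^ and the m modifier, matches only at a line that does not follow the pattern. See this demo printing "no":

$input = 'Google: http://google.com
YouTube: http://youtube.com
wrong
Stackoverflow: http://stackoverflow.com/';
// A match means some line lacks the "Name:" prefix.
if (!preg_match('~^(?!(.*?):\s?(.*)$)~m', $input)) {
    echo 'ok';
}
else {
    echo 'no';
}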

Note that you do not need to escape the : symbol. Also, I suggest switching to greedy dot matching at the end, since this forces the engine to grab the rest of the line at once and then check for the end of line there, so the regex will be more efficient. You could also try replacing the first .*? with [^:\n]* for efficiency's sake; keep the \n in the class so it cannot run on into the next line.
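The "find a line that does not meet the requirement" idea can be written as a negative lookahead anchored at every line start, using the [^:\n]* variant for the prefix. A minimal sketch; the helper name hasBadLine is hypothetical:

```php
<?php
// Hypothetical helper: true if any line lacks the "Name:" prefix.
// ^ with the m modifier anchors at every line start; the negative lookahead
// succeeds exactly on lines where no colon can be found.
function hasBadLine(string $input): bool
{
    return preg_match('~^(?![^:\n]*:\s?.*$)~m', $input) === 1;
}

$good = "Google: http://google.com\nYouTube: http://youtube.com";
$bad  = "Google: http://google.com\nwrong\nYouTube: http://youtube.com";

echo hasBadLine($good) ? 'no' : 'ok', "\n"; // ok
echo hasBadLine($bad) ? 'no' : 'ok', "\n";  // no
```

Without the \n in the class, [^:]* starting at 'wrong' could run on into the next line, find that line's colon, and let the bad line pass.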

