
Just writing a little function here and need some optimisation help!

All requests redirect to the index page.

I have this function that parses a url into an array.

The URL format is:

http://localhost/{user}/{page}/?sub_page={sub_page}&action={action}

So an example would be:

http://localhost/admin/stock/?sub_page=products&action=add

The requested URI excludes the domain, so my function accepts strings like this:

/admin/stock/?sub_page=products&action=add

My function is as follows. WARNING: it's very procedural.

For those of you who can't be bothered to read and understand it, I've added an explanation at the bottom ;)

function uri_to_array($uri){
    // uri will be in format: /{user}/{page}/?sub_page={subpage}&action={action} ... && plus additional parameters

    // define array that will be returned
    $return_uri_array = array();

    // separate path from querystring;
    $array_tmp_uri = explode("?", $uri);

    // if explode returns the same as input $string, no delimiter was found
    if ($uri == $array_tmp_uri[0]){ 

        // no question mark found.
        // format either '/{user}/{page}/' or '/{user}/'
        $uri = trim($array_tmp_uri[0], "/");

        // remove excess baggage
        unset ($array_tmp_uri);

        // format either '{user}/{page}' or '{user}'
        $array_uri = explode("/", $uri);

        // if explode returns the same as input $string, no delimiter was found
        if ($uri == $array_uri[0]){
            // no {page} defined, just user.
            $return_uri_array["user"] = $array_uri[0];
        }
        else{
            // {user} and {page} defined.
            $return_uri_array["user"] = $array_uri[0];
            $return_uri_array["page"] = $array_uri[1];            
        }
    }
    else{

        // query string is defined
        // format either '/{user}/{page}/' or '/{user}/'
        $uri = trim($array_tmp_uri[0], "/");
        $parameters = trim($array_tmp_uri[1]);

        // PARSE PATH
        // remove excess baggage
        unset ($array_tmp_uri);

        // format either '{user}/{page}' or '{user}'
        $array_uri = explode("/", $uri);

        // if explode returns the same as input $string, no delimiter was found
        if ($uri == $array_uri[0]){
            // no {page} defined, just user.
            $return_uri_array["user"] = $array_uri[0];
        }
        else{
            // {user} and {page} defined.
            $return_uri_array["user"] = $array_uri[0];
            $return_uri_array["page"] = $array_uri[1];            
        }

        // parse parameter string
        $parameter_array = array();
        parse_str($parameters, $parameter_array);

        // copy parameter array into return array
        foreach ($parameter_array as $key => $value){
            $return_uri_array[$key] = $value;
        }
    }
    return $return_uri_array;
}

Basically, there is one main if statement: one branch handles the case where no query string is defined (no '?'), and the other handles the case where the '?' does exist.

I'm just looking to make this function better.

Would it be worth making it a class?

Essentially, I need a function that takes /{user}/{page}/?sub_page={sub_page}&action={action} as an argument and returns

array(
    "user" => {user},
    "page" => {page},
    "sub_page" => {sub_page},
    "action" => {action}
)

Cheers, Alex

  • Take a look at parse_url() to simplify some tasks. Commented Jan 24, 2012 at 12:16
  • This looks overly complicated. Wouldn't a simple preg_match() for your desired pattern serve the same purpose? Commented Jan 24, 2012 at 12:17
  • Well, it would; however, preg_match doesn't return associative arrays, does it? :S Commented Jan 24, 2012 at 12:20
  • You can have associative arrays in your result, if you use named subpatterns Commented Jan 24, 2012 at 12:48
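
To illustrate the named-subpattern comment above, here is a minimal sketch. The pattern is hypothetical and covers only the path part; using \w+ for {user} and {page} is an assumption about which characters those segments may contain.

```php
<?php
// Hypothetical pattern for the path part only: '/{user}/' or '/{user}/{page}/'.
$pattern = '!^/(?P<user>\w+)(?:/(?P<page>\w+))?/?$!';

$matches = array();
preg_match($pattern, '/admin/stock/', $matches);

// $matches now contains numeric keys AND the named keys 'user' and 'page',
// so the named groups can be read like an associative array.
echo $matches['user'], ' ', $matches['page'], "\n";
```

The numeric keys are still present in $matches; they can be ignored or filtered out afterwards.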

3 Answers

2

If you want to

  • Do it properly
  • Use a regular expression
  • Use the same method to parse all URLs (parse_url() does not support relative paths, called only_path below)

This might suit your taste:

$url = 'http://localhost/admin/stock/?sub_page=products&action=add';
preg_match ("!^((?P<scheme>[a-zA-Z][a-zA-Z\d+-.]*):)?(((//(((?P<credentials>([a-zA-Z\d\-._~\!$&'()*+,;=%]*)(:([a-zA-Z\d\-._~\!$&'()*+,;=:%]*))?)@)?(?P<host>([\w\d-.%]+)|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(\[([a-fA-F\d.:]+)\]))?(:(?P<port>\d*))?))(?<path>(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))|(?P<only_path>(/(([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))?)|([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*)))?(?P<query>\?([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*))?(?P<fragment>#([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*))?$!u", $url, $matches);
$parts = array_intersect_key ($matches, array ('scheme' => '', 'credentials' => '', 'host' => '', 'port' => '', 'path' => '', 'query' => '', 'fragment' => '', 'only_path' => '', ));
var_dump ($parts);

It should cover just about all possible well-formed URLs.

If host is empty, only_path should hold the path, that is, a URL with no protocol and no host.

UPDATE:

Maybe I should read the question a bit better. This will parse the URL into components that you can use to more easily get the parts you're really interested in. Run something like:

// split the URL (the pattern is double-quoted because it contains single quotes)
preg_match ("!^((?P<scheme>[a-zA-Z][a-zA-Z\d+-.]*):)?(((//(((?P<credentials>([a-zA-Z\d\-._~\!$&'()*+,;=%]*)(:([a-zA-Z\d\-._~\!$&'()*+,;=:%]*))?)@)?(?P<host>([\w\d-.%]+)|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(\[([a-fA-F\d.:]+)\]))?(:(?P<port>\d*))?))(?<path>(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))|(?P<only_path>(/(([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))?)|([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*)))?(\?(?P<query>([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*)))?(#(?P<fragment>([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*)))?$!u", $url, $matches);
$parts = array_intersect_key ($matches, array ('scheme' => '', 'credentials' => '', 'host' => '', 'port' => '', 'path' => '', 'query' => '', 'fragment' => '', 'only_path' => '', ));

// extract the user and page
preg_match ('!/*(?P<user>.*)/(?P<page>.*)/!u', $parts['path'], $matches);
$user_and_page = array_intersect_key ($matches, array ('user' => '', 'page' => '', ));

// the query string stuff
$query = array ();
parse_str ($parts['query'], $query);

References:

Just to clarify, here are the relevant documents used to formulate the regular expression:

  1. RFC3986 scheme/protocol
  2. RFC3986 user and password
  3. RFC1035 hostname
    • Or RFC3986 IPv4
    • Or RFC2732 IPv6
  4. RFC3986 query
  5. RFC3986 fragment

2

This, maybe?

function uri_to_array($uri){
  $result = array();

  parse_str(substr($uri, strpos($uri, '?') + 1), $result);
  list($result['user'], $result['page']) = explode('/', trim($uri, '/'));

  return $result;
}

print_r(
  uri_to_array('/admin/stock/?sub_page=products&action=add')
);

/*
Array
(
    [sub_page] => products
    [action] => add
    [page] => stock
    [user] => admin
)
*/

demo: http://codepad.org/nBCj38zT
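
One thing to watch with the short version: when the URI has no '?', strpos() returns false, so the substr() call starts at offset 1 and the path itself is fed to parse_str(). A URI with only {user} also leaves list() without a second element. A possible variant that pads both cases is sketched below; the function name is hypothetical, and letting the path values take precedence over same-named query parameters is an assumption.

```php
<?php
// Hypothetical variant of the function above (the name is illustrative).
function uri_to_array_safe($uri) {
    $result = array();

    // Split once on '?'; pad so $query is '' when there is no query string.
    list($path, $query) = array_pad(explode('?', $uri, 2), 2, '');
    parse_str($query, $result);

    // Pad the segments so 'page' is null when only {user} is present.
    $segments = array_pad(explode('/', trim($path, '/')), 2, null);
    $result['user'] = $segments[0];
    $result['page'] = $segments[1];

    return $result;
}

print_r(uri_to_array_safe('/admin/stock/?sub_page=products&action=add'));
print_r(uri_to_array_safe('/admin/'));
```

Because 'user' and 'page' are assigned after parse_str(), a query string such as ?user=x cannot replace the value taken from the path.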

2

Some suggestions to make this function better.

First, use parse_url instead of explode to separate the hostname, path and query string.

Second, put the code for parsing the path before you decide whether you have a query string, since you parse the path either way.

Third, instead of the foreach loop to copy the parameters, use array_merge like this:

// put $return_uri_array last so $parameter_array can't override values
$return_uri_array = array_merge($parameter_array, $return_uri_array); 
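
Taken together, the three suggestions might look roughly like the sketch below. The function name is hypothetical, and it assumes the input is either a full URL or a path-only string such as '/admin/stock/?sub_page=products&action=add'.

```php
<?php
// Hypothetical rewrite combining the suggestions; the name is illustrative.
function uri_to_array_parsed($uri) {
    // parse_url() returns null for a missing component, hence the string casts.
    $path  = (string) parse_url($uri, PHP_URL_PATH);
    $query = (string) parse_url($uri, PHP_URL_QUERY);

    // Parse the path once, whether or not a query string exists.
    $segments = explode('/', trim($path, '/'));
    $return_uri_array = array('user' => $segments[0]);
    if (isset($segments[1])) {
        $return_uri_array['page'] = $segments[1];
    }

    $parameter_array = array();
    parse_str($query, $parameter_array);

    // Later arguments win on duplicate string keys, so the path values survive.
    return array_merge($parameter_array, $return_uri_array);
}

print_r(uri_to_array_parsed('/admin/stock/?sub_page=products&action=add'));
```

With this ordering, a request like /admin/stock/?user=x still yields 'admin' for the user key.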

If this should be a class or not depends on your programming style. As a general rule I'd always use classes because it's easier to mock them in unit tests.

The most compact way would be a regular expression like this (not fully tested, just to show the principle)

if(preg_match('!http://localhost/(?P<user>\w+)(?:/(?P<page>\w+))/(?:\?sub_page=(?P<sub_page>\w+)&action=(?P<action>\w+))!', $uri, $matches)) {
  return $matches;
}

The resulting array will also have the numeric indexes of the matches, but you can just ignore them or filter your wanted keys with array_intersect_key. The \w+ pattern matches all "word" characters; you may replace it with character classes like [-a-zA-Z0-9_] or something similar.
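
As a rough sketch of that filtering step, the pattern above can be combined with array_intersect_key(). The list of wanted keys is taken from the question, and the example URL is the one the asker gave.

```php
<?php
$uri = 'http://localhost/admin/stock/?sub_page=products&action=add';
$pattern = '!http://localhost/(?P<user>\w+)(?:/(?P<page>\w+))/(?:\?sub_page=(?P<sub_page>\w+)&action=(?P<action>\w+))!';

$result = array();
if (preg_match($pattern, $uri, $matches)) {
    // array_flip() turns the list of wanted names into array keys, and
    // array_intersect_key() keeps only those keys from $matches.
    $wanted = array_flip(array('user', 'page', 'sub_page', 'action'));
    $result = array_intersect_key($matches, $wanted);
}

print_r($result);
```

As written, the pattern only matches when {page}, sub_page and action are all present, so a URI like /admin/ would leave $result empty.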

3 Comments

Thank you, I'll get on it now and post my results.
The PCRE thing is too easy to get wrong and is presented as an alternative to parse_url() - parse_url() should be the starting point.
Well, he asked how to make his function "better". If "better" means less code, then the PCRE solution is best. If "better" means less code plus a clear understanding of what the function does, then a regular expression is not the way to go, I agree with that ;-)
