parse url string (path and parameters) into array

Question

Just writing a little function here and need some optimisation help!

All requests are redirected to the index page.

I have this function that parses a url into an array.

The type of url is depicted as:

http://localhost/{user}/{page}/?sub_page={sub_page}&action={action}

So an example would be:

http://localhost/admin/stock/?sub_page=products&action=add

When the URI is requested the domain is excluded, so my function accepts strings like this:

/admin/stock/?sub_page=products&action=add

My function is as follows, and WARNING: it's very procedural.

For those of you who can't be bothered to read and understand it, I've added an explanation at the bottom ;)

function uri_to_array($uri){
    // uri will be in format: /{user}/{page}/?sub_page={subpage}&action={action} ... && plus additional parameters

    // define array that will be returned
    $return_uri_array = array();

    // separate path from querystring;
    $array_tmp_uri = explode("?", $uri);

    // if explode returns the same as input $string, no delimiter was found
    if ($uri == $array_tmp_uri[0]){ 

        // no question mark found.
        // format either '/{user}/{page}/' or '/{user}/'
        $uri = trim($array_tmp_uri[0], "/");

        // remove excess baggage
        unset ($array_tmp_uri);

        // format either '{user}/{page}' or '{user}'
        $array_uri = explode("/", $uri);

        // if explode returns the same as input $string, no delimiter was found
        if ($uri == $array_uri[0]){
            // no {page} defined, just user.
            $return_uri_array["user"] = $array_uri[0];
        }
        else{
            // {user} and {page} defined.
            $return_uri_array["user"] = $array_uri[0];
            $return_uri_array["page"] = $array_uri[1];            
        }
    }
    else{

        // query string is defined
        // format either '/{user}/{page}/' or '/{user}/'
        $uri = trim($array_tmp_uri[0], "/");
        $parameters = trim($array_tmp_uri[1]);

        // PARSE PATH
        // remove excess baggage
        unset ($array_tmp_uri);

        // format either '{user}/{page}' or '{user}'
        $array_uri = explode("/", $uri);

        // if explode returns the same as input $string, no delimiter was found
        if ($uri == $array_uri[0]){
            // no {page} defined, just user.
            $return_uri_array["user"] = $array_uri[0];
        }
        else{
            // {user} and {page} defined.
            $return_uri_array["user"] = $array_uri[0];
            $return_uri_array["page"] = $array_uri[1];            
        }

        // parse parameter string
        $parameter_array = array();
        parse_str($parameters, $parameter_array);

        // copy parameter array into return array
        foreach ($parameter_array as $key => $value){
            $return_uri_array[$key] = $value;
        }
    }
    return $return_uri_array;
}

Basically there is one main if statement: one branch handles the case where no query string is defined (no '?'), and the other handles the case where the '?' does exist.

I'm just looking to make this function better.

Would it be worth making it a class?

Essentially I need a function that takes /{user}/{page}/?sub_page={sub_page}&action={action} as an argument and returns

array(
    "user" => {user},
    "page" => {page},
    "sub_page" => {sub_page},
    "action" => {action}
)

Cheers, Alex

This looks overly complicated. Wouldn't a simple preg_match() for your desired pattern serve the same purpose?
– bkzland, Commented Jan 24, 2012 at 12:17
Well, it would, but preg_match() doesn't return associative arrays, does it? :S
– AlexMorley-Finch, Commented Jan 24, 2012 at 12:20
You can have associative arrays in your result if you use named subpatterns.
– chiborg, Commented Jan 24, 2012 at 12:48

zrvan · Accepted Answer · 2012-01-24 15:08:37Z

If you want to:

- do it properly,
- use a regular expression, and
- use the same method to parse all URLs (the PHP manual cautions that parse_url() may not give correct results for relative URLs; a host-less path is captured as only_path below),

this might suit your taste:

$url = 'http://localhost/admin/stock/?sub_page=products&action=add';
preg_match ("!^((?P<scheme>[a-zA-Z][a-zA-Z\d+-.]*):)?(((//(((?P<credentials>([a-zA-Z\d\-._~\!$&'()*+,;=%]*)(:([a-zA-Z\d\-._~\!$&'()*+,;=:%]*))?)@)?(?P<host>([\w\d-.%]+)|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(\[([a-fA-F\d.:]+)\]))?(:(?P<port>\d*))?))(?<path>(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))|(?P<only_path>(/(([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))?)|([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*)))?(?P<query>\?([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*))?(?P<fragment>#([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*))?$!u", $url, $matches);
$parts = array_intersect_key ($matches, array ('scheme' => '', 'credentials' => '', 'host' => '', 'port' => '', 'path' => '', 'query' => '', 'fragment' => '', 'only_path' => '', ));
var_dump ($parts);

It should cover just about all well-formed URLs.

If host is empty, only_path should hold the path; that is the case for a URL with neither protocol nor host, such as /admin/stock/.

UPDATE:

Maybe I should read the question a bit better. This will parse the URL into components that you can use to more easily get the parts you're really interested in. Run something like:

// split the URL
preg_match ("!^((?P<scheme>[a-zA-Z][a-zA-Z\d+-.]*):)?(((//(((?P<credentials>([a-zA-Z\d\-._~\!$&'()*+,;=%]*)(:([a-zA-Z\d\-._~\!$&'()*+,;=:%]*))?)@)?(?P<host>([\w\d-.%]+)|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(\[([a-fA-F\d.:]+)\]))?(:(?P<port>\d*))?))(?<path>(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))|(?P<only_path>(/(([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))?)|([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*)))?(\?(?P<query>([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*)))?(#(?P<fragment>([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*)))?$!u", $url, $matches);
$parts = array_intersect_key ($matches, array ('scheme' => '', 'credentials' => '', 'host' => '', 'port' => '', 'path' => '', 'query' => '', 'fragment' => '', 'only_path' => '', ));

// extract the user and page (for a host-less URL the path is in $parts['only_path'] instead)
preg_match ('!/*(?P<user>.*)/(?P<page>.*)/!u', $parts['path'], $matches);
$user_and_page = array_intersect_key ($matches, array ('user' => '', 'page' => '', ));

// the query string stuff
$query = array ();
parse_str ($parts['query'], $query);
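To get from these pieces to the single array the question asks for, the two results still have to be merged. The following is only a sketch of that last step: the $parts array is a hand-written, hypothetical stand-in for the output of the big split above (so the example runs on its own), and $result is a name introduced here for illustration.

```php
<?php
// Hypothetical stand-in for the $parts array produced by the split above.
$parts = array(
    'path'  => '/admin/stock/',
    'query' => 'sub_page=products&action=add',
);

// extract the user and page, keeping only the named groups
preg_match('!/*(?P<user>.*)/(?P<page>.*)/!u', $parts['path'], $matches);
$user_and_page = array_intersect_key($matches, array('user' => '', 'page' => ''));

// the query string stuff
$query = array();
parse_str($parts['query'], $query);

// Path values go last, so a query parameter named "user" or "page"
// cannot override them.
$result = array_merge($query, $user_and_page);
print_r($result);
```

With the stand-in values above, $result holds sub_page, action, user and page, in that order.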

References:

Just to clarify, here are the relevant documents used to formulate the regular expression:

RFC3986 scheme/protocol
RFC3986 user and password
RFC1035 hostname
- Or RFC3986 IPv4
- Or RFC2732 IPv6
RFC3986 query
RFC3986 fragment

Yoshi · Accepted Answer · 2012-01-24 12:38:57Z


This maybe?

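function uri_to_array($uri){
  $result = array();

  // split off the query string, if there is one
  $pos = strpos($uri, '?');
  if ($pos !== false) {
    parse_str(substr($uri, $pos + 1), $result);
    $uri = substr($uri, 0, $pos);
  }

  // {user} is always set; {page} only if the path has a second segment
  $segments = explode('/', trim($uri, '/'));
  $result['user'] = $segments[0];
  if (isset($segments[1])) {
    $result['page'] = $segments[1];
  }

  return $result;
}

print_r(
  uri_to_array('/admin/stock/?sub_page=products&action=add')
);

/*
Array
(
    [sub_page] => products
    [action] => add
    [user] => admin
    [page] => stock
)
*/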

demo: http://codepad.org/nBCj38zT

edited Jan 24, 2012 at 12:38

answered Jan 24, 2012 at 12:30


chiborg · Accepted Answer · 2012-01-24 12:46:44Z


Some suggestions to make this function better.

First, use parse_url instead of explode to separate the hostname, path and query string.

Second, put the code for parsing the path before you decide whether you have a query string, since you parse the path either way.

Third, instead of the foreach loop to copy the parameters, use array_merge like this:

// put $return_uri_array last so $parameter_array can't override values
$return_uri_array = array_merge($parameter_array, $return_uri_array);
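Taken together, the three suggestions above might look something like the following. This is a rough sketch rather than a drop-in replacement: the function name uri_to_array_via_parse_url() is made up for illustration, and it assumes the domain-less input format from the question (for example /admin/stock/?sub_page=products&action=add).

```php
<?php
// Sketch only: uri_to_array_via_parse_url() is a hypothetical name.
function uri_to_array_via_parse_url($uri)
{
    $result = array();

    // parse_url() returns false for seriously malformed input.
    $parts = parse_url($uri);
    if ($parts === false) {
        return $result;
    }

    // Parse the path first; it is needed whether or not a query string exists.
    $path = isset($parts['path']) ? trim($parts['path'], '/') : '';
    $segments = ($path === '') ? array() : explode('/', $path);
    if (isset($segments[0])) {
        $result['user'] = $segments[0];
    }
    if (isset($segments[1])) {
        $result['page'] = $segments[1];
    }

    // Parse the query string, if any.
    $parameters = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $parameters);
    }

    // Path values come last so query parameters cannot override them.
    return array_merge($parameters, $result);
}

print_r(uri_to_array_via_parse_url('/admin/stock/?sub_page=products&action=add'));
```

Because the path values are merged last, a request such as /admin/stock/?user=evil still yields "admin" for the user key.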

If this should be a class or not depends on your programming style. As a general rule I'd always use classes because it's easier to mock them in unit tests.

The most compact way would be a regular expression like this (not fully tested, just to show the principle)

if(preg_match('!http://localhost/(?P<user>\w+)(?:/(?P<page>\w+))/(?:\?sub_page=(?P<sub_page>\w+)&action=(?P<action>\w+))!', $uri, $matches)) {
  return $matches;
}

The resulting array will also have the numeric indexes of the matches, but you can just ignore them or filter your wanted keys with array_intersect_key. The \w+ pattern matches all "word" characters; you may replace it with a character class like [-a-zA-Z0-9_] or something similar.
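As a rough illustration of the named-subpattern idea applied to the domain-less URI from the question, here is a sketch that also filters out the numeric indexes. The function name uri_to_array_via_regex() is hypothetical, and the pattern is an assumption of this example: it expects sub_page before action, treats {page} and the query string as optional, and is no more thoroughly tested than the pattern above.

```php
<?php
// Sketch only: uri_to_array_via_regex() is a hypothetical name, and the
// pattern assumes the fixed parameter order sub_page, then action.
function uri_to_array_via_regex($uri)
{
    $pattern = '!^/(?P<user>[-\w]+)(?:/(?P<page>[-\w]+))?/?'
             . '(?:\?sub_page=(?P<sub_page>[-\w]+)(?:&action=(?P<action>[-\w]+))?)?$!';

    if (!preg_match($pattern, $uri, $matches)) {
        return array();
    }

    // Keep only the named groups, dropping the numeric indexes.
    $wanted = array('user' => '', 'page' => '', 'sub_page' => '', 'action' => '');
    $result = array_intersect_key($matches, $wanted);

    // Groups that did not take part in the match can come back as '';
    // drop them so absent parts are simply missing from the result.
    return array_filter($result, 'strlen');
}

print_r(uri_to_array_via_regex('/admin/stock/?sub_page=products&action=add'));
```

The trade-off raised in the comments still applies: a pattern like this is compact, but any extra or reordered parameter makes it fail to match, which parse_url() with parse_str() would handle without changes.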

edited Jan 24, 2012 at 12:46

answered Jan 24, 2012 at 12:22


3 Comments

AlexMorley-Finch Over a year ago

Thank you, I'll get on it now and post my results.

symcbean Over a year ago

The PCRE thing is too easy to get wrong and is presented as an alternative to parse_url() - parse_url() should be the starting point.

chiborg Over a year ago

Well, he asked how to make his function "better". If "better" means less code, then the PCRE solution is best. If "better" means less code plus clear understanding what the function does, then a regular expression is not the way to go, I agree with that ;-)
