45

Is it actually safe/valid to use multidimensional array syntax in the URL query string?

http://example.com?abc[]=123&abc[]=456

It seems to work in every browser and I always thought it was OK to use, but according to a comment on this article it is not: http://www.456bereastreet.com/archive/201008/what_characters_are_allowed_unencoded_in_query_strings/#comment4

I would like to hear a second opinion.

3
  • What is "multidimensional" in this? Or are you referring to the get vars being represented as an array in a server side scripting language ? Commented Jul 15, 2012 at 8:16
  • @arkascha yep, I mean a query string like this ?a[b][c][d][e]=f, server side script then treats it like a multidimensional array Commented Jul 15, 2012 at 8:26
  • When used judiciously by a URL dereferencing algorithm, which is the intent here, that is the intended purpose of reserving the square brackets. You should most definitely not use them in a resource name for that very reason -- as referencing algorithms need to use it. Commented Oct 2, 2022 at 14:52

7 Answers 7

40

The answer is not simple.

The following is extracted from section 3.2.2 of RFC 3986 :

A host identified by an Internet Protocol literal address, version 6
[RFC3513] or later, is distinguished by enclosing the IP literal
within square brackets ("[" and "]"). This is the only place where
square bracket characters are allowed in the URI syntax.

This seems to answer the question by flatly stating that square brackets are not allowed anywhere else in the URI. But there is a difference between a square bracket character and a percent encoded square bracket character.

The following is extracted from the beginning of section 3 of RFC 3986 :

  3. Syntax Components

The generic URI syntax consists of a hierarchical sequence of
components referred to as the scheme, authority, path, query, and
fragment.

  URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

So the "query" is a component of the "URI".

The following is extracted from section 2.2 of RFC 3986 :

2.2. Reserved Characters

URIs include components and subcomponents that are delimited by
characters in the "reserved" set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm.
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must
be percent-encoded before the URI is formed.

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

So square brackets may appear in a query string, but only if they are percent encoded. Unless they aren't, to be explained further down in section 2.2 :

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

So because square brackets are only allowed in the "host" subcomponent, they "should" be percent encoded in other components and subcomponents, and in this case in the "query" component, unless RFC 3986 explicitly allows unencoded square brackets to represent data in the query component, which it does not.

However, if a "URI producing application" fails to do what it "should" do, by leaving square brackets unencoded in the query, then readers of the URI are not to reject the URI outright. Instead, the square brackets are to be considered as belonging to the data of the query component, since they are not used as delimiters in that component.

This is why, for example, it is not a violation of RFC 3986 when PHP accepts both unencoded and percent encoded square brackets as valid characters in a query string, and even assigns to them a special purpose. However, it would appear that authors who try to take advantage of this loophole by not percent encoding square brackets are in violation of RFC 3986.
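As a rough illustration of the producer side discussed above, the following sketch percent-encodes the brackets in each key before the URL is formed. The `buildQuery` helper and the `example.com` URL are made up for this example; `encodeURIComponent` is the standard JavaScript function.

```javascript
// Hypothetical helper: builds a query string whose keys and values are
// percent-encoded, so "[" and "]" are emitted as %5B and %5D.
function buildQuery(pairs) {
  return pairs
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

const query = buildQuery([
  ["abc[]", "123"],
  ["abc[]", "456"],
]);

console.log(`http://example.com/?${query}`);
// http://example.com/?abc%5B%5D=123&abc%5B%5D=456
```

A consumer such as PHP would decode `%5B%5D` back to `[]` before interpreting the key, so the array semantics are unaffected by the encoding.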


4 Comments

"Square brackets can appear in the query string if they are percent encoded, unless they aren't" xD. very nice answer.
This is a fantastic answer, but it doesn't account for the WHATWG Url spec. See my answer below.
This is very well reasoned, but I have to interject. The delimiter character classes do not reserve characters only, and strictly for the protocol. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm - Framework, platform and filesystem developers are free to use these with due research. ...
... Userland developers are free to incorporate such delimiters into a URL only when passing to such a system with those known properties, as several web frameworks do. You should not use delimiters in URL alias names or for any information. They may contain data but not be data, as they are delimiters. Some parts of the URL (namely everything before the /path) further restrict delimiter usage. Also notable is the sub-component delimiter ; is special during the path component for splitting the resource from other metadata example.com/my; non-routing data/url -> example.com/my/url
11

According to RFC 3986, the query component of a URL has the following grammar:

*( pchar / "/" / "?" )

From appendix A of the same RFC:

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
[...]
pct-encoded   = "%" HEXDIG HEXDIG

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
[...]    
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
             / "*" / "+" / "," / ";" / "="

My interpretation of this is that anything that isn't:

 ALPHA / DIGIT / "-" / "." / "_" / "~" / 
     "!" / "$" / "&" / "'" / "(" / ")" / 
     "*" / "+" / "," / ";" / "=" / ":" / "@"

...should be pct-encoded, i.e. percent-encoded. Thus [ and ] should be percent-encoded to follow RFC 3986.
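The grammar quoted above can be turned into a small check. This is only a sketch of the `query` production as a regular expression; the `isRfc3986Query` name is invented here, and the function expects the query without its leading "?".

```javascript
// Sketch of the RFC 3986 "query" production: *( pchar / "/" / "?" )
// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
const RFC3986_QUERY = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper: true when the string matches the grammar above.
function isRfc3986Query(query) {
  return RFC3986_QUERY.test(query);
}

console.log(isRfc3986Query("abc[]=123&abc[]=456"));         // false
console.log(isRfc3986Query("abc%5B%5D=123&abc%5B%5D=456")); // true
```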

4 Comments

You are certainly right, but help me follow that interpretation. The extract you gave is incomplete: 'reserved' is never mentioned again here, so the definition makes no sense like this. As I read it, the square brackets are defined as reserved characters with a special meaning (not sure which), therefore they should not be escaped if you want to express that meaning. If you escape them you simply transfer a string containing square brackets as the value of the parameter. So I ask myself: well, what is actually the meaning of square brackets being reserved chars in URLs?
I left the definition of reserved and gen-delims in the quote to make it easier see how [] are classified in the grammar - notice that only a subset of reserved is a pchar.
Square brackets are reserved for IP v6 address literals. tools.ietf.org/html/rfc3986#appendix-D.1, tools.ietf.org/html/rfc2732#section-2
Because an IPv6 literal applies only to the host part http://[1080:0:0:0:8:800:200C:417A]/index.html, hence I guess that they don't need to be escaped when used inside a query string
11

David N. Jafferian's answer is fantastic. I just want to add a couple updates and practical notes:

  1. For many years, every browser has left square brackets in query strings unencoded when submitting the request to the server. (Source: https://bugzilla.mozilla.org/show_bug.cgi?id=1152455#c6). As such, I imagine a huge portion of the web has come to rely on this behavior, which makes it extremely unlikely to change.

  2. My reading of the WHATWG URL standard which, at least for web purposes, can be seen as superseding RFC 3986, is that it codifies this behavior of not encoding [ and ] in query strings.

Edit: Based on the comments and other answers, a more correct reading of the WHATWG URL standard is that unencoded [ and ] are invalid, but should be tolerated when received/parsed and, once parsed that way, should even be re-serialized without encoding.
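The tolerant parse-and-reserialize behavior can be observed with the `URL` and `URLSearchParams` globals available in browsers and Node.js. This is a small sketch using the question's example URL; the outputs in the comments are what an implementation following the WHATWG URL standard should print.

```javascript
// Parse a URL whose query contains unencoded square brackets.
const url = new URL("http://example.com/?abc[]=123&abc[]=456");

// The URL serializer leaves the brackets in the query untouched.
console.log(url.search); // ?abc[]=123&abc[]=456

// URLSearchParams treats "abc[]" as an ordinary key name, not an array.
console.log(url.searchParams.getAll("abc[]")); // [ '123', '456' ]

// Serializing through URLSearchParams uses form-urlencoded rules,
// which do percent-encode the brackets.
console.log(url.searchParams.toString()); // abc%5B%5D=123&abc%5B%5D=456
```

Note that the API does not surface the validation error mentioned in the comments; parsing simply succeeds.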

7 Comments

Maybe also relevant is WHATWG URL § Percent-encoded bytes. It says square brackets need only be encoded as part of the userinfo percent-encode set. That means only in the leading authority component, not in the path, query and fragment components.
You are right that query-state is the relevant portion, however it is the parsing algorithm, so, naturally, it does not make any reference about encoding anything at all. The general step of this state is item 3 in the list, which reads “If c is not a URL code point and not U+0025 (%), validation error”. Since [ and ] are not URL code points, this standard very explicitly prohibits square brackets in URLs.
@kirelagin That's a very good point that I'm quoting from the parsing section. I mentioned encoding in my answer, though, because the version of the spec at the time actually did specify doing some percent-encoding at parse time. The serialization section of the spec just directly appends the parsed query state as-is, so if an unencoded [ or ] can get into the parsed state, then it definitely is legal to serialize unencoded.
@kirelagin Your point about validation errors is a good one too, though, and kinda complicates things. The spec says that validation errors should not terminate the parser. So, running serialize(parse(urlWithUnencodedQueryBrackets)) would output back the brackets unencoded, I think, but there'd also be a validation error. Make of that what you will, I guess.
Edit: Well, "then it definitely is legal to serialize unencoded." might be a bit of a stretch. The serialization steps do say to leave the parsed query as-is, but the resulting serialized value would violate the spec's definition of a query string, per the answer below: stackoverflow.com/a/54460012/1261879
4

I'd ideally like to comment on Ethan's answer really, but don't have sufficient reputation to do it.

I'm not sure that the relevant part of the WHATWG URL standard is being referenced here. I think the correct part might be the definition of a valid URL-query string, which it describes as being composed of URL units that themselves are formed from URL code points and percent-encoded bytes. Square brackets are not listed among the URL code points and thus can only appear as percent-encoded bytes.

Thus, in answer to the original question, multidimensional array syntax (i.e. using square brackets to represent array indexing) within the query part of the URL is valid, provided the square brackets are percent encoded (as %5B for [ and %5D for ]).
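To show that the percent-encoded form carries the same key names, here is a short sketch using `URLSearchParams`. The query string is made up from the examples in the question and its comments.

```javascript
// Query string with every "[" written as %5B and every "]" as %5D.
const params = new URLSearchParams("a%5Bb%5D%5Bc%5D=f&abc%5B%5D=123&abc%5B%5D=456");

// After decoding, the keys contain literal square brackets again.
console.log([...params.keys()]);    // [ 'a[b][c]', 'abc[]', 'abc[]' ]
console.log(params.get("a[b][c]")); // f
```

Whether those decoded keys are then expanded into nested arrays is up to the server-side framework, not the URL itself.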


2

My understanding is that square brackets are not first-class citizens anyway. Here is a quote from RFC 1738 (https://www.rfc-editor.org/rfc/rfc1738):

Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".


1

I always had a temptation to go for that sort of query when I had to pass an array, but I steered away from it. The reason being:

  • It is not clearly defined in the RFC.
  • Different languages may interpret it differently.

You have a couple of options to pass an array:

  • Encode the string representation of the array (JSON, maybe?)
  • Have parameters like "val1=blah&val2=blah&.." or something like that.

And if you are sure about the language you are using, you can (safely) go for the kind of query string you have (Just that you need to %-encode [] also).
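The two options listed above could look roughly like this. The parameter names `vals`, `val1` and `val2` and the sample values are placeholders chosen for the sketch, not a convention any server is guaranteed to understand.

```javascript
const values = ["blah", "more blah"];

// Option 1: a single parameter carrying the JSON representation.
const jsonQuery = `vals=${encodeURIComponent(JSON.stringify(values))}`;
console.log(jsonQuery); // vals=%5B%22blah%22%2C%22more%20blah%22%5D

// Decoding on the receiving side.
const decoded = JSON.parse(new URLSearchParams(jsonQuery).get("vals"));
console.log(decoded); // [ 'blah', 'more blah' ]

// Option 2: numbered parameters val1, val2, ...
const numberedQuery = values
  .map((v, i) => `val${i + 1}=${encodeURIComponent(v)}`)
  .join("&");
console.log(numberedQuery); // val1=blah&val2=more%20blah
```

Both forms avoid raw square brackets in the key names, at the cost of needing matching decoding logic on the server.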

2 Comments

So this will be a valid multiarray URL? ?abc%5B%5D=123&abc%5B%5D=456. Very ugly, I see why it is rarely used
That would depend on how the language treats it. It's best to stay away from it. To be a bit more precise, they are just key-value pairs. Nothing more, nothing less, and there's no "array" in it.
0

In JavaScript you can use encodeURIComponent(value) to encode a query value before sending the request.

const value = "[test]";
// Encode only the parameter value, not the whole URL.
const encodedValue = encodeURIComponent(value); // "%5Btest%5D"
const apiUrl = `https://example.com/api/data?query=${encodedValue}`;
fetch(apiUrl)
  .then((res) => res.json())
  .then((data) => console.log(data))
  .catch((err) => console.error(err));

