Separate strings from other iterables in Python 3

Question

I'm trying to determine whether a function argument is a string or some other iterable. Specifically, this is used in building URL parameters, in an attempt to emulate PHP's &param[]=val syntax for arrays. Duck typing doesn't really help here: I can iterate through a string and produce things like &param[]=v&param[]=a&param[]=l, but this is clearly not what we want. If the parameter value is a string (or a bytes object? I still don't know what the point of bytes actually is), it should produce &param=val, but if the parameter value is (for example) a list, each element should receive its own &param[]=val. I've seen a lot of explanations about how to do this in 2.* involving isinstance(foo, basestring), but basestring doesn't exist in 3.*, and I've also read that isinstance(foo, str) will miss more complex strings (I think unicode?). So, what is the best way to do this without causing some types to be lost to unnecessary errors?

Danica · Accepted Answer · 2012-11-04 07:57:57Z

The advice you've been seeing conflicts because some of it applies to Python 2 and some to Python 3. In Python 3, isinstance(foo, str) is almost certainly what you want. bytes is for raw binary data, which you probably can't include in an argument string like that.
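As a minimal sketch of how that check might look in the URL-building case from the question: build_query is a hypothetical helper name, and escaping keys and values with urllib.parse.quote_plus is an assumption about how you want them encoded.

```python
from urllib.parse import quote_plus


def build_query(params):
    """Build a PHP-style query string from a dict (hypothetical helper)."""
    parts = []
    for key, value in params.items():
        if isinstance(value, str):
            # A string is treated as one value: param=val
            parts.append(f"{quote_plus(key)}={quote_plus(value)}")
        else:
            # Anything else is assumed to be iterable: one param[]=val per element
            for item in value:
                parts.append(f"{quote_plus(key)}[]={quote_plus(str(item))}")
    return "&".join(parts)


print(build_query({"param": "val"}))            # param=val
print(build_query({"param": ["v", "a", "l"]}))  # param[]=v&param[]=a&param[]=l
```

Note that this sketch sends every non-str value down the iterable branch, so a plain int would raise a TypeError; you may want an extra check if scalars other than strings are possible.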

The Python 2 str type stored raw binary data, usually a string in some specific encoding like UTF-8 or Latin-1; the unicode type stored a more "abstract" representation of the characters that could then be encoded into whatever specific encoding you needed. basestring was a common ancestor of both, so you could easily say "any kind of string".

In Python 3, str is the more "abstract" type, and bytes is for raw binary data (like a string in a specific encoding, or whatever raw binary data you want to handle). You shouldn't use bytes for anything that would otherwise be a string, so there's no real reason to check whether it's either str or bytes. If you absolutely need to, though, you can do something like isinstance(foo, (str, bytes)).
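To make the str/bytes distinction concrete, here is a small sketch. The helper name is_stringlike is hypothetical; everything else is built-in Python 3 behaviour.

```python
text = "café"              # str: abstract sequence of characters
raw = text.encode("utf-8")  # bytes: those characters in one specific encoding

print(len(text), len(raw))  # 4 5  (the "é" takes two bytes in UTF-8)
print(isinstance(text, str), isinstance(raw, str))  # True False
print(raw.decode("utf-8") == text)                  # True

# Iterating over bytes yields integers, not one-character strings
print(list(b"val"))  # [118, 97, 108]


def is_stringlike(value):
    """Hypothetical helper: True for both text and raw binary strings."""
    return isinstance(value, (str, bytes))


print(is_stringlike("val"), is_stringlike(b"val"), is_stringlike(["val"]))
# True True False
```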

edited Nov 4, 2012 at 7:57

answered Nov 4, 2012 at 3:49


4 Comments

user395760 Over a year ago

I'd argue the other way around. To pass data around on the web, you need to encode it (create bytes) at some point. So it can make perfect sense to construct a URL from bytes pieces.

Danica Over a year ago

Yeah, that's true, but query strings are supposed to be ASCII-only and %-encoded, right? So to most APIs you'd probably want to pass either a URL-encoded string (presumably a str) or a general string that will later be URL-encoded (also a str).

user395760 Over a year ago

The thing is, if it's already urlencoded (and thus plain ASCII) it might as well be bytes. In fact, that may be the saner choice, because it would cause errors if someone later tried to add another string without urlencoding it first.

Danica Over a year ago

In Python 3 I'd consider it very strange to have a URL-encoded ASCII string being passed around in your library in bytes.
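Both sides of this exchange can be illustrated with a short sketch. It uses the standard library's urllib.parse.urlencode, which returns a str; the variable names and the sample query are only for illustration.

```python
from urllib.parse import urlencode

# urlencode returns a str whose characters are all plain ASCII
encoded = urlencode({"q": "naïve café"})
print(encoded)        # q=na%C3%AFve+caf%C3%A9
print(type(encoded))  # <class 'str'>

# The same data can be held as bytes once it is ASCII-only
as_bytes = encoded.encode("ascii")

# Mixing bytes with an un-encoded str fails loudly, which is the
# safety property mentioned in the comments
try:
    as_bytes + "&extra=ü"
except TypeError:
    mixing_failed = True
else:
    mixing_failed = False

print(mixing_failed)  # True
```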
