12

Is there a cleaner way to modify some parts of a URL in Python 2?

For example

http://foo/bar -> http://foo/yah

At present, I'm doing this:

import urlparse

url = 'http://foo/bar'

# Modify path component of URL from 'bar' to 'yah'
# Use nasty convert-to-list hack due to urlparse.ParseResult being immutable
parts = list(urlparse.urlparse(url))
parts[2] = 'yah'

url = urlparse.urlunparse(parts)

Is there a cleaner solution?

1
  • What exactly do you mean by 'clean'? Commented Jun 13, 2014 at 8:35

2 Answers 2

25

Unfortunately, the documentation is out of date; the results produced by urlparse.urlparse() (and urlparse.urlsplit()) use a collections.namedtuple()-produced class as a base.

Don't turn this namedtuple into a list, but make use of the utility method provided for just this task:

parts = urlparse.urlparse(url)
parts = parts._replace(path='yah')

url = parts.geturl()

The namedtuple._replace() method lets you create a new copy with specific elements replaced. The ParseResult.geturl() method then re-joins the parts into a url for you.

Demo:

>>> import urlparse
>>> url = 'http://foo/bar'
>>> parts = urlparse.urlparse(url)
>>> parts = parts._replace(path='yah')
>>> parts.geturl()
'http://foo/yah'

mgilson filed a bug report (with patch) to address the documentation issue.

Sign up to request clarification or add additional context in comments.

10 Comments

I was going to point this out. The utility methods are provided to urlparse.ParseResult by the subclass returned by namedtuple. I think that this should be pointed out in the 2.7 docs, because without knowing that, you have no way of knowing that _replace actually is part of the public API for this class...
Even more interesting is the mention of BaseResult in the docs which doesn't appear in the source at all ... (sorry about the digression ... It's late ... +1 anyway)
@mgilson: heh, indeed, that must be a leftover from before namedtuple was used.
Thanks - that's a nicer solution. Although, as pointed out in the other comments, there seems to be no way to know about it, based on the docs alone.
@GarethStockwell: yeah, looks like a doc bug; none filed yet, I'll do that later.
|
-1

I guess the proper way to do it is this way.

As using _replace private methods or variables is not suggested.

from urlparse import urlparse, urlunparse

res = urlparse('http://www.goog.com:80/this/is/path/;param=paramval?q=val&foo=bar#hash')
l_res = list(res)
# this willhave ['http', 'www.goog.com:80', '/this/is/path/', 'param=paramval', 'q=val&foo=bar', 'hash']
l_res[2] = '/new/path'
urlunparse(l_res)
# outputs 'http://www.goog.com:80/new/path;param=paramval?q=val&foo=bar#hash'

1 Comment

it's part of the public interface, it's just prefixed with underscrore to not clash with actual members.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.