Modify URL components in Python 2

Question

Is there a cleaner way to modify some parts of a URL in Python 2?

For example:

http://foo/bar -> http://foo/yah

At present, I'm doing this:

import urlparse

url = 'http://foo/bar'

# Modify path component of URL from 'bar' to 'yah'
# Use nasty convert-to-list hack due to urlparse.ParseResult being immutable
parts = list(urlparse.urlparse(url))
parts[2] = 'yah'

url = urlparse.urlunparse(parts)

Is there a cleaner solution?

What exactly do you mean by 'clean'?

– Tim, Commented Jun 13, 2014 at 8:35

Community · Accepted Answer · 2017-05-23 10:29:59Z

25

Unfortunately, the documentation is out of date; the results produced by urlparse.urlparse() (and urlparse.urlsplit()) use a collections.namedtuple()-produced class as a base.
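One way to check this for yourself is to inspect the returned object. The sketch below is only illustrative: the try/except import is an assumption added so that it also runs on Python 3, where the same functions live in urllib.parse.

```python
try:
    import urlparse  # Python 2, as in the question
except ImportError:
    # Assumption: on Python 3 the equivalent module is urllib.parse
    import urllib.parse as urlparse

parts = urlparse.urlparse('http://foo/bar')

# The result is a tuple subclass built with collections.namedtuple
is_tuple = isinstance(parts, tuple)

# namedtuple classes expose their field names via _fields
fields = parts._fields

# ... and provide the _replace() helper
has_replace = hasattr(parts, '_replace')

print(is_tuple, fields, has_replace)
```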

Rather than turning this namedtuple into a list, use the utility method provided for exactly this task:

parts = urlparse.urlparse(url)
parts = parts._replace(path='yah')

url = parts.geturl()

The namedtuple._replace() method returns a new copy with the specified fields replaced; the original object is left unchanged. The ParseResult.geturl() method then rejoins the parts into a URL.

Demo:

>>> import urlparse
>>> url = 'http://foo/bar'
>>> parts = urlparse.urlparse(url)
>>> parts = parts._replace(path='yah')
>>> parts.geturl()
'http://foo/yah'
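_replace() also accepts several keyword arguments at once, so more than one component can be changed in a single call. A minimal sketch, assuming a made-up URL with a query string; the fallback import is an assumption so that it also runs on Python 3:

```python
try:
    import urlparse  # Python 2
except ImportError:
    # Assumption: on Python 3 the equivalent module is urllib.parse
    import urllib.parse as urlparse

# Hypothetical example URL, not taken from the question
parts = urlparse.urlparse('http://foo/bar?x=1')

# Replace scheme, path and query in one call; a new object is returned
updated = parts._replace(scheme='https', path='/yah', query='x=2')

new_url = updated.geturl()
old_url = parts.geturl()  # the original result is unchanged

print(new_url)
print(old_url)
```

Valid keyword names are the six field names: scheme, netloc, path, params, query and fragment.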

mgilson filed a bug report (with patch) to address the documentation issue.

edited May 23, 2017 at 10:29

CommunityBot


answered Jun 13, 2014 at 8:35

Martijn Pieters


10 Comments

mgilson Over a year ago

I was going to point this out. The utility methods are provided to urlparse.ParseResult by the subclass returned by namedtuple. I think that this should be pointed out in the 2.7 docs, because without knowing that, you have no way of knowing that _replace actually is part of the public API for this class...

mgilson Over a year ago

Even more interesting is the mention of BaseResult in the docs which doesn't appear in the source at all ... (sorry about the digression ... It's late ... +1 anyway)

Martijn Pieters Over a year ago

@mgilson: heh, indeed, that must be a leftover from before namedtuple was used.

Gareth Stockwell Over a year ago

Thanks - that's a nicer solution. Although, as pointed out in the other comments, there seems to be no way to know about it, based on the docs alone.

Martijn Pieters Over a year ago

@GarethStockwell: yeah, looks like a doc bug; none filed yet, I'll do that later.


naren · Accepted Answer · 2017-08-24 07:33:34Z

-1

I think the proper way to do it is the following, since using private methods or variables such as _replace is not recommended.

from urlparse import urlparse, urlunparse

res = urlparse('http://www.goog.com:80/this/is/path/;param=paramval?q=val&foo=bar#hash')
l_res = list(res)
# l_res is now ['http', 'www.goog.com:80', '/this/is/path/', 'param=paramval', 'q=val&foo=bar', 'hash']
l_res[2] = '/new/path'
urlunparse(l_res)
# outputs 'http://www.goog.com:80/new/path;param=paramval?q=val&foo=bar#hash'

answered Aug 24, 2017 at 7:33

naren


1 Comment

vidstige Over a year ago

It's part of the public interface; it's just prefixed with an underscore so that it doesn't clash with actual field names.
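To illustrate the point about name clashes, here is a small sketch with a hypothetical namedtuple (Edit is not part of urlparse) that has a field called replace. Because the helper is named _replace, the field and the method can coexist:

```python
from collections import namedtuple

# Hypothetical record type whose field name would collide with an
# un-prefixed replace() method
Edit = namedtuple('Edit', ['find', 'replace'])

e = Edit(find='bar', replace='yah')

# Plain attribute access reaches the field...
field_value = e.replace

# ...while the underscore-prefixed helper is still available
e2 = e._replace(replace='baz')

print(field_value)
print(e2)
```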
