Slicing URL with Python

Question

I am working with a huge list of URLs. Just a quick question: I am trying to slice part of a URL out, see below:

http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3

How could I slice out:

http://www.domainname.com/page?CONTENT_ITEM_ID=1234

Sometimes there are more than two parameters after CONTENT_ITEM_ID, and the ID is different each time. I am thinking it can be done by finding the first & and then keeping only the chars before that &, but I'm not quite sure how to do this.

Cheers

guettli · Accepted Answer · 2013-11-19 14:53:17Z

14

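Use the urlparse module (urllib.parse in Python 3). Check this function:

import urlparse  # Python 2; in Python 3 use: import urllib.parse as urlparse

def process_url(url, keep_params=('CONTENT_ITEM_ID=',)):
    parsed = urlparse.urlsplit(url)
    # keep only the query items that start with one of the given prefixes
    filtered_query = '&'.join(
        qry_item
        for qry_item in parsed.query.split('&')
        if qry_item.startswith(keep_params))
    # rebuild the URL with the filtered query in place of the original one
    return urlparse.urlunsplit(parsed[:3] + (filtered_query,) + parsed[4:])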

In your example:

>>> a = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
>>> process_url(a)
'http://www.domainname.com/page?CONTENT_ITEM_ID=1234'

This function has the added bonus that it's easier to use if you decide that you also want some more query parameters, or if the order of the parameters is not fixed, as in:

>>> url='http://www.domainname.com/page?other_value=xx&param3&CONTENT_ITEM_ID=1234&param1'
>>> process_url(url, ('CONTENT_ITEM_ID', 'other_value'))
'http://www.domainname.com/page?other_value=xx&CONTENT_ITEM_ID=1234'
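For Python 3, the same idea can be sketched with urllib.parse. This is only a rough equivalent, not the function above: the name keep_query_params is made up for illustration, it matches parameter names exactly rather than by prefix, and urlencode will percent-encode the values it writes back.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def keep_query_params(url, keep=("CONTENT_ITEM_ID",)):
    """Return url with only the query parameters whose names are in keep."""
    parts = urlsplit(url)
    # keep_blank_values=True retains value-less items such as 'param2'
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if name in keep]
    # SplitResult is a namedtuple, so _replace swaps in the new query
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(keep_query_params(
    "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"))
```

As with process_url, the order of the parameters in the URL does not matter, and more names can be passed in keep.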

edited Nov 19, 2013 at 14:53

guettli

answered Nov 3, 2008 at 16:25

tzot

Rafał Dowgird · Accepted Answer · 2008-11-03 14:34:34Z

4

The quick and dirty solution is this:

>>> "http://something.com/page?CONTENT_ITEM_ID=1234&param3".split("&")[0]
'http://something.com/page?CONTENT_ITEM_ID=1234'

answered Nov 3, 2008 at 14:34

Rafał Dowgird

Kena · Accepted Answer · 2008-11-03 14:36:06Z

3

Another option would be to use the split function, with & as the separator. That way, you'd extract the base URL as well as each of the remaining parameters.

   url.split("&")

returns a list with

  ['http://www.domainname.com/page?CONTENT_ITEM_ID=1234', 'param2', 'param3']

answered Nov 3, 2008 at 14:36

Kena

RailsSon · Accepted Answer · 2008-11-03 14:33:32Z

1

I figured it out; below is what I needed to do:

url = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
url = url[: url.find("&")]  # keep everything before the first "&"
print(url)
# prints: http://www.domainname.com/page?CONTENT_ITEM_ID=1234

answered Nov 3, 2008 at 14:33

RailsSon

7 Comments

Rafał Dowgird Over a year ago

Careful with this - if there are no parameters (no "&"), it will just drop the last character from the url.
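The pitfall is easy to reproduce: str.find returns -1 when there is no "&", so the slice turns into url[:-1]. A minimal sketch of one way around it uses str.partition, which returns the whole string as the head when the separator is absent (the helper name strip_after_first_amp is made up for illustration):

```python
def strip_after_first_amp(url):
    """Return the part of url before the first '&', or url unchanged if there is none."""
    head, _sep, _tail = url.partition("&")
    return head


with_params = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
without_params = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234"

# find() returns -1 here, so this slice silently drops the final "4"
print(without_params[: without_params.find("&")])

# partition() leaves a URL without "&" untouched
print(strip_after_first_amp(without_params))
print(strip_after_first_amp(with_params))
```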

S.Lott Over a year ago

See stackoverflow.com/questions/229352/python-find-question for a better solution.

RailsSon Over a year ago

Ah I see how that could be a problem and thanks for the warning. The list I am using always has a parameter after it but I will keep that in mind for the future. :)

Bite code Over a year ago

Be careful with URL parsing; most of the time it is not as easy as it seems. You'd better use the urlparse module, even if the task looks easy.

S.Lott Over a year ago

@Eef: Always means "mostly". Never means "Rarely". As soon as you say "Always", you know it will break because 2 of 14,000 violate your "always" rule.

Bite code · Accepted Answer · 2008-11-03 15:52:06Z

1

Parsing URLs is never as simple as it seems to be; that's why there are the urlparse and urllib modules.

E.g.:

import urllib  # Python 2
url = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
query = urllib.splitquery(url)  # (part before the '?', query string)
result = "?".join((query[0], query[1].split("&")[0]))
print(result)
# prints: http://www.domainname.com/page?CONTENT_ITEM_ID=1234

This is still not 100% reliable, but it is much more reliable than splitting the URL yourself, because there are a lot of valid URL formats that you and I don't know about and only discover one day in the error logs.

answered Nov 3, 2008 at 15:52

Bite code

Corey Goldberg · Accepted Answer · 2008-11-03 14:34:17Z

0

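import re
url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
# non-greedy match: capture everything before the first '&'
m = re.search(r'(.*?)&', url)
# m is None when the URL contains no '&', so check before using it
if m:
    print(m.group(1))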

answered Nov 3, 2008 at 14:34

Corey Goldberg

Community · Accepted Answer · 2017-05-23 12:31:45Z

0

Look at the urllib2 file name question for some discussion of this topic.

Also see the "Python Find Question" question.

edited May 23, 2017 at 12:31

CommunityBot

answered Nov 3, 2008 at 14:41

S.Lott

Jeremy Cantrell · Accepted Answer · 2008-11-03 15:31:04Z

0

This method isn't dependent on the position of the parameter within the url string. This could be refined, I'm sure, but it gets the point across.

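url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
parts = url.split('?')
# partition('=') copes with value-less items such as 'param2',
# which would make dict() fail if split('=') were used instead
params = dict(i.partition('=')[::2] for i in parts[1].split('&'))
new_url = parts[0] + '?CONTENT_ITEM_ID=' + params['CONTENT_ITEM_ID']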

answered Nov 3, 2008 at 15:31

Jeremy Cantrell

Alien Life Form · Accepted Answer · 2010-02-24 14:43:26Z

0

An ancient question, but still, I'd like to remark that query string parameters can also be separated by ';', not only '&'.
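If both separators have to be handled, one possible sketch is to isolate the query with urlsplit and split it on either character with a small regular expression. The helper name split_query_items and the mixed-separator URL are assumptions made up for illustration:

```python
import re
from urllib.parse import urlsplit


def split_query_items(url):
    """Split the query string of url on '&' or ';', dropping empty items."""
    query = urlsplit(url).query
    return [item for item in re.split(r"[&;]", query) if item]


print(split_query_items(
    "http://www.domainname.com/page?CONTENT_ITEM_ID=1234;param2&param3"))
```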

answered Feb 24, 2010 at 14:43

Alien Life Form

neutrinus · Accepted Answer · 2012-07-20 09:39:32Z

0

Besides urlparse, there is also furl, which IMHO has a better API.

answered Jul 20, 2012 at 9:39

neutrinus
