Extract a part of URL - python

Question

I have a URL, for example:

http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5

From this URL I want to extract only 'asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'. How could I do that?

I am still learning regular expressions and have not been able to solve this. Any suggestions would be appreciated.

@Haidro As per the data I have, it is always the same! But I think I solved the issue. Thanks for your time!
– Sangamesh Hs, commented Jul 17, 2013 at 9:13

Blender · Accepted Answer · 2019-06-14 14:00:23Z

10

In this specific example, splitting the string is enough:

url.split('/')[-1]

If you have a more complex URL, I would recommend the yarl library for parsing it:

>>> import yarl  # pip install yarl
>>> url = yarl.URL('http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5')
>>> url.path_qs
'/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'
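For this simple case, roughly the same string can also be rebuilt with only the standard library. This is a sketch, not yarl's implementation: path_qs_stdlib is a hypothetical helper name, and unlike yarl it does not normalize or re-encode the path.

```python
from urllib.parse import urlsplit


def path_qs_stdlib(url):
    """Hypothetical helper approximating yarl's URL.path_qs with the stdlib."""
    parts = urlsplit(url)
    # Append the query string only when there is one, so no stray '?' appears.
    if parts.query:
        return parts.path + "?" + parts.query
    return parts.path


url = "http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5"
print(path_qs_stdlib(url))  # /Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5
```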

You could also use the built-in urllib.parse module, but I find that it gets in the way once you start doing more complex things like:

>>> url.update_query(asd='foo').with_fragment('asd/foo/bar')
URL('http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5&asd=foo#asd/foo/bar')
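For comparison, here is one way to make the same change with only urllib.parse. It is a sketch for illustration: it relies on SplitResult._replace (a namedtuple method) and round-trips the query through parse_qsl and urlencode, which may re-encode some characters differently from the original string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

url = "http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5"

parts = urlsplit(url)
# Decode the existing query into (key, value) pairs and append the new pair.
query = parse_qsl(parts.query)
query.append(("asd", "foo"))
# Rebuild the URL with the re-encoded query string and a fragment.
updated = urlunsplit(parts._replace(query=urlencode(query), fragment="asd/foo/bar"))
print(updated)
```

Each step is explicit here, which is the extra ceremony that yarl's update_query and with_fragment hide.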

edited Jun 14, 2019 at 14:00

answered Jul 17, 2013 at 9:11

Blender



4 Comments

Sangamesh Hs Over a year ago

@Blender What if I had to extract only the id, for example 'F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'? How would I do that with a regular expression?

Brett Lempereur Over a year ago

You don't need to use a regular expression. Implementing your own code to interact with standards when modules in the standard library exist is often a bad idea. The urlparse module has functions for turning a query string into a dictionary or a list of key-value pairs.
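A minimal sketch of that suggestion for extracting just the id, written against Python 3's urllib.parse (in Python 2 the same functions live in the urlparse module). parse_qs maps each key to a list, because a key may appear more than once in a query string.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5"

# parse_qs returns a dict whose values are lists of every occurrence of each key.
params = parse_qs(urlsplit(url).query)
asset_id = params["id"][0]
print(asset_id)  # F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5
```

If the parameter might be missing, params.get("id", [None])[0] avoids a KeyError.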

Marandil Over a year ago

Well, this answer has an issue: it is not a generic solution for many URLs. The query string (the part after ?), or even the part after #, can contain forward slashes. Your solution will still split on them, returning the wrong answer.
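A quick way to see the problem, using a hypothetical URL whose query value contains a slash. last_segment_with_query is an illustrative helper, not part of any library; it follows the urlsplit approach used in the other answers.

```python
from urllib.parse import urlsplit

# Hypothetical URL: the query value itself contains a '/'.
url = "http://example.com/Assts/asset.epx?next=/home"

# The naive split returns whatever follows the last '/' anywhere in the string.
naive = url.split("/")[-1]


def last_segment_with_query(u):
    """Last path segment plus query string; slashes in the query are ignored."""
    parts = urlsplit(u)
    segment = parts.path.rsplit("/", 1)[-1]
    return segment + "?" + parts.query if parts.query else segment


print(naive)                         # home
print(last_segment_with_query(url))  # asset.epx?next=/home
```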

Tom Over a year ago

This is a poor way to get parts of a URL. The urlparse module, as mentioned by @TerryA, is what should be used.

TerryA · Accepted Answer · 2013-07-17 09:22:17Z

10

You can use urlparse, assuming asset.epx is always the same:

>>> import urlparse
>>> url = 'http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'
>>> res = urlparse.urlparse(url)
>>> print 'asset.epx?'+res.query
asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5

This is useful if you ever need other information from the URL (you can print res to see what else is available ;))

If you're using Python 3, though, you'll need from urllib.parse import urlparse instead.
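A Python 3 version of the snippet above, as a sketch. It keeps the same assumption that the file name is always asset.epx, and also prints a few of the other components available on the result (hostname and port are attributes of the object urlparse returns).

```python
from urllib.parse import urlparse

url = "http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5"
res = urlparse(url)

# Same result as the Python 2 version; assumes the file name is always asset.epx.
print("asset.epx?" + res.query)
# A few of the other components available on the parse result.
print(res.scheme, res.hostname, res.port, res.path)  # http name.abc.wxyz 1234 /Assts/asset.epx
```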

edited Jul 17, 2013 at 9:22

answered Jul 17, 2013 at 9:13

TerryA



Brett Lempereur · Accepted Answer · 2013-07-17 09:16:51Z

3

Depending on the version of Python, you want either urlparse in Python 2.x (http://docs.python.org/2/library/urlparse.html) or urllib.parse in Python 3.x (http://docs.python.org/3/library/urllib.parse.html). In Python 3 (all I have available), the following snippet achieves what you need without resorting to regular expressions:

import urllib.parse

address = "http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5"
# urlsplit separates the path from the query string
parsed = urllib.parse.urlsplit(address)
# last path segment, then '?', then the query string
print("{}?{}".format(parsed.path.split("/")[-1], parsed.query))

The output is "asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5" here.

answered Jul 17, 2013 at 9:16

Brett Lempereur


2 Comments

jdero Over a year ago

@SangameshHs If Brett's answer is what solved your problem, and this post reaches you, you should accept it as the answer to your question. It's the Stack Overflow way :]

Sangamesh Hs Over a year ago

@jdero Brett's answer is 100% correct, but Blender answered first, and when I clicked accept it said I had to wait 5 minutes before accepting an answer. So, now I've done it!! Cheers
