
I have a URL, for example:

http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5

From this URL I want to extract only 'asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'. How could I do that?

I am still learning regular expressions and have not been able to solve this. Any suggestions would be appreciated.

  • Is it just this URL, or are there others? Commented Jul 17, 2013 at 9:09
  • Is asset.epx always the same? Commented Jul 17, 2013 at 9:10
  • @Haidro In the data I have, it is always the same. But I think I have solved the issue. Thanks for your time! Commented Jul 17, 2013 at 9:13

3 Answers


In this specific example splitting the string is enough:

url.split('/')[-1]

If you have a more complex URL, I would recommend the yarl library for parsing it:

>>> import yarl  # pip install yarl
>>> url = yarl.URL('http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5')
>>> url.path_qs
'/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'

You could also use the built-in urllib.parse module, but I find that it gets in the way once you start doing more complex things, like:

>>> url.update_query(asd='foo').with_fragment('asd/foo/bar')
URL('http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5&asd=foo#asd/foo/bar')
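For comparison, here is a rough sketch of the same query update and fragment change using only the standard library's `urllib.parse` (`urlsplit`, `parse_qsl`, `urlencode`, `urlunsplit`). The helper name `add_param_and_fragment` is hypothetical, and the sketch simply appends the new pair; it does not try to replace an existing key of the same name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_param_and_fragment(url, key, value, fragment):
    """Hypothetical helper: append key=value to the query and set the fragment."""
    # Split the URL into scheme, netloc, path, query and fragment.
    parts = urlsplit(url)
    # Decode the existing query into (key, value) pairs, keeping their order.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append((key, value))
    # Re-encode the query, swap in the new query and fragment, and reassemble.
    return urlunsplit(parts._replace(query=urlencode(params), fragment=fragment))


url = 'http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'
print(add_param_and_fragment(url, 'asd', 'foo', 'asd/foo/bar'))
# http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5&asd=foo#asd/foo/bar
```

It works, but it takes four calls to do what the yarl chain does in one line, which is the kind of friction the answer is pointing at.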

4 Comments

@Blender What if I had to extract only the id, for example 'F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'? How do I implement that with a regular expression?
You don't need to use a regular expression. Implementing your own code to interact with standards when modules in the standard library exist is often a bad idea. The urlparse module (urllib.parse in Python 3) has functions, parse_qs and parse_qsl, for turning a query string into a dictionary or a list of key-value pairs.
Well, this answer has an issue: it is not a generic solution for all URLs. The query string (the part after ?) or the fragment (the part after #) can contain forward slashes, and splitting the whole string on '/' will still split on them, returning the wrong answer.
This is a bad way to get parts of a URL. The urlparse library, as mentioned by @TerryA, is what should be used.
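A minimal standard-library sketch addressing the comments above, assuming Python 3: `parse_qs` pulls out the `id` value without a regular expression, and taking the last segment of the parsed path (rather than of the whole string) keeps a slash inside the query from breaking the result. The helper names `get_id` and `last_segment_with_query`, and the second URL with `next=/home`, are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def get_id(url):
    # parse_qs maps each key to a list of values, because keys may repeat.
    # Raises KeyError if the URL has no 'id' parameter.
    return parse_qs(urlsplit(url).query)['id'][0]


def last_segment_with_query(url):
    parts = urlsplit(url)
    # Split only the path, so slashes in the query or fragment are ignored.
    name = parts.path.rsplit('/', 1)[-1]
    return name + '?' + parts.query if parts.query else name


url = 'http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'
print(get_id(url))  # F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5

# Hypothetical URL with a slash inside the query string.
tricky = 'http://name.abc.wxyz:1234/Assts/asset.epx?id=abc&next=/home'
print(tricky.split('/')[-1])            # home
print(last_segment_with_query(tricky))  # asset.epx?id=abc&next=/home
```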

You can use urlparse, assuming asset.epx is always the same:

>>> import urlparse
>>> url = 'http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'
>>> res = urlparse.urlparse(url)
>>> print 'asset.epx?'+res.query
asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5

This is useful if you ever need other information from the URL (you can print res to see what else you can get ;))

If you're using Python 3, though, you'll have to use from urllib.parse import urlparse, then call urlparse(url) directly and use print() as a function.
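Under Python 3, the same answer might look like the sketch below. It also prints a few of the other attributes available on the parsed result (`scheme`, `hostname`, `port`, `path`). As in the Python 2 version, it assumes the file name is always `asset.epx`.

```python
from urllib.parse import urlparse

url = 'http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5'
res = urlparse(url)

# Rebuild the wanted part, assuming the file name is always 'asset.epx'.
wanted = 'asset.epx?' + res.query
print(wanted)  # asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5

# Other components available on the ParseResult.
print(res.scheme, res.hostname, res.port, res.path)  # http name.abc.wxyz 1234 /Assts/asset.epx
```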

Comments


Depending on the version of Python, you want either urlparse in Python 2.x (http://docs.python.org/2/library/urlparse.html) or urllib.parse in Python 3.x (https://docs.python.org/3/library/urllib.parse.html). In Python 3 (all I have available), the following snippet achieves what you need without resorting to regular expressions:

import urllib.parse

address = "http://name.abc.wxyz:1234/Assts/asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5"
parsed = urllib.parse.urlsplit(address)
print("{}?{}".format(parsed.path.split("/")[-1], parsed.query))

The output is "asset.epx?id=F3F94D94-7232-4FA2-98EF-07sdfssfdsa3B5" here.

2 Comments

@SangameshHs If Brett's answer was what solved your problem, and this post reaches you, you should accept it as the answer to your question. It's the StackOverflow way :]
@jdero Brett's answer is 100% correct, but Blender answered first, and when I clicked "accept answer" it said I had to wait 5 minutes before accepting. So now I have done it! Cheers
