1

Suppose I have a string like the one below:

data = "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80"

I want to extract the URL from this string, from the leading / up to and including .aspx, ignoring everything that follows .aspx. There may be many strings like this one.

The length of the string varies, because the URL is sometimes longer and sometimes shorter, so I want to find the end of the URL by its .aspx extension.

Can anyone tell me how to do this in Python?

5
  • Will there be an empty space after every URL? Commented Nov 7, 2012 at 5:19
  • @arumr: These are lines from log files that I am parsing with Hadoop. Each line is in the format 2012-11-04 23:00:07 10.1.151.54 GET /pages/index.aspx - 80 - 10.1.151.59 - 200 0 64 374, so please treat them as plain strings in this case. Commented Nov 7, 2012 at 5:23
  • @RocketDonkey: Yes, let me state it clearly. After splitting the full line from the log file I end up with "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80", and from this I want only the part up to the .aspx extension. While processing the log file I sometimes get just ' / ' as the string, so re match and group do not work. FYI, I also get strings with other extensions, such as "/_layouts/1033/styles/css/pages.css", and I want only the strings that end in .aspx, cut at that extension. Commented Nov 7, 2012 at 5:49
  • Take a look at @BurhanKhalid's suggestion - that is a much simpler and more elegant way of achieving it than what I was proposing. I would say give that one a shot if possible :) Commented Nov 7, 2012 at 5:52
  • What I mean is that I have already reduced the complete line I mentioned before to "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80" through some earlier processing. My intention is to extract the part up to .aspx from the string given in the question. Commented Nov 7, 2012 at 5:56

2 Answers

4

Since this is a standard log format, you can do this:

>>> s = "2012-11-04 23:00:07 10.1.151.54 GET /pages/index.aspx - 80 - 10.1.151.59 - 200 0 64 374"
>>> s.split()[4]
'/pages/index.aspx'

"I have already reduced the complete line I mentioned before to /phones/pages/nokia_overview.aspx pid=46&cid=raj 80 through some earlier processing. My intention is to extract the part up to .aspx from the string given in the question."

>>> s = "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80"
>>> s.split()[0]
'/phones/pages/nokia_overview.aspx'
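
The comments mention two extra cases: the string is sometimes empty or just ' / ', and some paths have other extensions such as .css. A minimal sketch that covers both might look like the following. The helper name aspx_path is made up for illustration, and returning None for anything that is not an .aspx path is an assumption, not something stated in the question.

```python
def aspx_path(s):
    """Return the first whitespace-separated token of s if it ends in .aspx.

    Returns None (an assumed convention) for empty input, whitespace-only
    input, or a path with a different extension.
    """
    parts = s.split()  # [] for an empty or whitespace-only string
    if not parts:
        return None
    path = parts[0]
    # Case-insensitive check so that ".ASPX" is accepted as well.
    return path if path.lower().endswith(".aspx") else None
```

Checking parts before indexing matters because s.split()[0] raises IndexError when the string contains no tokens at all.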

5 Comments

@shivakrishna Actually disregard my answer completely - this is the one you want :) +1
@RocketDonkey: I already have a string like "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80" after some processing for my requirement, and this is the actual string I want to extract from. The earlier one was the complete line before that processing, given just as an example.
@shivakrishna Gotcha - assuming that is the case, you can still use Burhan's method of splitting the string on the space and taking the first element. Would that work for your situation?
The length of the string varies: sometimes more words are added and sometimes it is empty. After all these scenarios, the final string is "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80". From this we need to do two things: 1. Fetch the required string up to ".aspx" (the length of the URL may vary, so this needs re or indexing). 2. Handle the case where the URL is empty during processing, so the re has to account for that scenario too.
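If there is no URL at all (the string is empty or only whitespace), s.split() returns an empty list, so indexing [0] raises an IndexError and you need to check for that case first. Otherwise it doesn't matter how long the URL is or whether it has a query string; the code splits on whitespace.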
1

A simple function that cuts from the first '/' to the next space:

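def pathPart(s):
    # Start at the first '/'; if there is none, the result is empty.
    pos_slash = s.find('/')
    if pos_slash < 0:
        pos_slash = len(s)
    # Stop at the first space after the slash, or at the end of the string.
    pos_space = s.find(' ', pos_slash)
    if pos_space < 0:
        pos_space = len(s)
    return s[pos_slash:pos_space]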
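
Since the question asks specifically for everything up to the .aspx extension, a regular expression is another option. This is only a sketch: the names ASPX_RE and aspx_prefix are invented here, and returning None when nothing matches is an assumption.

```python
import re

# A '/' followed by as few non-space characters as possible, then ".aspx".
ASPX_RE = re.compile(r"/\S*?\.aspx", re.IGNORECASE)

def aspx_prefix(s):
    """Return the text from the first usable '/' through '.aspx', or None."""
    m = ASPX_RE.search(s)
    return m.group(0) if m else None
```

Unlike the split-based approach, this also cuts off a query string attached directly to the path (for example "/a/b.aspx?x=1" gives "/a/b.aspx"), and it finds nothing in strings such as ' / ' or a .css path.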
