Fetching only the required part of a string in Python

Question

Suppose I have a string like the one below:

data = "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80"

Now I want to fetch the URL from this string, from the leading / up to .aspx. There may be many strings like this one, but in each case I want the part from the start of the string up to the .aspx extension, ignoring everything after .aspx.

The length of the string varies, because the URL is sometimes longer and sometimes shorter, so I want to locate the URL based on the .aspx extension.

Can anyone tell me how to do this in Python?
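A minimal index-based sketch of what the question asks, assuming everything from the start of the string up to and including the first ".aspx" is wanted. The helper name upto_ext is hypothetical:

```python
def upto_ext(s, ext=".aspx"):
    """Return s from its start up to and including the first ext, or None."""
    end = s.find(ext)  # index of the first occurrence, or -1 if absent
    if end < 0:
        return None
    return s[:end + len(ext)]


data = "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80"
print(upto_ext(data))  # prints /phones/pages/nokia_overview.aspx
```

Because str.find returns -1 when the extension is missing, the function returns None for inputs such as a .css path, so the URL length never matters.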

@arumr: Actually, these are lines I got from log files. I am trying to parse the log files with Hadoop, and each line has the format 2012-11-04 23:00:07 10.1.151.54 GET /pages/index.aspx - 80 - 10.1.151.59 - 200 0 64 374. So please treat them as plain strings (in this case).
– Shiva Krishna Bavandla, Commented Nov 7, 2012 at 5:23
@RocketDonkey: Yes, let me be clear. After splitting the entire log line, I got "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80", and from this I want only the part up to the .aspx extension. While processing the log file, I sometimes get a string that is only '/', so re.match and group() do not work. Also, I get strings with other extensions, such as "/_layouts/1033/styles/css/pages.css", so I want to fetch only strings that end at a .aspx extension.
– Shiva Krishna Bavandla, Commented Nov 7, 2012 at 5:49
Take a look at @BurhanKhalid's suggestion; it is a much simpler and more elegant way of achieving this than what I was proposing. I would give that one a shot if possible :)
– RocketDonkey, Commented Nov 7, 2012 at 5:52
What I am saying is that I already have the string "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80", obtained by performing some operations on the complete line I mentioned before. My intention is to fetch the part up to .aspx from that string, as described in the question.
– Shiva Krishna Bavandla, Commented Nov 7, 2012 at 5:56
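For the cases raised in these comments (a bare '/' or a path with another extension), a regex sketch has to check for a failed match before calling group(), since re.match returns None and None.group raises AttributeError. The pattern and the names ASPX_PATH and aspx_path are assumptions for illustration:

```python
import re

# Hypothetical pattern: a leading '/' followed by the shortest run of
# non-whitespace characters that ends in ".aspx" at a word boundary.
ASPX_PATH = re.compile(r"/\S*?\.aspx\b")


def aspx_path(s):
    """Return the .aspx path at the start of s, or None if there is none."""
    m = ASPX_PATH.match(s)
    # Guard against a failed match before calling group().
    return m.group(0) if m else None


for s in ["/phones/pages/nokia_overview.aspx pid=46&cid=raj 80",
          "/",
          "/_layouts/1033/styles/css/pages.css"]:
    print(repr(aspx_path(s)))
# prints '/phones/pages/nokia_overview.aspx', then None, then None
```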

Burhan Khalid · Accepted Answer · 2012-11-07 08:11:00Z

4

Since this is a standard log format, you can do this:

>>> s = "2012-11-04 23:00:07 10.1.151.54 GET /pages/index.aspx - 80 - 10.1.151.59 - 200 0 64 374"
>>> s.split()[4]
'/pages/index.aspx'
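Applied to a whole log file, the same field-index idea can be wrapped in a generator that also skips blank or short lines and non-.aspx paths. This is a sketch assuming the path is always the fifth space-separated field, as in the sample line; the name aspx_paths is hypothetical:

```python
def aspx_paths(lines):
    """Yield the requested path from each log line whose path ends in .aspx."""
    for line in lines:
        fields = line.split()
        # Assumption: the path is the fifth whitespace-separated field
        # (index 4), as in the sample line; short or blank lines are skipped.
        if len(fields) > 4 and fields[4].endswith(".aspx"):
            yield fields[4]


line = "2012-11-04 23:00:07 10.1.151.54 GET /pages/index.aspx - 80 - 10.1.151.59 - 200 0 64 374"
print(list(aspx_paths([line, ""])))  # prints ['/pages/index.aspx']
```

In a line-oriented job such as a Hadoop Streaming mapper, which reads records on stdin and writes to stdout, it could be driven with `for path in aspx_paths(sys.stdin): print(path)`.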

Update: you mentioned that, after some earlier processing of the complete line, you already have a string such as /phones/pages/nokia_overview.aspx pid=46&cid=raj 80 and only want the part up to .aspx. In that case, take the first whitespace-separated field:

>>> s = "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80"
>>> s.split()[0]
'/phones/pages/nokia_overview.aspx'

edited Nov 7, 2012 at 8:11

answered Nov 7, 2012 at 5:35

Burhan Khalid



5 Comments

RocketDonkey Over a year ago

@shivakrishna Actually disregard my answer completely - this is the one you want :) +1

Shiva Krishna Bavandla Over a year ago

@RocketDonkey: After some processing for my requirements, I already have a string like "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80"; this is the actual string I want to extract from. The earlier one I gave is the complete line before that processing, given just as an example.

RocketDonkey Over a year ago

@shivakrishna Gotcha - assuming that is the case, you can still use Burhan's method of splitting the string on the space and taking the first element. Would that work for your situation?

Shiva Krishna Bavandla Over a year ago

Actually, the length of the string varies: sometimes more words are added, and sometimes it is empty. After all these steps I end up with a string like "/phones/pages/nokia_overview.aspx pid=46&cid=raj 80". From this I need to do two things: 1. fetch the part up to ".aspx" (the length of the URL varies, so I need to use re or indexing); 2. handle the case where the URL is empty, which sometimes happens during processing.

Burhan Khalid Over a year ago

If there is no URL (an empty string or only spaces), s.split() returns an empty list, so s.split()[0] raises an IndexError; check for that case first. Otherwise, it doesn't matter how long the URL is or whether it has a query string; the code simply splits on whitespace.
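A minimal guard for the empty-input case, assuming an empty result is acceptable when there is no URL. The helper name first_field is hypothetical. With no arguments, str.split() splits on runs of whitespace and drops leading and trailing whitespace, so both "" and " " give an empty list:

```python
def first_field(s):
    """Return the first whitespace-separated field of s, or "" if there is none."""
    parts = s.split()  # [] for "" or a string of only whitespace
    # Indexing parts[0] directly would raise IndexError on an empty list.
    return parts[0] if parts else ""


print(repr(first_field("/phones/pages/nokia_overview.aspx pid=46&cid=raj 80")))
# prints '/phones/pages/nokia_overview.aspx'
print(repr(first_field(" ")))  # prints ''
```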

Catalin Popescu · Accepted Answer · 2012-11-07 06:01:15Z

1

A simple function that extracts the substring from the first / up to the next space:

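def pathPart(s):
    # Start at the first '/'; if there is none, the result is an empty string.
    pos_slash = s.find('/')
    if pos_slash < 0:
        pos_slash = len(s)
    # Stop at the first space after the slash, or at the end of the string.
    pos_space = s.find(' ', pos_slash)
    if pos_space < 0:
        pos_space = len(s)
    return s[pos_slash:pos_space]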
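The same "first / up to the next space" extraction can also be sketched as a single regular expression. This variant (the name path_part_re is hypothetical) differs slightly from pathPart: \S stops at any whitespace, including tabs and a trailing newline, which can matter for lines read from a file:

```python
import re


def path_part_re(s):
    """Return the first '/' plus the non-whitespace run after it, or ""."""
    m = re.search(r"/\S*", s)  # first '/' followed by non-whitespace chars
    return m.group(0) if m else ""


log_line = "2012-11-04 23:00:07 10.1.151.54 GET /pages/index.aspx - 80 - 10.1.151.59 - 200 0 64 374"
print(path_part_re(log_line))  # prints /pages/index.aspx
```

Both versions work on the full log line as well as on the reduced string, because the date, time and IP fields contain no '/'.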

answered Nov 7, 2012 at 6:01

Catalin Popescu

