1

I'm new to Python and learning the basics.

My question: I have multiple page requests from a log file, like the ones below:

"GET /img/home/search-user-ico.jpg HTTP/1.1"  
"GET /SpellCheck/am.tlx HTTP/1.1"
"GET /img/plan-comp-nav.jpg HTTP/1.1" 
"GET /ie6.css HTTP/1.1"
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1"
"GET /SpellCheck/am100k2.clx HTTP/1.1" 
"GET /SpellCheck/am.tlx HTTP/1.1" 

I want only the file part of each request. For example, taking "GET /img/home/search-user-ico.jpg HTTP/1.1" and "GET /ie6.css HTTP/1.1" as the pages, I want to extract search-user-ico.jpg HTTP and ie6.css HTTP.

Could someone please help me write a Python script to do this split?

2
  • The HTTP part that follows both filenames is not part of the actual filename, are you sure you want to match that? Commented Apr 18, 2011 at 8:59
  • So you want "HTTP" in the output string but not the HTTP version? Commented Apr 18, 2011 at 8:59

4 Answers

3

Assuming that you don't have spaces in the filenames and that you don't want "HTTP" at the end, you can split the line on spaces:

parts = line.split(" ")

and then use os.path from the standard library (after import os) to get the filename from the path:

filename = os.path.basename(parts[1])

For example:

>>> import os
>>> line = "GET /img/home/search-user-ico.jpg HTTP/1.1"
>>> parts = line.split(" ")
>>> parts[1]
'/img/home/search-user-ico.jpg'
>>> os.path.basename(parts[1])
'search-user-ico.jpg'
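
As a rough sketch of how the two steps above might be applied to whole log entries (written for Python 3): the function name filename_from_request and the query-string entry are hypothetical, added only for illustration. It assumes each entry looks like the quoted samples in the question. It swaps os.path for posixpath so that "/" is always treated as the separator, and uses urlsplit to drop any "?query" suffix, which the original answer does not handle.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_request(request_line):
    """Return the last path component of a 'METHOD /path PROTOCOL' line."""
    # Strip whitespace and the surrounding double quotes seen in the log sample.
    fields = request_line.strip().strip('"').split()
    if len(fields) < 2:
        return None  # not a recognisable request line
    # urlsplit(...).path drops any '?query' or '#fragment' suffix.
    path = urlsplit(fields[1]).path
    # posixpath always treats '/' as the separator, whatever the OS.
    return posixpath.basename(path)


entries = [
    '"GET /img/home/search-user-ico.jpg HTTP/1.1"',
    '"GET /ie6.css HTTP/1.1"',
    '"GET /SpellCheck/am.tlx?v=2 HTTP/1.1"',  # hypothetical query string
]
filenames = [filename_from_request(entry) for entry in entries]
print(filenames)
```

Returning None for lines with fewer than two fields is just one possible choice; raising an exception may suit a stricter parser better.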

2 Comments

As a one-liner, assuming entries is a list/tuple of the entries: filenames = [os.path.basename(entry.split()[1]) for entry in entries]
+1. @Jothi, in case you've not seen it before, @Blair has turned my code into a "list comprehension", a very neat feature of Python.
1
data = [
"GET /img/home/search-user-ico.jpg HTTP/1.1",
"GET /SpellCheck/am.tlx HTTP/1.1",
"GET /img/plan-comp-nav.jpg HTTP/1.1" ,
"GET /ie6.css HTTP/1.1",
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1",
"GET /SpellCheck/am100k2.clx HTTP/1.1" ,
"GET /SpellCheck/am.tlx HTTP/1.1" 
]

for url in data:
    print(url.split(' ')[1].split('/')[-2])

1 Comment

This will break in all kinds of edge cases. Also, it matches the second to last part of the middle string, not the last part.
0
data = [
"GET /img/home/search-user-ico.jpg HTTP/1.1",
"GET /SpellCheck/am.tlx HTTP/1.1",
"GET /img/plan-comp-nav.jpg HTTP/1.1" ,
"GET /ie6.css HTTP/1.1",
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1",
"GET /SpellCheck/am100k2.clx HTTP/1.1" ,
"GET /SpellCheck/am.tlx HTTP/1.1" 
]

for url in data:
    print(url.split(' ')[1].split('/')[-1])


0

If the format of your links is always the same, another solution would be:

request = "GET /img/home/search-user-ico.jpg HTTP/1.1"
parts = request.split("/")
parts[-2]  # returns 'search-user-ico.jpg HTTP'
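
A minimal sketch of the same idea wrapped in a function, in case the "filename HTTP" form from the question is really what is wanted. The name file_and_protocol is hypothetical, and the code assumes every line ends in a protocol of the form HTTP/x.y. It uses rsplit with a limit of 2 instead of a plain split, so only the last two slashes matter, and it returns None when the line has fewer than two slashes.

```python
def file_and_protocol(request_line):
    """Return e.g. 'ie6.css HTTP' from 'GET /ie6.css HTTP/1.1'."""
    # Split on at most the last two slashes: the final piece is the protocol
    # version ('1.1') and the piece before it is '<filename> HTTP'.
    pieces = request_line.strip().strip('"').rsplit("/", 2)
    if len(pieces) < 3:
        return None  # fewer than two slashes: not the expected format
    return pieces[-2]


for line in ["GET /img/home/search-user-ico.jpg HTTP/1.1", "GET /ie6.css HTTP/1.1"]:
    print(file_and_protocol(line))
```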
