Python script for URL split

Question

I'm new to Python and learning the basics.

My query: I have multiple pages that appear as requests in a log file, like the ones below:

"GET /img/home/search-user-ico.jpg HTTP/1.1"  
"GET /SpellCheck/am.tlx HTTP/1.1"
"GET /img/plan-comp-nav.jpg HTTP/1.1" 
"GET /ie6.css HTTP/1.1"
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1"
"GET /SpellCheck/am100k2.clx HTTP/1.1" 
"GET /SpellCheck/am.tlx HTTP/1.1"

My question is: I want only the file part of each page. For example, taking "GET /img/home/search-user-ico.jpg HTTP/1.1" and "GET /ie6.css HTTP/1.1" as pages, I want to split out "search-user-ico.jpg HTTP" and "ie6.css HTTP".

Experts, please help me write a Python script that does this split.

The HTTP part that follows both filenames is not part of the actual filename. Are you sure you want to match that?
– jilles de wit, Commented Apr 18, 2011 at 8:59
So you want "HTTP" in the output string but not the HTTP version?
– Stephen Paulger, Commented Apr 18, 2011 at 8:59

Stephen Paulger · Accepted Answer · 2011-04-18 09:03:45Z

3

Assuming that you don't have spaces in the filenames and that you don't want "HTTP" at the end, you can split the line on spaces.

parts = line.split(" ")

and then use the os module to get the filename from the path.

filename = os.path.basename(parts[1])

For example:

>>> import os
>>> line = "GET /img/home/search-user-ico.jpg HTTP/1.1"
>>> parts = line.split(" ")
>>> parts[1]
'/img/home/search-user-ico.jpg'
>>> os.path.basename(parts[1])
'search-user-ico.jpg'


2 Comments

Blair Over a year ago

As a one-liner, assuming entries is a list/tuple of the entries: filenames = [os.path.basename(entry.split()[1]) for entry in entries]

Stephen Paulger Over a year ago

+1. @Jothi, in case you've not seen it before: @Blair has turned my code into a "list comprehension", a very neat feature of Python.
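For readers who have not met list comprehensions, here is a minimal runnable sketch of Blair's one-liner. It assumes Python 3 (the thread itself dates from the Python 2 era), and the `entries` list is simply three of the request strings from the question.

```python
import os

# Sample request strings copied from the question.
entries = [
    "GET /img/home/search-user-ico.jpg HTTP/1.1",
    "GET /SpellCheck/am.tlx HTTP/1.1",
    "GET /ie6.css HTTP/1.1",
]

# split() with no argument splits on any run of whitespace,
# so element 1 of each result is the request path.
filenames = [os.path.basename(entry.split()[1]) for entry in entries]

print(filenames)  # ['search-user-ico.jpg', 'am.tlx', 'ie6.css']
```

The comprehension is equivalent to a for loop that appends one basename per entry to an initially empty list.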

Vihtinsky · Accepted Answer · 2011-04-18 09:33:47Z

1

data = [
    "GET /img/home/search-user-ico.jpg HTTP/1.1",
    "GET /SpellCheck/am.tlx HTTP/1.1",
    "GET /img/plan-comp-nav.jpg HTTP/1.1",
    "GET /ie6.css HTTP/1.1",
    "GET /img/portlet/portlet-content-bg.jpg HTTP/1.1",
    "GET /SpellCheck/am100k2.clx HTTP/1.1",
    "GET /SpellCheck/am.tlx HTTP/1.1",
]

for url in data:
    # [1] is the request path; [-2] picks its second-to-last component
    print(url.split(' ')[1].split('/')[-2])


1 Comment

jilles de wit Over a year ago

This will break in all kinds of edge cases. Also, it matches the second to last part of the middle string, not the last part.
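To make a few of those edge cases concrete, the following is a hedged sketch rather than anything proposed in the thread. The helper name `request_filename` is hypothetical, and the code assumes Python 3 and a request line of the form "METHOD path PROTOCOL". It tolerates surrounding quotes and whitespace, ignores a query string, and returns None for a line that does not have three fields.

```python
import posixpath
from urllib.parse import urlsplit


def request_filename(line):
    """Return the file name from a line such as 'GET /a/b.css HTTP/1.1'.

    Hypothetical helper, not taken from the answers above. Returns None
    when the line does not have the three expected fields.
    """
    # Drop surrounding whitespace and the double quotes seen in the log.
    fields = line.strip().strip('"').split()
    if len(fields) != 3:
        return None
    # urlsplit separates the path from any query string or fragment.
    path = urlsplit(fields[1]).path
    # posixpath always treats '/' as the separator, whatever the OS.
    return posixpath.basename(path)


print(request_filename('"GET /img/plan-comp-nav.jpg HTTP/1.1" '))
print(request_filename("GET /ie6.css?v=2 HTTP/1.1"))
print(request_filename("malformed"))
```

A request for a directory, such as "GET / HTTP/1.1", yields an empty string here, so callers may want to treat that case separately.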

Achim · Accepted Answer · 2011-04-18 09:05:28Z

0

data = [
    "GET /img/home/search-user-ico.jpg HTTP/1.1",
    "GET /SpellCheck/am.tlx HTTP/1.1",
    "GET /img/plan-comp-nav.jpg HTTP/1.1",
    "GET /ie6.css HTTP/1.1",
    "GET /img/portlet/portlet-content-bg.jpg HTTP/1.1",
    "GET /SpellCheck/am100k2.clx HTTP/1.1",
    "GET /SpellCheck/am.tlx HTTP/1.1",
]

for url in data:
    # [1] is the request path; [-1] is its last component, the filename
    print(url.split(' ')[1].split('/')[-1])


omerkirk · Accepted Answer · 2011-04-18 09:11:03Z

0

If the format of your links is always similar, another solution would be:

request = "GET /img/home/search-user-ico.jpg HTTP/1.1"
parts = request.split("/")
parts[-2]  # returns 'search-user-ico.jpg HTTP'
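Indexing with parts[-2] depends on the slash inside "HTTP/1.1". If the literal output from the question (the filename followed by "HTTP") is what is wanted, a slightly more explicit sketch is shown here. It assumes Python 3 and exactly three space-separated fields per request, and the variable names are illustrative only.

```python
import posixpath

request = "GET /img/home/search-user-ico.jpg HTTP/1.1"

# Unpack method, path and protocol; this raises ValueError if the
# request does not have exactly three whitespace-separated fields.
method, path, protocol = request.split()

# partition("/") keeps the protocol name and drops the version:
# "HTTP/1.1" -> "HTTP".
protocol_name = protocol.partition("/")[0]

result = posixpath.basename(path) + " " + protocol_name
print(result)  # search-user-ico.jpg HTTP
```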
