I'm analyzing an Apache access log file and I want to find the hit count per IP.
The following code does that:

ips = df.groupby('IP').size()
ips.sort()
print ips[-10:]

But I also want to find the "Referrer" (the 9th column) for the top 10 IPs.
How can I do this?

Sample log file line:

112.135.128.20 - [13/May/2013:23:55:04 +0530] "GET /SVRClientWeb/ActionController HTTP/1.1" 302 2 "https://www.example.com/sample" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mobile/10B329" GET /SVRClientWeb/ActionController - HTTP/1.1 www.test.com 

1 Answer

Use isin

You can first sort ips by hit count and then take the index of the last 10 entries, which are the 10 IPs with the most hits.

ips.sort()
top_ips = ips.tail(10).index

Then use isin to select the referrers for those IPs:

referrers = df[df['IP'].isin(top_ips)]['Referrer']
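
For readers on a recent pandas release, where Series.sort is no longer available, the same idea can be sketched roughly as follows. The sample DataFrame and the value of n are made up for illustration, and nlargest is swapped in for the sort-then-tail step; it assumes df has 'IP' and 'Referrer' columns as in the question.

```python
import pandas as pd

# Hypothetical stand-in for the parsed access log (not real data).
df = pd.DataFrame({
    "IP": ["1.1.1.1", "1.1.1.1", "1.1.1.1", "2.2.2.2", "2.2.2.2", "3.3.3.3"],
    "Referrer": ["a", "b", "a", "c", "c", "d"],
})

n = 2  # the question asks for 10; kept small for the toy data

# Hit count per IP, then the n IPs with the most hits.
hits = df.groupby("IP").size()
top_ips = hits.nlargest(n).index

# Keep only the rows whose IP is in the top-n set and pull the referrer column.
referrers = df.loc[df["IP"].isin(top_ips), "Referrer"]
print(referrers.tolist())
```

hits.sort_values().tail(n).index would select the same IPs when there are no ties at the cutoff, just in ascending order of hit count.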
7 Comments

Are the results from ips[-10:] and ips.head(10).index different? Also, ips.sort(ascending=False) raises an error: TypeError: sort() got an unexpected keyword argument 'ascending'
Which version are you using?
pandas 0.11.0 and Python 2.7
head returns the first n rows.
And a pandas Series does have a sort method. Did I miss anything?