I'm looking for a regular expression that parses a line in the Common Log Format and extracts its 7 fields:

  • IP
  • identity
  • username
  • time
  • request
  • status
  • size of the object.

Has anybody already implemented this regex?

3 Answers

I would extract the time and the request first; what remains is a simple split (Ruby):

a = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

time    = a.slice!(/\[.*?\]/)
request = a.slice!(/".*"/)
ip, identity, username, status, size = a.split
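
Note that slice! mutates the string it is called on. If the original line needs to stay intact, the same steps could be wrapped in a small method that works on a copy. This is only a sketch: the method name parse_clf_by_slicing and the returned hash layout are assumptions for illustration, not part of the answer above.

```ruby
# Hypothetical helper; the name and the hash keys are illustrative only.
def parse_clf_by_slicing(line)
  rest = line.dup                   # work on a copy so the caller's string is untouched

  time    = rest.slice!(/\[.*?\]/)  # first bracketed span, brackets included
  request = rest.slice!(/".*"/)     # quoted request, quotes included

  # Only the five whitespace-separated fields are left at this point.
  ip, identity, username, status, size = rest.split

  { ip: ip, identity: identity, username: username,
    time: time, request: request, status: status, size: size }
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
fields = parse_clf_by_slicing(line)
puts fields[:time]
puts fields[:status]
```

All values come back as strings, so status and size would still need a conversion such as to_i if numbers are wanted.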

1 Comment

Really simple, and it solves the problem. Thanks, @hirolau.

Input:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Regex:

(\S+)\s+(\S+)\s+(\S+)\s+(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)

Where the capture groups are numbered as in the breakdown below.

Breakdown:

Group         Regex         Match
#1 IP         (\S+)         127.0.0.1
#2 Identity   (\S+)         user-identifier
#3 Username   (\S+)         frank
#4 Time       (\[.*?\])     [10/Oct/2000:13:55:36 -0700]
#5 Request    (".*?")       "GET /apache_pb.gif HTTP/1.0" 
#6 Status     (\S+)         200
#7 Size       (\S+)         2326
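Each group is separated by \s+ (one or more whitespace characters). Note that groups #4 and #5 capture the surrounding brackets and quotes as part of the match.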
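
As a rough illustration of how the seven groups could be read out, here is a minimal Ruby sketch that applies the regex above to the sample input. The constant CLF_PATTERN and the method parse_clf are names assumed for this example; they do not come from the answer.

```ruby
# CLF_PATTERN and parse_clf are illustrative names, not from the answer.
CLF_PATTERN = /(\S+)\s+(\S+)\s+(\S+)\s+(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)/

# Returns the seven captures in group order, or nil when the line does not match.
def parse_clf(line)
  match = CLF_PATTERN.match(line)
  match && match.captures
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
ip, identity, username, time, request, status, size = parse_clf(line)
puts ip
puts request
```

If the line does not match, parse_clf returns nil and every variable in the destructuring assignment ends up nil, so callers may want to check for that case.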

I also came up with my own regex, which additionally splits the request into the verb, URI, and HTTP version.

^([\d\.]*)\s([\w|-]*)\s([\w|-]*)\s\[(.*)\]\s\"([\w]*)\s(.*)\s(.*)\"\s([\d]*)\s([\d]*)$
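
For readability, the same idea could be written in Ruby with named groups, as sketched below. This is a lightly adjusted variant rather than the exact regex above: the `|` inside the character classes is dropped, because within `[...]` it matches a literal pipe instead of acting as alternation. The constant name CLF_DETAILED and the group names are assumptions made for this example.

```ruby
# CLF_DETAILED and the group names are illustrative, not from the answer.
# The x flag lets the pattern span several lines; literal whitespace is ignored.
CLF_DETAILED = /
  ^(?<ip>[\d.]*)\s
  (?<identity>[\w-]*)\s
  (?<username>[\w-]*)\s
  \[(?<time>.*)\]\s
  "(?<verb>\w*)\s(?<uri>.*)\s(?<version>.*)"\s
  (?<status>\d*)\s
  (?<size>\d*)$
/x

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
m = CLF_DETAILED.match(line)

if m
  puts m[:verb]
  puts m[:uri]
  puts m[:version]
end
```

One caveat with this pattern: because status and size only accept digits, a line that logs the size as "-" will not match.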
