
I have some raw data I scraped from a log file, which currently reads as:

"   80:  0.20%:  2/Jan/14 21:01: /site/podcasts/audio/2013/podcast-07-15-2013.mp3", 
"   71:  0.16%:  14/Jan/14 12:18: /site/podcasts/audio/2013/podcast-11-04-2013.mp3", 
"   67:  0.17%:  2/Jan/14 23:44: /site/podcasts/audio/podcast-3-21-2011.mp3", 
"   67:  0.15%:  15/Jan/14 09:25: /site/podcasts/audio/2013/podcast-08-05-2013.mp3", 
"   64:  0.12%:  2/Jan/14 07:40: /site/podcasts/audio/2013/podcast-11-04-2013-1.mp3",

I need to extract three pieces of information for an Excel spreadsheet: the number before the initial colon, the date, and the URL. So if I converted it into CSV, it would read as

80, 2/Jan/14, /site/podcasts/audio/2013/podcast-07-15-2013.mp3
71, 14/Jan/14, /site/podcasts/audio/2013/podcast-11-04-2013.mp3
67, 2/Jan/14, /site/podcasts/audio/podcast-3-21-2011.mp3

And so on. However, I'm having trouble figuring out how to do that. I wrote some regexes to capture the right data, but I'm not sure how to convert those regexes into what I need.

There's this regex to get the first number: ^"\s{3}(\d+)

And this regex could get the date: (\d+\/\w{3}\/14)

And this regex could get the URL: (\/site\/podcasts\/audio\/.*\.mp3)

However, I'm not sure how to take these regexes and convert them into the CSV I need. Any ideas?
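A minimal sketch of one way the three regexes above could be applied in Ruby. It assumes each line is a string that still contains the leading quote character shown in the sample (otherwise the ^" anchor in the first regex will not match). String#[] with a regex and a group index returns that capture group's text, or nil if there is no match; the constant names and the csv_row helper are hypothetical.

```ruby
# The three regexes from the question, unchanged.
NUM_RE  = /^"\s{3}(\d+)/
DATE_RE = /(\d+\/\w{3}\/14)/
URL_RE  = /(\/site\/podcasts\/audio\/.*\.mp3)/

# Hypothetical helper: returns "num, date, url", or nil if any regex fails to match.
def csv_row(line)
  fields = [line[NUM_RE, 1], line[DATE_RE, 1], line[URL_RE, 1]]
  return nil if fields.include?(nil)
  fields.join(", ")
end

lines = [
  '"   80:  0.20%:  2/Jan/14 21:01: /site/podcasts/audio/2013/podcast-07-15-2013.mp3",',
  '"   71:  0.16%:  14/Jan/14 12:18: /site/podcasts/audio/2013/podcast-11-04-2013.mp3",'
]

# Print one CSV row per line, skipping lines that don't match.
puts lines.map { |line| csv_row(line) }.compact
```

Because each regex is applied separately, a line only produces a row if all three patterns find something in it.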

  • Does your log file actually have those quotes and commas in it? Commented Aug 21, 2014 at 14:06
  • Hi @sawa, yes, it's an array (I removed the brackets). I noticed that you had posted what seemed like an elegant solution to this problem that's no longer there; unfortunately, I haven't been able to test it until now. Is there a reason why it's been deleted? Commented Aug 22, 2014 at 17:09
  • And @jkillian, no, the log file does not. The data above is what I scraped from the log file with my Ruby script. Commented Aug 22, 2014 at 17:10

3 Answers


I personally wouldn't use regular expressions:

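output = ''
File.open("path/to/log", "r") do |f|
  f.each_line do |line|
    # split with no argument ignores leading whitespace;
    # split(/\s+/) would return a leading empty string for lines that start with spaces
    num, _percent, date, _time, url = line.split
    num = num[0..-2]  # removes the colon from the end of the number
    output << "#{num}, #{date}, #{url}\n"
  end
end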

# do whatever you want with the result
puts output

And this prints:

80, 2/Jan/14, /site/podcasts/audio/2013/podcast-07-15-2013.mp3
71, 14/Jan/14, /site/podcasts/audio/2013/podcast-11-04-2013.mp3
67, 2/Jan/14, /site/podcasts/audio/podcast-3-21-2011.mp3
67, 15/Jan/14, /site/podcasts/audio/2013/podcast-08-05-2013.mp3
64, 2/Jan/14, /site/podcasts/audio/2013/podcast-11-04-2013-1.mp3

There are shorter, more clever ways to do this, but I like this way because it's readable and clear.


4 Comments

Hi @jkillian, this seems like an excellent solution -- thank you. However, I am getting outputs in the format of , 0.20%:, 21:01: and so on -- so it's pulling the wrong pieces of data from each line. I'm trying to figure it out, but do you have an idea of why?
@CodeBiker Your log format might be a little different than what I expected... Add puts line above the line with split, and give me an example of what a line looks like.
This seems to work, although I removed the part about the colon because sometimes that number has more than two digits: File.open("log", "r") do |f| f.each_line do |line| quote, num, percent, date, time, url = line.split(/\s+/) output << "#{num}, #{date}, #{url}\n" end end
@CodeBiker Great, glad it worked for you! Actually though, it doesn't matter how many digits the number has: The 0..-2 range includes the characters from the first character (0) to the second to last character (-2) inclusive. So that line basically just trims off the last character no matter what.

This combines your three patterns into one regex, with capture groups whose contents you can then handle in Ruby. I'm unfamiliar with Ruby, but I imagine you can concatenate the strings that the capture groups return.

^"\s{3}(\d+)(?:[\s:]|\d\.\d\d%)*(\d+\/\w{3}\/14)[\s\d:]*(\/site\/podcasts\/audio\/.*\.mp3)
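Since the question is about Ruby, here is a minimal sketch (not part of the original answer) of how the capture groups from this combined regex could be joined into a CSV row. It assumes each line still includes the leading quote character, as in the question's sample. MatchData#captures returns the groups in order; COMBINED_RE and combined_row are hypothetical names.

```ruby
# The combined regex from this answer; groups 1-3 are the number, date, and URL.
COMBINED_RE = /^"\s{3}(\d+)(?:[\s:]|\d\.\d\d%)*(\d+\/\w{3}\/14)[\s\d:]*(\/site\/podcasts\/audio\/.*\.mp3)/

# Returns "num, date, url" for a matching line, or nil if the line doesn't match.
def combined_row(line)
  match = COMBINED_RE.match(line)
  match && match.captures.join(", ")
end

line = '"   67:  0.17%:  2/Jan/14 23:44: /site/podcasts/audio/podcast-3-21-2011.mp3",'
puts combined_row(line)
```

Joining with ", " reproduces the spacing in the question's CSV example; join(",") would give plain comma-separated values instead.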


1 Comment

I marked jkillian's answer above as correct because it got me to the right information. However, your regex is excellent; the problem is that I'm not familiar enough with Ruby to know how to concatenate the strings from the capture groups.
\s+(\d+):\s+.*?(\d+\/\w+\/\d+)\s+.*?(\/.*?)\".*

Try this. Please look at the demo.

http://regex101.com/r/cA4wE0/10
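A minimal sketch of applying this regex in Ruby with String#scan, which returns one array of capture groups per match, so the whole pasted block can be processed at once. It assumes the text still contains the quote after each URL, because the pattern relies on \" to find where the URL ends. SCAN_RE and rows_from are illustrative names, not part of the answer.

```ruby
# The regex from this answer; the lazy (\/.*?) stops at the first quote after the URL.
SCAN_RE = /\s+(\d+):\s+.*?(\d+\/\w+\/\d+)\s+.*?(\/.*?)\".*/

# Returns one "num, date, url" string per match found in the text.
def rows_from(text)
  text.scan(SCAN_RE).map { |fields| fields.join(", ") }
end

text = <<~LOG
  "   80:  0.20%:  2/Jan/14 21:01: /site/podcasts/audio/2013/podcast-07-15-2013.mp3",
  "   71:  0.16%:  14/Jan/14 12:18: /site/podcasts/audio/2013/podcast-11-04-2013.mp3",
LOG

puts rows_from(text)
```

Since . does not match a newline here, the trailing .* only consumes the rest of the current line, and scan moves on to the next one.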
