I'm processing search logs with a Java class, and at some point I stumbled upon a tricky part in the logs.
Generally, the log lines look like this:
217 yahoo.com 2006-05-16 16:35:31
The first number is the user ID, the string after it is the query, and the timestamp comes last. So far so good: I managed to extract the user ID and used split(":") and split("-") to get the parts of the timestamp. But further down in the log the composition of the lines gets a bit unpleasant. For example, there are lines like the following:
217 - 2006-05-18 18:20:10 1 http://www.theonering.net
1268 osteen-schatzberg.com 2006-03-21 17:55:42 1 http://www.osteen-schatzberg.com
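In the first line, the '-' seems to mark an empty query, and the URL at the end is marked as 'clickurl'. With lines like those, my idea of using split() to retrieve the timestamp (and also the query) breaks down: the timestamp is no longer the last field, splitting on '-' also hits the '-' placeholder and the hyphen in osteen-schatzberg.com, and splitting on ':' also hits the 'http://' in the URL.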
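To make the line layout concrete, here is a rough sketch of one possible direction: instead of splitting, match the whole line with a regular expression that anchors on the timestamp. Everything in it is an assumption based only on the three sample lines above: the class name SearchLogLine and its fields are made up, fields are assumed to be separated by whitespace, a lone '-' is treated as an empty query, and the number in front of the URL is only guessed to be some kind of rank.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for one parsed log line; all names are illustrative only.
final class SearchLogLine {
    // Groups: 1 = user id, 2 = query (lazy, may contain spaces), 3 = date, 4 = time,
    // 5 and 6 = optional trailing number and click URL.
    private static final Pattern LINE = Pattern.compile(
        "(\\d+)\\s+(.*?)\\s+(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2})"
        + "(?:\\s+(\\d+)\\s+(\\S+))?");

    final long userId;
    final String query;          // empty string when the log shows "-" (assumption)
    final LocalDateTime time;
    final Integer rank;          // null when the line has no click part (meaning guessed)
    final String clickUrl;       // null when the line has no click part

    private SearchLogLine(long userId, String query, LocalDateTime time,
                          Integer rank, String clickUrl) {
        this.userId = userId;
        this.query = query;
        this.time = time;
        this.rank = rank;
        this.clickUrl = clickUrl;
    }

    // Parses one line, or throws if it does not fit the assumed layout.
    static SearchLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        String query = m.group(2).equals("-") ? "" : m.group(2);
        // Date and time are already in ISO form, so the default parsers accept them.
        LocalDateTime time = LocalDateTime.of(
            LocalDate.parse(m.group(3)), LocalTime.parse(m.group(4)));
        Integer rank = m.group(5) == null ? null : Integer.valueOf(m.group(5));
        return new SearchLogLine(Long.parseLong(m.group(1)), query, time, rank, m.group(6));
    }
}
```

Calling SearchLogLine.parse(line) on each of the three sample lines should fill userId, query and time in all cases, and rank and clickUrl only for the two longer lines. Because the query group is lazy and the timestamp pattern is strict, hyphens and colons inside the query or the URL should not confuse it, but a query that itself contains something shaped like a timestamp would.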
Does anyone have a good idea how to approach this problem?
Thanks in advance