0

Okay, so I'm processing search logs with a Java class, but at some point I stumbled upon a tricky part in the logs:

Generally, the log lines look like this:

217 yahoo.com   2006-05-16 16:35:31     

The first number is the user id, the string after it is the query, and the timestamp comes last. So far so good: I managed to extract the user id and used split(":") and split("-") to get the parts of the timestamp. But further down in the log the composition of the lines gets a bit unpleasant. For example, there are lines like the following:

217 -   2006-05-18 18:20:10 1   http://www.theonering.net
1268    osteen-schatzberg.com   2006-03-21 17:55:42 1   http://www.osteen-schatzberg.com

In the first line, the '-' seems to mark an empty query or something like that, and the URL at the end is labeled 'clickurl'. With lines like those, my idea of using split() to retrieve the timestamp (and also the query) went to hell...

Does anyone have a good idea how to approach this problem?

Thanks in advance

2
  • Could you not separate based on whitespace and then process the pieces separately (id, query, date, time, clickurl)? Commented Apr 25, 2012 at 17:46
  • Wow, did I really not think of that? I'll try it later and see how it goes, but it sounds okay. As far as I can tell, all the parts are correctly separated by whitespace. Commented Apr 25, 2012 at 17:50

3 Answers

2

You should really look into using pattern matching with regular expressions here.

Here is a potentially useful example.
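As a rough sketch of the regex route, assuming the fields are separated by runs of whitespace, the query contains no spaces, and the trailing number plus click URL are optional (the class name `LogLineRegex` and the group names are made up for illustration, and calling the number before the URL a "rank" is a guess):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineRegex {
    // Assumed layout: id, query, date, time, then optionally a number and a click URL.
    private static final Pattern LINE = Pattern.compile(
        "^(?<id>\\d+)\\s+(?<query>\\S+)\\s+"
        + "(?<date>\\d{4}-\\d{2}-\\d{2})\\s+(?<time>\\d{2}:\\d{2}:\\d{2})"
        + "(?:\\s+(?<rank>\\d+)\\s+(?<url>\\S+))?\\s*$");

    /** Returns {id, query, date, time, rank, url}; rank and url are null when absent. */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised log line: " + line);
        }
        return new String[] {
            m.group("id"), m.group("query"), m.group("date"),
            m.group("time"), m.group("rank"), m.group("url")
        };
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        String[] fields = parse("217 -   2006-05-18 18:20:10 1   http://www.theonering.net");
        System.out.println(String.join(" | ", fields));
    }
}
```

If queries can contain spaces, the `\\S+` for the query would need to be loosened (for example to a lazy `.+?`), since the date pattern right after it is what anchors the end of the query.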



1

What if you split the string by whitespace first? Example (Java):

 String line = nextLineInFile;               // the line just read from the log
 String[] data = line.trim().split("\\s+");  // split on runs of whitespace

Now data[0] holds the user id, data[1] holds the query, etc.
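To make that concrete, here is a small sketch that wraps the split in a method. It assumes the fields never contain whitespace themselves, that "-" means an empty query (as the question guesses), and that the click URL, when present, is the sixth piece. `LogLineSplit` and `Entry` are hypothetical names, not anything from the original code.

```java
public class LogLineSplit {
    /** Hypothetical holder for one parsed line; clickUrl is null when the line has none. */
    public static final class Entry {
        public final String userId, query, date, time, clickUrl;

        Entry(String userId, String query, String date, String time, String clickUrl) {
            this.userId = userId;
            this.query = query;
            this.date = date;
            this.time = time;
            this.clickUrl = clickUrl;
        }
    }

    public static Entry parse(String line) {
        // trim() first so trailing whitespace does not matter, then split on whitespace runs.
        String[] data = line.trim().split("\\s+");
        if (data.length < 4) {
            throw new IllegalArgumentException("Too few fields: " + line);
        }
        // Assumption: "-" stands for an empty query.
        String query = data[1].equals("-") ? "" : data[1];
        // Assumption: data[4] is a number and data[5] is the click URL, when present.
        String clickUrl = data.length >= 6 ? data[5] : null;
        return new Entry(data[0], query, data[2], data[3], clickUrl);
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        Entry e = parse("1268    osteen-schatzberg.com   2006-03-21 17:55:42 1   http://www.osteen-schatzberg.com");
        System.out.println(e.userId + " searched '" + e.query + "' on " + e.date + " at " + e.time);
    }
}
```

Since the date and time end up in separate pieces, they can be split further on "-" and ":" just as before.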


1

There is no such thing as a general solution. It appears that your lines follow the pattern of

So you could split things up by spaces and go from there...

1 Comment

That's pretty much what Ross already commented; I'll look into it. Guess it's the only way here.
