I'm working on comparing a large number of strings (log entries) to verify that some system results have not changed. My first attempt was straightforward: a plain .equals() comparison, where a failed .equals() meant the results had changed.

This works only as long as my system results are recorded on the same day. Part of the data includes access timestamps. I don't mind if the access dates differ; it is the rest of the payload I'm concerned about.

As an example from a small part of the strings I'm comparing:

...3X68 : accessed 14 Oct 2014 : from quo... 
...3X68 : accessed 16 Oct 2014 : from quo...  

Each string contains multiple occurrences of the "accessed dd MMM yyyy" tag that I want to ignore, usually around 5-10. In some cases, though, the data can be several hundred kilobytes with several hundred blocks, each carrying its own copy of the access stamp. (Yes, removing the redundancy is on the list.)

I've made several attempts at excluding the stamps with the regex "accessed \d\d ... \d\d\d\d", but because the substring can appear multiple times, I ended up writing regex tests that repeat that search a fixed number of times. Since it may appear a hundred or more times, that quickly becomes impractical.
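For reference, a single compiled pattern already covers any number of occurrences, because Matcher.find() resumes scanning after the previous match. The sketch below uses the regex from the question (with the doubled backslashes a Java string literal requires); the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessStampCounter {
    // One compiled pattern; "..." is kept from the question and matches any three characters.
    private static final Pattern ACCESSED =
            Pattern.compile("accessed \\d\\d ... \\d\\d\\d\\d");

    // Counts every stamp by calling find() repeatedly; no per-occurrence regex is needed.
    static int countStamps(String entry) {
        Matcher m = ACCESSED.matcher(entry);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String entry = "3X68 : accessed 14 Oct 2014 : from quo"
                + " | 3X69 : accessed 15 Oct 2014 : from quo"
                + " | 3X70 : accessed 16 Oct 2014 : from quo";
        System.out.println(countStamps(entry)); // prints 3
    }
}
```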

What are some better ways to run this kind of string comparison while ignoring certain parts, either directly or by leveraging a library?

Answer

There are surely many ways to approach this problem. Given that String.equals() comparisons serve your purpose except for the timestamp issue, a relatively straightforward way to go would be to strip the timestamps from both input and comparison data, and use String.equals() to compare the parts you care about (i.e. whatever is left). You can use a regex to do the stripping:

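import java.util.regex.Matcher;
import java.util.regex.Pattern;
// "\\d" because backslashes must be escaped inside Java string literals
Pattern tsPattern = Pattern.compile("accessed \\d\\d ... \\d\\d\\d\\d");
Matcher m = tsPattern.matcher(input);
String stripped = m.replaceAll("");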
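A minimal sketch of the whole comparison, assuming the stamps always follow the "dd MMM yyyy" form from the question (zero-padded day, English three-letter month). The class and method names are illustrative, not from any library. This variant substitutes a fixed placeholder instead of an empty string, so an entry whose "accessed" tag disappeared entirely still counts as a change.

```java
import java.util.regex.Pattern;

public class AccessStampCompare {
    // Tighter than "...": two digits, a capitalized three-letter month, four digits.
    private static final Pattern ACCESSED =
            Pattern.compile("accessed \\d{2} [A-Z][a-z]{2} \\d{4}");

    // Replaces every stamp with the same placeholder so only the date text is ignored.
    static String normalize(String entry) {
        return ACCESSED.matcher(entry).replaceAll("accessed <date>");
    }

    // True when the two entries differ only in their access dates.
    static boolean equalsIgnoringAccessDates(String expected, String actual) {
        return normalize(expected).equals(normalize(actual));
    }

    public static void main(String[] args) {
        String a = "...3X68 : accessed 14 Oct 2014 : from quo...";
        String b = "...3X68 : accessed 16 Oct 2014 : from quo...";
        String c = "...3X68 : accessed 16 Oct 2014 : from xyz...";
        System.out.println(equalsIgnoringAccessDates(a, b)); // prints true
        System.out.println(equalsIgnoringAccessDates(a, c)); // prints false
    }
}
```

Because the pattern is compiled once into a static field, repeated calls pay only for matching, not for compilation.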

You can also do that with String.replaceAll(), but if you're doing a lot of those replacements then going with a Matcher is cheaper because you can do

m.reset(nextInput);

to avoid re-compiling the regex for each string.
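For many entries, the same idea can be applied with one Matcher that is re-pointed at each string via reset(). This sketch assumes both runs produce the same number of entries in the same order; the names BatchStampCompare and changedEntries are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BatchStampCompare {
    // Compiled once; the regex is the one from the answer, with escaped backslashes.
    private static final Pattern ACCESSED =
            Pattern.compile("accessed \\d\\d ... \\d\\d\\d\\d");

    // Returns the indices of entries whose payloads differ after the stamps are removed.
    static List<Integer> changedEntries(List<String> expected, List<String> actual) {
        if (expected.size() != actual.size()) {
            throw new IllegalArgumentException("entry counts differ");
        }
        // One Matcher for the whole batch; reset(input) re-targets it at the next string.
        Matcher m = ACCESSED.matcher("");
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < expected.size(); i++) {
            String e = m.reset(expected.get(i)).replaceAll("");
            String a = m.reset(actual.get(i)).replaceAll("");
            if (!e.equals(a)) {
                changed.add(i);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> before = List.of(
                "3X68 : accessed 14 Oct 2014 : from quo",
                "4Y12 : accessed 14 Oct 2014 : to abc");
        List<String> after = List.of(
                "3X68 : accessed 16 Oct 2014 : from quo",
                "4Y12 : accessed 16 Oct 2014 : to abd");
        System.out.println(changedEntries(before, after)); // prints [1]
    }
}
```

Keeping the Matcher local to the method avoids sharing it across threads; Matcher instances are not thread-safe, while a compiled Pattern is.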

Comment

Appreciate it. That approach worked. Also, good to know about the Matcher doing the replacement; I was concerned about the memory requirements of potentially splitting out lots of substring segments. With several hundred megabytes of logs in some cases, I was worried about Java's string memory overhead. So far memory hasn't exploded too badly.
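If whole logs run to hundreds of megabytes, one way to keep memory bounded is to compare them a line at a time instead of loading each log into a single String. This sketch assumes each log entry occupies exactly one line, which may not hold for multi-block entries; StreamingStampCompare and firstDifference are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamingStampCompare {
    private static final Pattern ACCESSED =
            Pattern.compile("accessed \\d\\d ... \\d\\d\\d\\d");

    // Returns the 1-based number of the first differing line, or -1 if the logs match.
    // Only one line from each side is held in memory at a time.
    static long firstDifference(Reader expected, Reader actual) throws IOException {
        try (BufferedReader e = new BufferedReader(expected);
             BufferedReader a = new BufferedReader(actual)) {
            Matcher m = ACCESSED.matcher("");
            long lineNo = 0;
            while (true) {
                String le = e.readLine();
                String la = a.readLine();
                lineNo++;
                if (le == null && la == null) {
                    return -1;      // both logs ended together
                }
                if (le == null || la == null) {
                    return lineNo;  // one log is longer than the other
                }
                String se = m.reset(le).replaceAll("");
                String sa = m.reset(la).replaceAll("");
                if (!se.equals(sa)) {
                    return lineNo;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String before = "3X68 : accessed 14 Oct 2014 : from quo\n4Y12 : ok\n";
        String after = "3X68 : accessed 16 Oct 2014 : from quo\n4Y12 : changed\n";
        System.out.println(firstDifference(new StringReader(before), new StringReader(after))); // prints 2
    }
}
```

In practice the Readers would come from the two log files (for example via java.nio.file.Files.newBufferedReader); StringReader is used here only to keep the example self-contained.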