I'm comparing a large number of strings (log entries) to verify that some system results have not changed. My first attempt was direct enough: a plain .equals() comparison, where a failed .equals() means the results have changed.
This works only as long as the system results are recorded on the same day, because part of the data includes accessed time stamps. I don't mind if the accessed dates differ; it is the rest of the payload I'm concerned about.
As an example from a small part of the strings I'm comparing:
...3X68 : accessed 14 Oct 2014 : from quo...
...3X68 : accessed 16 Oct 2014 : from quo...
The strings have multiple cases of the "accessed dd MMM yyyy" tags that I want to ignore, usually around 5-10 but in some cases the data can be several hundred kilobytes with several hundred blocks, each with a copy of the accessed stamp. (Yes, removing the redundancy is on the list.)
I've made several attempts at matching the stamps with the regex "accessed \d\d ... \d\d\d\d", but since the substring may appear multiple times, I end up writing regex tests that repeat that search once per occurrence. With potentially a hundred or more occurrences, that quickly becomes impractical.
What are some better ways to do this kind of string comparison with exceptions, either directly or by leveraging a library?
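To make the goal concrete, here is a minimal sketch of the behaviour I'm after, not a settled solution: replace every accessed stamp in both strings with a fixed placeholder, then compare with .equals() as before. The class name LogCompare, the helper names, and the "accessed <date>" placeholder are made up for illustration, and the pattern assumes the stamp always has the "accessed dd MMM yyyy" shape with a three-letter month.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class LogCompare {
    // Assumed stamp shape: "accessed dd MMM yyyy", e.g. "accessed 14 Oct 2014".
    private static final Pattern ACCESSED =
            Pattern.compile("accessed \\d\\d [A-Za-z]{3} \\d{4}");

    // Replace every accessed stamp with a fixed placeholder,
    // no matter how many times it occurs in the string.
    static String normalize(String s) {
        return ACCESSED.matcher(s).replaceAll("accessed <date>");
    }

    // Two entries are considered equal if they match after normalization.
    static boolean equalsIgnoringAccessed(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

Because replaceAll handles every occurrence in one pass, the pattern only has to be written once regardless of how many blocks carry a stamp. The two example lines above would compare as equal, while a difference anywhere else in the payload would still be caught.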