3

I have a program that compares two files. I ran the Visual Studio analysis and found that my comparison time is large. Is there a quicker way to compare two strings than this? (I can't use Parallel.ForEach because it might cause errors.) Right now I'm using a ConcurrentDictionary, but I'm open to other options. :)

var metapath = new ConcurrentDictionary<string, string>();
foreach(var me in metapath)
{
 if (line.StartsWith(me.Key.ToString()))
 {...}
}
13
  • How large is the comparison time, does it say? Commented Mar 23, 2012 at 21:57
  • Do you need the line-based approach? It’s not entirely obvious from your question. Do you just want to compare entire files for equality, or the lines of individual text files? Commented Mar 23, 2012 at 22:00
  • @BoltClock Well, me.Key.ToString() is 8 characters long and line is somewhere between 200 and 1000, and it's taking about 42 seconds for all the comparisons. Commented Mar 23, 2012 at 22:00
  • @romkyns Yes, I think it needs to be line-based. Commented Mar 23, 2012 at 22:01
  • Sounds similar to this question: stackoverflow.com/q/8867710/409259 Commented Mar 23, 2012 at 22:02

3 Answers

5

First of all, drop the ToString() from me.Key.ToString().

Next, use the ordinal string comparison (provided that this doesn’t impact correctness):

line.StartsWith(me.Key, StringComparison.Ordinal);

This is beneficial because standard string comparisons follow various Unicode rules on what’s equal. For example, normalized and denormalized sequences must be treated as equal. Ordinal just compares raw character data, ignoring Unicode equality rules. There is more detail on this here, for example, or here (which claims it’s faster but without quoting any numbers).
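
To make the difference concrete, here is a minimal sketch that puts the two kinds of prefix check side by side. The class name, method names, and sample strings are invented for this illustration and are not from the question.

```csharp
using System;
using System.Globalization;

public static class PrefixDemo
{
    // Culture-sensitive check: applies the linguistic rules of the
    // invariant culture, which costs more per call.
    public static bool StartsWithCulture(string line, string key)
    {
        return line.StartsWith(key, false, CultureInfo.InvariantCulture);
    }

    // Ordinal check: compares the raw UTF-16 code units, nothing else.
    public static bool StartsWithOrdinal(string line, string key)
    {
        return line.StartsWith(key, StringComparison.Ordinal);
    }
}
```

With the ordinal version, a line beginning with the precomposed character "\u00E9" does not start with the key "e\u0301" (e followed by a combining accent), whereas a culture-sensitive comparison may treat the two as equal. If your keys are plain ASCII identifiers, that distinction should not matter and the ordinal check is the safe choice.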

Last, profile the code. You’ll be surprised, but most of the time the slow part is not at all what you think it is. For example, it could be the part where you add things to the dictionary.
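
If a full profiler run is too heavy, a rough timing with Stopwatch can already tell you whether the comparison loop is the bottleneck. The following is only a sketch: PrefixTiming, CountMatches, and Time are hypothetical helper names, and the loop mirrors the one in the question rather than your real code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class PrefixTiming
{
    // Counts how many (line, key) pairs match. Returning the count
    // keeps the work observable, so the loop cannot be skipped.
    public static int CountMatches(IEnumerable<string> lines,
                                   IEnumerable<string> keys,
                                   StringComparison comparison)
    {
        int matches = 0;
        foreach (string line in lines)
        {
            foreach (string key in keys)
            {
                if (line.StartsWith(key, comparison))
                {
                    matches++;
                }
            }
        }
        return matches;
    }

    // Runs the given action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling Time once with StringComparison.CurrentCulture and once with StringComparison.Ordinal over the same lines and keys gives a direct comparison on your own data. Timing the code that fills the dictionary the same way shows whether that part dominates instead.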


1 Comment

Can you explain why this would be beneficial?
1

If you compare strings exactly, String.Equals is quite good:

String.Equals(line, me.Key)

Have you seen this: What is the fastest (built-in) comparison for string-types in C#

1 Comment

Sorry, but I'm not comparing exactly, just the first 8 characters.
0

It's not clear exactly what you mean by "comparison", but if you don't mean "sort" (i.e. you want to check for plagiarism or something), then what about hashing the lines first and comparing the hashes?

Whether there is any benefit depends on the size of your data set. "Large" and "small" are highly subjective terms.
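
One way the hashing idea could apply to the prefix check described in the comments is to look up the first 8 characters of each line in a hash-based dictionary, instead of looping over every key. This is a sketch under a loud assumption: every key is exactly 8 characters long, as the comments suggest. PrefixLookup, Build, and TryMatch are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixLookup
{
    // Assumption: every key has exactly this many characters.
    public const int KeyLength = 8;

    // Copies the entries into a dictionary that hashes and compares
    // keys ordinally.
    public static Dictionary<string, string> Build(
        IEnumerable<KeyValuePair<string, string>> metapath)
    {
        var map = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (KeyValuePair<string, string> pair in metapath)
        {
            map[pair.Key] = pair.Value;
        }
        return map;
    }

    // One hash lookup on the line's prefix replaces the loop over all keys.
    public static bool TryMatch(Dictionary<string, string> map,
                                string line, out string value)
    {
        value = null;
        if (line == null || line.Length < KeyLength)
        {
            return false;
        }
        return map.TryGetValue(line.Substring(0, KeyLength), out value);
    }
}
```

The cost per line then no longer grows with the number of keys. If the keys can have different lengths, this shortcut does not apply as written.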
