
I have to read a huge XML file which consists of over 3 million records and over 10 million nested elements.

Naturally, I am using XmlTextReader, and I have got my parsing time down from about 90 seconds to about 40 seconds using multiple optimization tricks and tips.

But I want to save as much further processing time as I can, hence the question below.

Quite a few elements are of type xs:boolean and the data provider always represents values as "true" or "false" - never "1" or "0".

For such cases my earliest code was:

bool subtitled = false;
if (xmlTextReader.Value == "true")
{
    subtitled = true;
}

which I further optimized to:

bool subtitled = false;
if (string.Equals(xmlTextReader.Value, "true", StringComparison.OrdinalIgnoreCase))
{
    subtitled = true;
}

Would the following be the fastest, given that the value is always either "true" or "false"?

bool subtitled = false;
// xtr is the XmlTextReader; "true" has 4 characters, "false" has 5
if (xtr.Value.Length == 4)
{
    subtitled = true;
}
  • Why don't you benchmark the two approaches and see for yourself? (For what it's worth, I'd guess that the length comparison would be quicker, but probably not significantly.) Commented Sep 6, 2010 at 14:00
  • Why not just test it? I would not be surprised if string.Equals short-circuited its test on a length comparison anyway. It would check first for reference equality, then the lengths of the two strings, and then, if the lengths are the same, perform a character-by-character comparison. Just a guess. Commented Sep 6, 2010 at 14:03
  • @Chris Taylor: Equals does this short-circuit only for Ordinal and OrdinalIgnoreCase. With all the other comparison types, "\x00e9" and "e\x0301" compare as equal despite having different lengths. Commented Sep 6, 2010 at 14:20
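To illustrate the point about comparison types, here is a small sketch (assuming .NET 5 or later, so top-level statements are available). An ordinal comparison looks only at UTF-16 code units, so strings of different length can be rejected immediately; a culture-sensitive comparison applies linguistic rules, under which a precomposed "é" and "e" plus a combining accent are typically treated as equal even though their lengths differ. The culture-sensitive result depends on the runtime's globalization support, so only the ordinal result should be relied on.

```csharp
using System;

// Precomposed "é" (U+00E9, one UTF-16 code unit) vs. "e" + U+0301 COMBINING ACUTE ACCENT (two code units).
string precomposed = "\u00e9";
string decomposed = "e\u0301";

// Ordinal: compares UTF-16 code units, so different lengths can be rejected up front.
bool ordinalEqual = string.Equals(precomposed, decomposed, StringComparison.Ordinal);

// Culture-sensitive: linguistic rules treat the two forms as canonically equivalent.
// The outcome depends on the globalization backend; invariant-globalization mode may fall back to ordinal behavior.
bool cultureEqual = string.Equals(precomposed, decomposed, StringComparison.InvariantCulture);

Console.WriteLine($"Lengths: {precomposed.Length} vs {decomposed.Length}");
Console.WriteLine($"Ordinal equal: {ordinalEqual}, InvariantCulture equal: {cultureEqual}");
```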

8 Answers

Answer (score 7)

Yes, it is faster, because you only compare exactly one value, namely the length of the string.

When comparing two strings, you compare them character by character for as long as the characters match. So if you're matching the string "true", you're going to do 4 comparisons before the predicate evaluates to true.

The only problem with this solution is that if the value someday changes from "true" to, let's say, "1", you're going to run into a problem.
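A minimal sketch of that caveat, using a hypothetical helper ParseByLength (not from the question) that implements the length trick. xs:boolean also allows the lexical forms "1" and "0", which have the same length, so the trick cannot tell them apart:

```csharp
using System;

// Hypothetical helper implementing the length trick.
// ASSUMPTION: valid only while the provider emits exactly "true" (4 chars) or "false" (5 chars).
static bool ParseByLength(string value) => value.Length == 4;

Console.WriteLine(ParseByLength("true"));   // True
Console.WriteLine(ParseByLength("false"));  // False

// "1" and "0" are also legal xs:boolean values, but both have length 1,
// so the trick silently returns false for both of them.
Console.WriteLine(ParseByLength("1"));      // False (wrong: "1" means true)
Console.WriteLine(ParseByLength("0"));      // False
```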


4 Comments

It's actually not a problem: you'd have the same failing comparisons when "1" != "true" and "0" != "false". You can't change one half of an interface implementation and expect the interface and all other implementations of that interface to magically change. See also Postel's Law: Be conservative in what you send; be liberal in what you accept.
@Msalters but the lengths of "1" and "0" are the same, so you can't determine the value based on the length.
But is it faster than xmlTextReader.Value[0] == 't'? I just wanted to raise the question, because of course the right thing to do is to benchmark.
@RuneFS: that's beside the point. If the interface says to use "true" and "false", then it works. If the interface is changed to use "0" and "1", then String.Equals will work. But if the interface were changed to use only "false", with true being the implied default if there is no element in your XML, then your string comparison breaks. Ergo, you can't speculate whether your parsing algorithm will understand a future protocol version, and you must consider that incompatible by default.

Answer (score 4)

Comparing length will be faster, but less readable. I wouldn't use it unless I profile the performance of the code and conclude that I need this optimization.
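One way to keep the length check while limiting the readability cost is to hide it behind a descriptive name that states its assumption. A sketch, with IsTrueLiteral as a hypothetical helper name:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: a named wrapper around the length trick.
// ASSUMPTION: the data provider only ever emits the lexical forms "true" or "false".
static bool IsTrueLiteral(string value)
{
    // Documents the contract in Debug builds; Debug.Assert is compiled out of Release builds.
    Debug.Assert(value == "true" || value == "false",
                 "Unexpected xs:boolean lexical form: " + value);
    return value.Length == 4; // "true" has 4 characters, "false" has 5
}

bool subtitled = IsTrueLiteral("true");
Console.WriteLine(subtitled); // True
```

The call site now reads as intent, and the assertion flags a protocol change during development instead of silently producing false.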


Answer (score 3)

What about comparing the first character to "t"?

It should (maybe :) be faster than comparing the whole string.
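A sketch of the first-character check, with FirstCharIsT as a hypothetical helper. It relies on the same assumption as the length trick (exactly "true" or "false", lowercase, no surrounding whitespace) and guards against empty strings, which would otherwise throw IndexOutOfRangeException:

```csharp
using System;

// Hypothetical helper: inspects only the first character.
// ASSUMPTION: values are exactly "true" or "false" (lowercase, no surrounding whitespace).
static bool FirstCharIsT(string value) => value.Length > 0 && value[0] == 't';

Console.WriteLine(FirstCharIsT("true"));   // True
Console.WriteLine(FirstCharIsT("false"));  // False
Console.WriteLine(FirstCharIsT(""));       // False (guarded, no exception)
```

Like the length check, this misreads "1" as false, so it carries the same protocol caveat.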


Answer (score 2)

Measuring the length would almost invariably be faster. That said, unless this is an experiment in micro-optimization, I'd just focus on making the code readable and conveying the proper semantics.

You might also try the following approach:

bool subtitled;
// subtitled is set to false if the value cannot be parsed
Boolean.TryParse(xmlTextReader.Value, out subtitled);

I know that has nothing to do with your question, but I figured I'd throw it out there anyway.
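For comparison, a sketch of how Boolean.TryParse behaves next to System.Xml.XmlConvert.ToBoolean, which follows the xs:boolean lexical forms. Both are standard .NET APIs; the sample inputs are made up for illustration:

```csharp
using System;
using System.Xml;

string raw = "True";

// Boolean.TryParse accepts "true"/"false" in any casing (surrounding whitespace is ignored)
// but rejects the xs:boolean forms "1" and "0".
bool ok = bool.TryParse(raw, out bool subtitled);
bool okNumeric = bool.TryParse("1", out _);

// XmlConvert.ToBoolean accepts the xs:boolean forms "true", "false", "1" and "0"
// (lowercase only) and throws FormatException for anything else.
bool fromXml = XmlConvert.ToBoolean("1");

Console.WriteLine($"{ok} {subtitled} {okNumeric} {fromXml}"); // True True False True
```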


Answer (score 0)

Can't you just write a unit test? Run each scenario, for example, 1000 times and compare the elapsed times.
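A rough Stopwatch-based sketch of that suggestion (assuming .NET 5 or later for top-level statements). The sample values and iteration count are arbitrary assumptions, and timings vary by runtime, JIT and CPU, so treat the numbers as indicative only; a dedicated harness such as BenchmarkDotNet would give more reliable results:

```csharp
using System;
using System.Diagnostics;

// ASSUMPTION: these sample values stand in for what the XmlTextReader returns (half "true", half "false").
string[] values = { "true", "false", "false", "true" };
const int Iterations = 10_000_000;

// Two rounds: the first warms up the JIT, the second is the one worth reading.
for (int round = 1; round <= 2; round++)
{
    Console.WriteLine($"Round {round}");
    Time("== \"true\"", values, Iterations, v => v == "true");
    Time("OrdinalIgnoreCase", values, Iterations,
         v => string.Equals(v, "true", StringComparison.OrdinalIgnoreCase));
    Time("Length == 4", values, Iterations, v => v.Length == 4);
    Time("[0] == 't'", values, Iterations, v => v[0] == 't');
}

// Times one variant and returns how many values it classified as true.
// The delegate call adds the same overhead to every variant and may swamp the differences measured.
static int Time(string label, string[] data, int iterations, Func<string, bool> isTrue)
{
    int hits = 0;
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        if (isTrue(data[i % data.Length])) hits++;
    }
    sw.Stop();
    Console.WriteLine($"  {label,-20} {sw.ElapsedMilliseconds,6} ms (true: {hits})");
    return hits;
}
```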


Answer (score 0)

If you know it's either "true" or "false", the last snippet must be fastest.

Anyway, you can also write:

bool subtitled = (xtr.Value.Length == 4);

That should be even faster.


Answer (score 0)

Old question, I know, but the accepted answer is wrong, or at least incorrect in its explanation.

Comparing the lengths may be the slightest bit faster, but only because string.Equals is likely doing some other comparisons before it, too, checks the lengths and decides that the strings are not equal.

So in practice this is an optimization of last resort.

Here you can find the source for .NET Core string comparison.
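Since the linked source isn't reproduced here, below is a deliberately simplified sketch of the order of checks an ordinal equality test can perform (reference, null, length, then characters). It is not the actual .NET implementation, which is vectorized and more involved; OrdinalEqualsSketch is a hypothetical name:

```csharp
using System;

// Hypothetical, simplified ordinal equality; NOT the real runtime code.
static bool OrdinalEqualsSketch(string a, string b)
{
    if (ReferenceEquals(a, b)) return true;          // same instance (or both null)
    if (a is null || b is null) return false;        // exactly one is null
    if (a.Length != b.Length) return false;          // cheap length short-circuit
    for (int i = 0; i < a.Length; i++)               // character-by-character comparison
    {
        if (a[i] != b[i]) return false;
    }
    return true;
}

Console.WriteLine(OrdinalEqualsSketch("true", "true"));   // True
Console.WriteLine(OrdinalEqualsSketch("true", "false"));  // False (rejected by the length check)
Console.WriteLine(OrdinalEqualsSketch("true", "True"));   // False (differs at index 0)
```

With a length check this early, comparing "false" against "true" never reaches the character loop, which is why the gain from comparing lengths yourself is marginal.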


Answer (score -1)

String comparison and parsing are very slow in .NET; I'd recommend avoiding intensive string parsing/comparison in .NET.

If you're forced to do it -- use highly optimized unmanaged or unsafe code and use parallelism.

IMHO.

5 Comments

Any links to support this claim?
-1: Out of context, or especially in this context, this statement is already questionable. But even with the benefit of doubt, sorry, but without any evidence/facts/experience whatsoever, this is nothing but FUD.
Just write simple tests and see for yourself. I did. Write, for instance, string comparison, char access, or atoi() vs. Convert.ToInt32() in C/C++ and C#, and you will see that the native unmanaged code is hundreds of times more efficient.
Using out-of-context micro-benchmarks, maybe yes. However, you give a pretty bold recommendation in your answer to use unsafe or unmanaged code. That, as such, is not necessarily bad, but it should only be used (if even possible; think portability, mobile, Silverlight, medium-trust environments, etc.) after really figuring out whether that particular part of the process is the bottleneck compared to the other heavy lifting that goes on. Regarding the original question, I would assume the XML parsing to have the lion's share. Besides, the question was about string-to-string comparison anyway.
Completely agree with you: optimization is not necessarily low-level tuning and hacks such as using unsafe code instead of BCF routines. My point was about .NET string performance itself. Sorry for the misunderstanding.
