
Joe Duffy's Blog implies using string.Substring is more efficient than string.Split.

I don't know if it's saying that the Substring method does not allocate a new string, or that it is just more efficient because it avoids unneeded allocations. Can you please explain how it is more efficient and show an example?

I understand his first example as creating an array and then processing each of the strings in the array.

string str = ...;
string[] substrs = str.Split(',');
foreach (string substr in substrs) {
    Process(substr);
}

How is the following more efficient?

string str = ...;
int lastIndex = 0;
int commaIndex;
while ((commaIndex = str.IndexOf(',', commaIndex)) != -1) {
    Process(substr, lastIndex, commaIndex);
    lastIndex = commaIndex + 1;
}

What I see is String.IndexOf being used to find the index of the comma, and then the string being processed. I assume he intends to use String.Substring to extract the data during his processing. One of the comments below suggested he may be pulling it out character by character. Would he be pulling characters until he hits the next comma, possibly building up an array of char?

  • Yes, string.IndexOf is used to find the index of the comma, but inside your hypothetical Process method you're still going to need to make a substring to extract the data between those indexes, right? Commented Jul 31, 2013 at 2:39
  • I stand corrected - the example is indeed confusing. Commented Jul 31, 2013 at 2:41
  • "There are landmine APIs lurking out there, like String.Split and String.Substring" implies he thinks Substring is not efficient. And I think that "String does, after all, have an indexer. And it’s type-safe! So in-place parsing at least won’t lead to buffer overruns" implies he intends you to access the substring character by character, rather than using Substring. Commented Jul 31, 2013 at 2:52
  • @GrimR3: You don't need to make a substring to access the data, but everything you'll want to do will be more complicated if you don't. You'll no longer have access to most of the stuff that makes strings useful in the first place, like Trim*, StartsWith, EndsWith, and straightforward comparisons in general. (For example, substr == "stuff" becomes (end - start) == 5 && String.Compare(str, start, "stuff", 0, 5) == 0.) You'd have to do all that yourself, and you'd probably mess up quite a bit along the way. Note that the author's own example doesn't compile. :P Commented Jul 31, 2013 at 5:11
  • @Blorgbeard: Substring isn't as efficient as it could be. The code I saw seems to indicate that it always makes a new array containing a copy of the characters, rather than referencing the original string's array using a range like Java does. The drawback to Java's way is that longString.substring(0, 2) can keep longString's entire backing array in memory, even though you only ever use two chars of it once longString dies. Commented Jul 31, 2013 at 7:15
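
The in-place comparison described in the comments above can be sketched roughly as follows. This is only an illustration, not code from the blog post: the helper name RangeEquals and the sample string are assumptions, and it uses an ordinal comparison (String.CompareOrdinal) rather than the culture-sensitive String.Compare mentioned in the comment.

```csharp
using System;

static class RangeDemo
{
    // Hypothetical helper: true when the characters of str in [start, end)
    // equal value, checked in place without allocating a substring.
    public static bool RangeEquals(string str, int start, int end, string value)
    {
        int length = end - start;
        return length == value.Length
            && string.CompareOrdinal(str, start, value, 0, length) == 0;
    }

    static void Main()
    {
        string str = "more,stuff,here";
        int start = 5, end = 10; // the range covering "stuff"

        // Allocating version: Substring copies the characters into a new string.
        Console.WriteLine(str.Substring(start, end - start) == "stuff");

        // In-place version: compares against the original string's characters.
        Console.WriteLine(RangeEquals(str, start, end, "stuff"));
    }
}
```

Both checks give the same answer; the second avoids the extra string but, as the comment notes, every other string operation (Trim, StartsWith, and so on) would need a similar hand-written range-based equivalent.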

1 Answer


Good grief.

Old joke: The manager wanted to know if programmer A or programmer B was the better programmer, so he staged a contest. They both were to write a program to solve a given complicated problem, and the one who wrote the best program would win.

The two programmers submitted their answers. Programmer A's program ran fastest, and the manager was about to declare him to be the winner when programmer B pointed out that the answer provided by programmer A's program was a bit off.

"But my program is still fastest, I deserve to win", said programmer A.

"If the answer doesn't have to be correct, I can write a program that is 10 times faster than yours", retorted programmer B.

Joe Duffy's second example, where he avoids using string.Split(), is wrong. It won't compile. The variable "substr" is undefined.

I rest my case.


10 Comments

Nice joke, but I'm not sure what your point is. Joe Duffy's mistake is a simple typo - he meant to say "str" instead of "substr".
@Blorgbeard: That he accidentally broke the code kinda drives home the point, though. The more you complicate things, the more likely it is for stuff like that to creep in. Particularly if you're increasing complexity. Doesn't help his case it was just a typo, either; it's evidence that he didn't bother to test or profile the code before he commenced the ranting about substrings. Actually typing it in VS and compiling it would have revealed the error.
@Blorgbeard: What cHao said - as soon as I saw the mistake in the second example I couldn't even be bothered reading the article, and this joke sprang into my mind. If the answer is wrong it doesn't matter if the program is 10 times or 1000 times faster. And if the program is unnecessarily complicated then it becomes much more likely that it will be wrong, either now or when it needs to be changed due to maintenance.
@cHao: I see you posted a comment on Joe Duffy's blog article. Strangely, the first comment there, posted seven months ago, points out the error, and Mr. Duffy didn't bother to correct his code or to reply.
@RenniePet ok, the answer is wrong, but it's trivially correctable. Pretend he corrected the typo. The answer would no longer be wrong. Do you have any other objection to his post? I am not really defending Joe Duffy here, I just think the presence of a typo is a strange thing on which to base your dismissal of his idea.
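
For reference, one way the second example might look with the typo corrected is sketched below. It rests on several assumptions that the blog post does not confirm: that "substr" was meant to be "str", that the search should resume from lastIndex, that the field after the last comma should also be processed, and that Process receives the string plus a start and an exclusive end index. The names InPlaceParse and ForEachField are hypothetical.

```csharp
using System;

static class InPlaceParse
{
    // Hypothetical helper: calls process(str, start, end) once per
    // comma-separated field, where the field is str[start..end).
    // No substrings or arrays are allocated by the loop itself.
    public static void ForEachField(string str, Action<string, int, int> process)
    {
        int lastIndex = 0;
        int commaIndex;
        while ((commaIndex = str.IndexOf(',', lastIndex)) != -1)
        {
            process(str, lastIndex, commaIndex);
            lastIndex = commaIndex + 1;
        }
        // The loop stops at the last comma, so handle the trailing field here.
        process(str, lastIndex, str.Length);
    }

    static void Main()
    {
        ForEachField("ab,cde,f", (s, start, end) =>
            Console.WriteLine($"field at [{start},{end}), length {end - start}"));
    }
}
```

Whether this is actually faster than Split for a given workload would need to be measured; the sketch only shows that the callback has to work with index ranges (for instance via the string indexer) instead of ready-made strings.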