I have a list of 400 strings that all end in "_GONOGO" or "_ALLOC". When the application starts up, I need to strip off the "_GONOGO" or "_ALLOC" from every one of these strings.

I tried this: 'string blah = Regex.Replace(input, "(_GONOGO|_ALLOC)", "");'

but it is MUCH slower than a simple conditional statement like this:

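if (input.Contains("_GONOGO"))
    blah = input.Substring(0, input.Length - 7);  // use Substring: drop the 7-character "_GONOGO"
else if (input.Contains("_ALLOC"))
    blah = input.Substring(0, input.Length - 6);  // use Substring w/different index: drop the 6-character "_ALLOC"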

I'm new to regular expressions, so I'm hoping that someone has a better solution or I am doing something horribly wrong. It's not a big deal, but it would be nice to turn this 4 line conditional into one simple regex line.

  • Does your regex perform any better if you put a $ anchor on the end of the pattern? Commented Sep 15, 2009 at 0:36
  • You should use EndsWith instead of Contains. Along with being more correct, it's faster. :) Commented Sep 15, 2009 at 0:56
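
To make the two comments above concrete, here is a minimal sketch of both suggestions. The class and method names are made up for illustration and are not from the question; whether the anchored pattern is actually faster would need to be measured.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixStripper
{
    // Hypothetical helper: the question's pattern with a $ anchor added.
    // In .NET, $ matches at the end of the string (or just before a final newline),
    // so a suffix in the middle of the string is left alone.
    public static string StripWithAnchoredRegex(string input)
    {
        return Regex.Replace(input, "(_GONOGO|_ALLOC)$", "");
    }

    // Hypothetical helper: the same result with EndsWith and Substring, no regex.
    public static string StripWithEndsWith(string input)
    {
        foreach (string suffix in new[] { "_GONOGO", "_ALLOC" })
        {
            // Ordinal comparison avoids culture-sensitive matching.
            if (input.EndsWith(suffix, StringComparison.Ordinal))
                return input.Substring(0, input.Length - suffix.Length);
        }
        return input; // no known suffix: return the string unchanged
    }
}
```

Both helpers only touch the end of the string, so a name such as "A_ALLOC_B_ALLOC" keeps its inner "_ALLOC".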

5 Answers 5

8

While it isn't RegEx, you could do

string blah = input.Replace("_GONOGO", "").Replace("_ALLOC", "");

RegEx is great for complex expressions, but its overhead is often not worth paying for very simple operations like this.

1 Comment

Thank you, this is just fine. Regex wasn't a requirement; I just wanted it down to one line.
4

Regex replacements may work faster if you compile the regex first. As in:

Regex exp = new Regex(
    @"(_GONOGO|_ALLOC)",
    RegexOptions.Compiled);

string blah = exp.Replace(input, String.Empty);

2 Comments

Note also (from MSDN) "The Regex class is immutable (read-only) and is inherently thread safe." You can create it once and assign it to a static readonly field. See acorns.com.au/blog/?p=136
And from the Atwood Archives: codinghorror.com/blog/archives/000228.html
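
As a rough sketch of the "create it once" advice in the comment above, the compiled regex can live in a static readonly field so it is constructed a single time and shared by every call. The class and method names here are assumptions for illustration only.

```csharp
using System.Text.RegularExpressions;

public static class NameCleaner
{
    // Constructed once and reused by every caller; Regex instances are
    // immutable and thread safe, so sharing one is fine.
    private static readonly Regex SuffixPattern =
        new Regex("(_GONOGO|_ALLOC)", RegexOptions.Compiled);

    // Hypothetical helper: removes every occurrence of either suffix.
    public static string Clean(string input)
    {
        return SuffixPattern.Replace(input, string.Empty);
    }
}
```

RegexOptions.Compiled makes construction more expensive in exchange for faster matching, so it tends to pay off only when the same instance is reused many times.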
3

This is expected; in general, manipulating a string by hand will be faster than using a regular expression. Using a regex involves compiling an expression down to a regex tree, and that takes time.

If you're using this regex in multiple places, you can use the RegexOptions.Compiled flag to reduce the per-match overhead, as David describes in his answer. Other regex experts might have tips for improving the expression. You might consider sticking with the String.Replace, though; it's fast and readable.
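
If you want to check this on your own data rather than take it on faith, a small Stopwatch harness along these lines may help. It is only a sketch: the sample names are invented, the helper names are hypothetical, and the numbers will vary with machine, runtime, and JIT warm-up.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ReplaceTiming
{
    // Built once up front so construction cost is not paid per call.
    private static readonly Regex Compiled =
        new Regex("(_GONOGO|_ALLOC)", RegexOptions.Compiled);

    // Builds `count` made-up names that alternate between the two suffixes.
    public static string[] MakeSampleInputs(int count)
    {
        string[] inputs = new string[count];
        for (int i = 0; i < count; i++)
            inputs[i] = "ITEM" + i + (i % 2 == 0 ? "_GONOGO" : "_ALLOC");
        return inputs;
    }

    // Runs `strip` once as a warm-up, then times one pass over all inputs.
    public static TimeSpan Time(string[] inputs, Func<string, string> strip)
    {
        if (inputs.Length > 0)
            strip(inputs[0]); // warm-up so JIT and first-use setup are not timed
        Stopwatch watch = Stopwatch.StartNew();
        foreach (string s in inputs)
            strip(s);
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints one timing line per approach for 400 sample strings.
    public static void Report()
    {
        string[] inputs = MakeSampleInputs(400);
        Console.WriteLine("static Regex.Replace: " +
            Time(inputs, s => Regex.Replace(s, "(_GONOGO|_ALLOC)", "")));
        Console.WriteLine("compiled Regex:       " +
            Time(inputs, s => Compiled.Replace(s, "")));
        Console.WriteLine("String.Replace:       " +
            Time(inputs, s => s.Replace("_GONOGO", "").Replace("_ALLOC", "")));
    }
}
```

Calling ReplaceTiming.Report() from your startup code prints the three timings side by side; a single pass over 400 short strings is tiny, so run it several times before drawing conclusions.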

1

If they all end in one of those patterns, it would likely be faster to drop replace altogether and use:

string result = source.Substring(0, source.LastIndexOf('_'));
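
One caveat with the line above: LastIndexOf returns -1 when the string contains no underscore, and Substring(0, -1) throws an ArgumentOutOfRangeException. If that case can occur in your data, a guarded variant might look like the sketch below (the class and method names are hypothetical).

```csharp
public static class UnderscoreTrimmer
{
    // Hypothetical helper: drops everything from the last underscore onward.
    public static string TrimAfterLastUnderscore(string source)
    {
        int index = source.LastIndexOf('_');
        // No underscore found: return the input unchanged instead of throwing.
        return index < 0 ? source : source.Substring(0, index);
    }
}
```

Note that this cuts at the last underscore regardless of what follows it, so it relies on every string really ending in one of the two suffixes.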

1

When you have that much information about your problem domain, you can make things pretty simple:

const int AllocLength = 6;
const int GonogoLength = 7;
string s = ...;
if (s[s.Length - 1] == 'C')
    s = s.Substring(0, s.Length - AllocLength);
else
    s = s.Substring(0, s.Length - GonogoLength);

This is theoretically faster than Abraham's solution, but not as flexible. If the strings have any chance of changing then this one would suffer from maintainability problems that his does not.
