0

I have a string which is generated by linux uuid generation code (libc):

1b4e28ba-2fa1-11d2-883f-b9a761bde3fb

I need to replace some of the characters in this string:

- with _
2 with f
4 with x

I am generating the 200 UUIDs using a loop.

So for every uuid I need to replace using a custom function, so that function must be maximum optimised to do so, how can I attain that?

10
  • 2
    200? That's not a large number, it would take some real work to not make this almost instant on a modern computer. Also, what do you have so far? Commented Apr 20, 2013 at 3:20
  • 1
    stackoverflow.com/a/779960 Commented Apr 20, 2013 at 3:21
  • 1
    You can attain that by writing a program which does that. It's best to start early. If you have any problems, please post here. Commented Apr 20, 2013 at 3:31
  • 1
    @SchighSchagh , is it a good way "switch on each char, and replace it accordingly" for 200 uuids? Commented Apr 20, 2013 at 3:33
  • 1
    @RobertHarvey OP is replacing individual characters not substrings per se so this question is not a duplicate. Commented Apr 20, 2013 at 3:33

3 Answers 3

4

I suppose you're using char[] str

char *c;
for(c = str; *c != '\0'; ++c){
    if( *c == '-' ) *c = '_';
    else if( *c == '2' ) *c = 'f';
    else if( *c == '4' ) *c = 'x';
}

switch version

char *c;
for(c = str; *c != '\0'; ++c){
    switch(*c){
        case '-': *c = '_'; break;
        case '2': *c = 'f'; break;
        case '4': *c = 'x'; break;
    }
}
Sign up to request clarification or add additional context in comments.

6 Comments

what about replacing with regular expression
@Kajal replacing via regular expression is slower
@Kajal reg ex is overkill. I think writing it with a switch as I said in comment above instead of if/else if would be more readable, but it likely compiles to the same or very similar code.
You may be interested in the result of my testing.
@MattPhillips did you use -O3 in your compiler argument?
|
4

Is something as trivial as this what you want?

void my_replace(char* str)
{
    while (*str) {
        switch (*str) {
        case '-':
            *str = '_';
            break;
        case '2':
            *str = 'f';
            break;
        case '4':
            *str = 'x';
            break;
        default:
            break;
        }
        ++str;
    }
}

It is really fast and simple. I really can't see how you can make it faster.

EDIT: I am aware of some optimizations in certain string operations, but I do not see how they can be applicable here. For example, in the case of memcpy, one may copy 4 or more bytes at a time, depending on the processor. In case of comparing strings that is properly aligned, comparing as integers may be possible and more efficient. I just can't see an applicable technique.

4 Comments

High performance libraries such as LAPACK and BLAS use finely tuned, processor-specific code (often in Assembly) to achieve speeds many times faster than what one can do in a high-level language, even for operations as simple as matrix multiplication. I would expect the same is true of the C standard library as well.
@MattPhillips Your expectation is invalid. Have you tested the optimisations that your compiler implements, lately?
Have you? What makes you certain that the C lib is not optimized?
As an example of what @Matt is saying, glibc employs SSE to optimise memcpy, strnlen, and other functions. In fact, several libc implementations do employ optimisations like this.
1

C library functions may be optimized and significantly faster than a hand-coded iteration.

char* uuid; // = ...
//    size_t uuid_len; // = ... length of uuid


char* ptr = strpbrk(uuid, "-24");
while (ptr)
{
   switch(*ptr)
   {
      case '-':
          *ptr = '_';
          break;
      case '2':
          *ptr = 'f';
          break;
      case '4':
          *ptr = 'x';
          break;
   }
//       if (ptr-uuid == uuid_len) break;

   ptr = strpbrk(ptr+1, "-24");
}

Edit: Took out the range checking, on the basis of the example here that doesn't seem necessary.

Edit: So I decided to test out the 3 algorithms here to see which is faster. I had a loop of 100000 strings, on a vintage 2006 Mac Pro, compiling with gcc, -O3. I took the average of 1000 runs and did 5 cycles.

AND THE WINNER IS...

@johnchen by a hair with an average time of 7.85ms.

@YongweiWu just behind with an average time with 7.89ms. The difference looks significant; going in and doing a proper statistical test is not going to happen tonight unfortunately. :)

...and strpbrk a distant third at 32ms. (Glad I qualified all my optimization claims with 'might', 'may' etc....)

Edit: There is a big difference with Clang--j @ WY's algorithms take 10ms under Clang (looks tied between them), mine is unchanged.

1 Comment

@user93353 No, look at the last line. After a character find strpbrk restarts at the next character, it does not start over at the beginning.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.