0

I'm searching for few bytes in a char array. The problem is that on slower machines the process gets up to 90%+ cpu usage. How to prevent that? My code is:

            for(long i = 0; i < size - 5; ) {
                if (buff[++i] == 'f' && buff[++i] == 'i' && buff[++i] == 'l' && buff[++i] == 'e') {
                     printf("found at: %d\n", i);
                }
            }

EDIT: The string "file" is not null-terminated.

8
  • Apart from putting sleep() calls in there I don't see many options. Commented Aug 29, 2011 at 22:37
  • @Voo: Where to put that Sleep()? Commented Aug 29, 2011 at 22:41
  • @Blez I think he was joking - sleep inserts a minimum pause of 1 second, so on a 3k string your program will spend almost 1h sleeping its way thru the string. With, I admit, very low CPU usage. Commented Aug 29, 2011 at 22:42
  • 1
    @fvu Hey it would solve the problem ;) But I was serious though while a better algorithm (ie STL) is obviously the first thing to do, if you still want to limit the amount of processing done then using sleep or nanosleep (which I think MS doesn't implement, but there's Sleep or just use boost for Xplatform issues) is the only way I see how to solve this - obviously NOT sleeping after every loop iteration but creating two loops: Ie search through the first X characters, sleep Yms, repeat. Commented Aug 29, 2011 at 23:09
  • 1
    Ok actually using sleep isn't the first or best idea at second glance, but the usually best approach has the big problem that it's basically unportable. But under Windows you can just set the process's priority to something lower than NORMAL (and vista+ allows to reduce IO priority as well). Though I've no idea how this works under POSIX.. Commented Aug 29, 2011 at 23:36

3 Answers 3

4

This looks like an attempt at very naive string search, I'd suggest you use either the standard functions provided for this purpose (like strstr) and/or research string search algorithms like Boyer-Moore.

The linked Wikipedia article on Boyer-Moore shows quite well why moving along one character at a time on a mismatch (like you do) is not necessary - it's an interesting read.

EDIT: also look at this page, it has a nice animated presentation that shows how BM does its job.

EDIT2: regarding the string not being nullterminated: either you

buff[size] = 0;

terminate it yourself, and use strstr, or you have a look at the BM code from the page I linked, that works with lengths, ie it will work with strings without terminating 0.

Sign up to request clarification or add additional context in comments.

3 Comments

The string is not null-terminated.
If I find the 'file' I can terminate it myself ofcourse, the problem is finding the 'file'.
@blez did you look at the linked code? It does not need a terminating 0.
3

There is nothing wrong with getting 90% utilisation, since the algorithm is CPU-bound. But...

Unless you expect the search term to be on a 32-bit word boundary, the code is broken. If the word 'file' begins on the second character of the buffer, you will simply skip over it. (EDIT: Short-circuit eval means the code is correct as it stands. My mistake.)

Don't roll your own code for this; use strstr.

10 Comments

The string is not null-terminated.
That's easily remedied in most cases.
I'm surprised that this is CPU-bound; it's going to be approximately 1 load per useful instruction; you can't much less CPU intensive than that!
If the string is not null-terminated, you can use a combination of standard functions such as memchr() and memcmp(). It'll probably still use CPU (as Marcelo mentioned). But wrap that functionality into a function (that takes the haystack, haystack size and the needle as parameters) and unit test it with a bunch of inputs. Then you'll have something in your toolkit that at least you can be confident works correctly.
@Marcelo, I'm having doubts about that alignment thing - C uses shortcircuited boolean evals, so the if will stop evaluating on the first mismatched char and due to the prefix increment restart on the first char following the mismatch. Still a naive string search but pretty nifty after all.
|
0

Try just storing a list of values where 'file' is found and print them out after the loop. It will prevent context switches and will enable the CPU to use the cache better. Also put i in a register.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.