Lower cpu usage on searching a big char array

Question

I'm searching for a few bytes in a char array. The problem is that on slower machines the process goes up to 90%+ CPU usage. How can I prevent that? My code is:

            for(long i = 0; i < size - 5; ) {
                if (buff[++i] == 'f' && buff[++i] == 'i' && buff[++i] == 'l' && buff[++i] == 'e') {
                     printf("found at: %ld\n", i);
                }
            }

EDIT: The string "file" is not null-terminated.

Apart from putting sleep() calls in there I don't see many options.
– Voo, Commented Aug 29, 2011 at 22:37
@Blez I think he was joking: sleep inserts a minimum pause of 1 second, so on a 3k string your program will spend almost 1h sleeping its way through the string. With, I admit, very low CPU usage.
– fvu, Commented Aug 29, 2011 at 22:42
@fvu Hey, it would solve the problem ;) But I was serious: while a better algorithm (i.e. STL) is obviously the first thing to do, if you still want to limit the amount of processing done, then using sleep or nanosleep (which I think MS doesn't implement, but there's Sleep, or just use Boost for cross-platform issues) is the only way I see to solve this. Obviously NOT by sleeping after every loop iteration, but by creating two loops: i.e. search through the first X characters, sleep Y ms, repeat.
– Voo, Commented Aug 29, 2011 at 23:09
OK, on second glance sleep isn't the first or best idea, but the approach that is usually best has the big problem that it's basically unportable. Under Windows you can just set the process's priority to something lower than NORMAL (and Vista+ also allows reducing I/O priority). I've no idea how this works under POSIX, though.
– Voo, Commented Aug 29, 2011 at 23:36
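
For illustration, the "search X characters, sleep Y ms, repeat" idea from the comments could look roughly like the sketch below. It assumes a POSIX system (nanosleep); on Windows, Sleep would take its place. The function name throttled_count and its parameters are made up for this example. Note that this only lowers average CPU usage by making the search take longer in wall-clock time.

```c
#define _POSIX_C_SOURCE 199309L  /* for nanosleep */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: count occurrences of needle in buff[0..size),
 * pausing for pause_ms milliseconds after every `chunk` start positions.
 * Returns 0 for an empty needle or a chunk size of 0. */
size_t throttled_count(const char *buff, size_t size,
                       const char *needle, size_t n_len,
                       size_t chunk, long pause_ms)
{
    size_t found = 0;
    size_t i;
    struct timespec pause;

    if (n_len == 0 || size < n_len || chunk == 0)
        return 0;

    pause.tv_sec = pause_ms / 1000;
    pause.tv_nsec = (pause_ms % 1000) * 1000000L;

    for (i = 0; i + n_len <= size; i++) {
        if (memcmp(buff + i, needle, n_len) == 0)
            found++;
        /* Yield the CPU after each block of `chunk` positions. The window
         * slides one byte at a time, so a match that straddles a block
         * boundary is still seen. */
        if ((i + 1) % chunk == 0)
            nanosleep(&pause, NULL);
    }
    return found;
}
```

A call such as throttled_count(buff, size, "file", 4, 65536, 10) would pause for about 10 ms after every 64 KiB of start positions; both numbers are arbitrary and would need tuning.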

fvu · Accepted Answer · 2011-08-29 23:27:06Z

4

This looks like an attempt at a very naive string search. I'd suggest you use the standard functions provided for this purpose (like strstr) and/or research string search algorithms like Boyer-Moore.

The linked Wikipedia article on Boyer-Moore shows quite well why moving along one character at a time on a mismatch (like you do) is not necessary - it's an interesting read.

EDIT: also look at this page, it has a nice animated presentation that shows how BM does its job.

EDIT2: regarding the string not being null-terminated: either you terminate it yourself (which requires the buffer to have room for one extra byte)

buff[size] = 0;

and use strstr, or you have a look at the BM code from the page I linked. That code works with lengths, i.e. it will work with strings without a terminating 0.
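
The linked BM code is not reproduced here. As a rough idea of what a length-based search looks like, here is a minimal sketch of the simpler Boyer-Moore-Horspool variant (not the full Boyer-Moore algorithm, and not the code from the linked page). The name horspool_find and the `start` parameter are assumptions made for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: Boyer-Moore-Horspool search over a length-delimited
 * buffer. Returns the offset of the first match at or after `start`,
 * or (size_t)-1 if there is none. No terminating 0 is needed. */
size_t horspool_find(const unsigned char *hay, size_t hay_len,
                     const unsigned char *needle, size_t needle_len,
                     size_t start)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i;

    if (needle_len == 0 || hay_len < needle_len ||
        start > hay_len - needle_len)
        return (size_t)-1;

    /* A byte that does not occur in the needle lets us skip the whole
     * needle length. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = needle_len;
    /* Bytes that occur in the needle (last position excluded) allow a
     * shorter skip. */
    for (i = 0; i + 1 < needle_len; i++)
        shift[needle[i]] = needle_len - 1 - i;

    i = start;
    while (i <= hay_len - needle_len) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return i;
        /* Skip ahead based on the byte under the end of the window. */
        i += shift[hay[i + needle_len - 1]];
    }
    return (size_t)-1;
}
```

To list every occurrence, call it again with start set to the previous result plus one. For a 4-byte needle like "file" the skips are at most 4 bytes, so the gain over a plain loop is modest; it grows with longer needles.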

edited Aug 29, 2011 at 23:27

answered Aug 29, 2011 at 22:38

fvu


3 Comments

blez Over a year ago

The string is not null-terminated.

blez Over a year ago

If I find the 'file' I can terminate it myself of course; the problem is finding the 'file'.

fvu Over a year ago

@blez did you look at the linked code? It does not need a terminating 0.

Marcelo Cantos · Accepted Answer · 2011-08-30 01:09:00Z

3

There is nothing wrong with getting 90% utilisation, since the algorithm is CPU-bound. But...

~~Unless you expect the search term to be on a 32-bit word boundary, the code is broken. If the word 'file' begins on the second character of the buffer, you will simply skip over it.~~ (EDIT: Short-circuit eval means the code is correct as it stands. My mistake.)

Don't roll your own code for this; use strstr.

edited Aug 30, 2011 at 1:09

answered Aug 29, 2011 at 22:42

Marcelo Cantos


10 Comments

blez Over a year ago

The string is not null-terminated.

Marcelo Cantos Over a year ago

That's easily remedied in most cases.

Oliver Charlesworth Over a year ago

I'm surprised that this is CPU-bound; it's going to be approximately 1 load per useful instruction; you can't get much less CPU-intensive than that!

Michael Burr Over a year ago

If the string is not null-terminated, you can use a combination of standard functions such as memchr() and memcmp(). It'll probably still use CPU (as Marcelo mentioned). But wrap that functionality into a function (that takes the haystack, haystack size and the needle as parameters) and unit test it with a bunch of inputs. Then you'll have something in your toolkit that at least you can be confident works correctly.
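
A minimal sketch of the memchr() plus memcmp() combination suggested in this comment. The function name find_bytes is hypothetical, and the explicit needle length parameter is an assumption added so that the needle does not need a terminator either.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find `needle` (n_len bytes) in `hay` (h_len bytes).
 * Returns a pointer to the first match, or NULL if there is none.
 * Neither buffer needs a terminating 0. */
const void *find_bytes(const void *hay, size_t h_len,
                       const void *needle, size_t n_len)
{
    const unsigned char *p = hay;
    const unsigned char *n = needle;

    if (n_len == 0 || h_len < n_len)
        return NULL;

    /* Last position at which a full needle still fits. */
    const unsigned char *last = p + (h_len - n_len);

    while (p <= last) {
        /* memchr jumps to the next candidate first byte. */
        p = memchr(p, n[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, n, n_len) == 0)
            return p;
        p++;
    }
    return NULL;
}
```

A few inputs worth putting in the unit tests: a match at offset 0, a match ending on the last byte, a repeated first byte such as "ffile", and a haystack shorter than the needle. Some platforms also ship memmem, which does the same job but is not part of standard C.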

fvu Over a year ago

@Marcelo, I'm having doubts about that alignment thing: C uses short-circuit boolean evaluation, so the if will stop evaluating on the first mismatched char and, due to the prefix increment, restart on the first char following the mismatch. Still a naive string search, but pretty nifty after all.
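
One caveat on the short-circuit behaviour, as a quick check rather than a definitive analysis: because the mismatching char is consumed by the failed comparison, it is never re-tested as a possible 'f', and buff[0] is never tested at all. The sketch below is the loop from the question, changed only to count matches instead of printing them (the wrapper name count_question_loop is made up for this example).

```c
#include <stddef.h>

/* The loop from the question, reworked only to return the number of
 * matches instead of printing each one. */
long count_question_loop(const char *buff, long size)
{
    long hits = 0;
    for (long i = 0; i < size - 5; ) {
        if (buff[++i] == 'f' && buff[++i] == 'i' &&
            buff[++i] == 'l' && buff[++i] == 'e') {
            hits++;
        }
    }
    return hits;
}
```

With a 12-byte buffer holding "xffile......" this returns 0: the second 'f' fails the comparison against 'i' and is then skipped. A buffer that starts with "file" at offset 0 is missed as well, while "xfile......." is found.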


Ed Heal · Accepted Answer · 2011-08-29 23:54:37Z

0

Try just storing a list of values where 'file' is found and print them out after the loop. It will prevent context switches and will enable the CPU to use the cache better. Also put i in a register.
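
A minimal sketch of that idea, assuming a caller-supplied array for the offsets. The names collect_matches and print_matches are hypothetical. How much this helps in practice probably depends on how many matches there are, since it only moves the printf calls out of the search loop.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: record up to `cap` offsets at which "file" starts
 * in buff[0..size). Returns the number of offsets stored. */
size_t collect_matches(const char *buff, size_t size,
                       size_t *out, size_t cap)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i + 4 <= size && count < cap; i++) {
        if (memcmp(buff + i, "file", 4) == 0)
            out[count++] = i;
    }
    return count;
}

/* Print the collected offsets in one pass, after the search loop. */
void print_matches(const size_t *offsets, size_t count)
{
    size_t k;

    for (k = 0; k < count; k++)
        printf("found at: %zu\n", offsets[k]);
}
```

The search stops once `cap` offsets have been stored, so a caller that needs every match has to size the array accordingly or resume from the last offset.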

answered Aug 29, 2011 at 23:54

Ed Heal
