In the code below, I have a parallel section where each thread uses a private vector<int> to push and pop integers. The problem is that as I increase the number of threads, the performance of each core drops drastically, and the kernel CPU usage on each core (the red bar in htop) rises sharply. For example, with 1 core I get 100% normal CPU usage, but with 25 cores (see image) almost all of the CPU time goes to the kernel.
It is probably something very basic, but I don't understand why it happens. Since each thread has its own private variable, I would expect each core to behave exactly the same regardless of how many CPUs are used in the parallel section.
Any advice?
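int cpus = 25;
#pragma omp parallel for schedule(dynamic,1)
for (int ss = 0; ss < cpus; ss++)
{
    std::vector<int> q;        // each iteration (and so each thread) has its own vector
    while (true)               // runs forever; the load is observed in htop
    {
        q.push_back(rand());   // rand() is called from every thread
        q.pop_back();
    }
}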
<random> is probably better. rand can either be not thread-safe, which makes your program undefined, or thread-safe, which makes your program non-parallel. (I suspect that the program spends all that kernel time waiting for it.)
rand() is not guaranteed to be thread-safe.