First, check whether you have enough iteration count, pos.size(). Obviously this should be a sufficient number.
Recursive parallelism is an interesting pattern, but it may not work very well with OpenMP, unless you're using OpenMP 3.0's task, Cilk, or TBB. There are several things that need to be considered:
(1) In order to use a recursive parallelism, you mostly need to explicitly call omp_set_nested(1). AFAIK, most implementations of OpenMP do not recursively spawn parallel for, because it may end up creating thousands physical threads, just exploding your operating system.
Until OpenMP 3.0's task, a OpenMP has a sort of 1-to-1 mapping of logical parallel task to a physical task. So, it won't work well in such recursive parallelism. Try out, but don't be surprised if even thousands threads are created!
(2) If you really want to use recursive parallelism with a traditional OpenMP, you need to implement code that controls the number of active threads:
if (get_total_thread_num() > TOO_MANY_THREADS) {
// Do not use OpenMP
...
} else {
#pragma omp parallel for
...
}
(3) You may consider OpenMP 3.0's task. In your code, there could be huge number of concurrent tasks due to a recursion. To be efficiently working on a parallel machine, there must be an efficient mapping algorithm these logical concurrent tasks to physical threads (or logical processor, core). A raw recursive parallelism in OpenMP will create actual physical threads. OpenMP 3.0's task does not.
You may refer to my previous answer related to a recursive parallelism: C OpenMP parallel quickSort.
(4) Intel's Cilk Plus and TBB support full nested and recursive parallelism. In my small test program, the performance was far better than OpenMP 3.0. But, that was 3 years ago. You should check the latest OpenMP's implementation.
I have not a detailed knowledge of negamax and minimax. But, my gut says that using recursive pattern and a lock are unlikely to give a speedup. A simple Google search gives me: http://supertech.csail.mit.edu/papers/dimacs94.pdf
"But negamax is not a efficient serial search algorithm, and thus, it
makes little sense to parallelize it."
bestat the same time may get the wrong number in.if (val >= best)in your critical section you modifybest, but the test is outside..