First of all you need to declare for OpenMP what variables you are using and what protection do they have. Generally speaking your code has default(shared) as you didn't specified otherwise. This makes all variables accessible with same memory location for all threads.
You should use something like this:
#pragma omp parallel default(none) shared(total_steps, pixels_inside)
[...]
#pragma omp task private(j) default(none) shared(total_steps, pixels_inside)
Now, only what is necessary will be used by threads.
Secondly the main problem is that you don't have critical section protection. What this means, that when threads are running they may wish to use shared variable and race condition happens. For example, you have thread A and B with variable x accessible to both (a.k.a. shared memory variable). Now lets say A adds 2 and B adds 3 to the variable. Threads aren't same speed so this may happen, A takes x=0, B takes x=0, A adds 0+2, B adds 0+3, B returns data to memory location x=3, A returns data to memory location x=2. In end x = 2. The same happens with pixels_inside, as thread takes variable, adds 1 and returns it back from where it got it. To overcome this you use measurements to insure critical section protection:
#pragma omp critical
{
//Code with shared memory
pixels_inside++;
}
You didn't needed critical section protection in reduction as variables in recution parameters have this protection.
Now your code should look like this:
#include <iostream>
#include <omp.h>
using namespace std;
int main() {
int total_steps = 10000;
int i,j;
int pixels_inside=0;
omp_set_num_threads(4);
//#pragma omp parallel for reduction (+:pixels_inside) private(i,j)
#pragma omp parallel default(none) shared(total_steps, pixels_inside)
#pragma omp single private(i)
for(i = 0; i < total_steps; i++){
#pragma omp task private(j) default(none) shared(total_steps, pixels_inside)
for(j = 0; j < total_steps; j++){
#pragma omp critical
{
pixels_inside++;
}
}
}
cout<<"Total pixel = "<<pixels_inside<<endl;
return 0;
}
Although I would suggest using reduction as it has better performance and methods to optimize that kind of calculations.