I am trying to work with 3D arrays in CUDA (200x200x100).

The moment I change my z dimension (model_num) from 4 to 5, I get a segmentation fault. Why, and how can I fix it?

const int nrcells = 200;
const int nphicells = 200;
const int model_num = 5; //So far, 4 is the maximum model_num that works. At 5 and after, there is a segmentation fault

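__global__ void kernel(float* mgridb)
{
    // Assumed mapping, inferred from the launch configuration in main():
    // one thread per phi cell, one block per (r cell, model) pair.
    const unsigned int tx = threadIdx.x;  // phi index
    const unsigned int ty = blockIdx.x;   // r index
    const unsigned int tz = blockIdx.y;   // model index
    const unsigned long long int i = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;

    if (tx < nphicells && ty < nrcells && tz < model_num) {
        // Do stuff with mgridb[i]
    }
}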

int main (void)
{

    unsigned long long int size_matrices = nphicells*nrcells*model_num; 
    unsigned long long int mem_size_matrices = sizeof(float) * size_matrices;

    float *h_mgridb = (float *)malloc(mem_size_matrices);
    float mgridb[nphicells][nrcells][model_num];

    for(int k = 0; k < model_num; k++){
        for(int j = 0; j < nrcells; j++){
            for(int i = 0; i < nphicells; i++){
                mgridb[i][j][k] = 0;
            }
        }
    }
    float *d_mgridb;

    cudaMalloc( (void**)&d_mgridb, mem_size_matrices );
    cudaMemcpy(d_mgridb, h_mgridb, mem_size_matrices, cudaMemcpyHostToDevice);

    int threads = nphicells;
    uint3 blocks = make_uint3(nrcells,model_num,1);
    kernel<<<blocks,threads>>>(d_mgridb);
    cudaMemcpy( h_mgridb, d_mgridb, mem_size_matrices, cudaMemcpyDeviceToHost);
    cudaFree(d_mgridb);
    return 0;
}
  • Please pay a little more attention to formatting and content of code you post in questions. The code as you posted it was unnecessarily hard to read and contained unbalanced {}. Commented Jul 9, 2013 at 17:53

1 Answer

This is getting stored on the stack:

float mgridb[nphicells][nrcells][model_num];

Your stack space is limited. When you exceed the amount the stack can hold, you get a seg fault, either at the point of allocation or as soon as you try to access the array.
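To put rough numbers on this, here is a small sketch that computes how many bytes such an array needs. The helper names `grid_bytes` and `print_grid_sizes` are hypothetical, and the sketch assumes the usual 4-byte `float`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: bytes needed for an nphi x nr x nmodels float array. */
size_t grid_bytes(size_t nphi, size_t nr, size_t nmodels)
{
    assert(nphi > 0 && nr > 0 && nmodels > 0);
    return nphi * nr * nmodels * sizeof(float);
}

/* Print the footprint for a few values of model_num from the question. */
void print_grid_sizes(void)
{
    const size_t counts[] = {4, 5, 100};
    for (size_t n = 0; n < sizeof counts / sizeof counts[0]; n++) {
        printf("model_num = %3zu -> %zu bytes\n",
               counts[n], grid_bytes(200, 200, counts[n]));
    }
}
```

With 4-byte floats, 200x200x5 comes to 800,000 bytes, and the 200x200x100 size mentioned in the question comes to 16,000,000 bytes. Default stack limits are commonly somewhere between 1 MB and 8 MB, and the array shares that space with everything else on the stack, so the exact threshold depends on your platform and the rest of the program.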

Use malloc instead. That allocates heap storage, which has much higher limits.
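As a minimal sketch of the heap-based version (the helper name `alloc_grid` is made up for illustration, and `calloc` is used here in place of `malloc` only so that the memory starts out zeroed):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-initialized float grid on the heap.
   Returns NULL if the allocation fails; the caller must free() the result. */
float *alloc_grid(size_t nphi, size_t nr, size_t nmodels)
{
    size_t count = nphi * nr * nmodels;
    assert(count > 0);
    /* calloc zero-fills the block, which reads back as 0.0f on
       IEEE 754 platforms, so no separate initialization loop is needed. */
    return (float *)calloc(count, sizeof(float));
}
```

In `main` this would replace the stack array, for example `float *h_mgridb = alloc_grid(nphicells, nrcells, model_num);`, followed by a check for `NULL` and a matching `free(h_mgridb)` once the data is no longer needed.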

None of the above has anything to do with CUDA. Furthermore, it's not unique or specific to "3D" arrays. Any large stack-based allocation (e.g. a 1D array) is going to have the same trouble.

You may also have to adjust how you access the array, but it's not difficult to handle a flattened array using pointer indexing.
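One possible way to do that indexing is sketched below. The helper names `grid_index` and `fill_grid` are hypothetical. The layout, with the phi index varying fastest, is an assumption chosen to match the linear index `i` that the kernel in the question computes from its block and thread indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map (i, j, k) to an offset in a flattened
   nphi x nr x nmodels array, with i (phi) varying fastest. */
size_t grid_index(size_t i, size_t j, size_t k, size_t nphi, size_t nr)
{
    assert(i < nphi && j < nr);
    return i + nphi * (j + nr * k);
}

/* Set every element of a flattened grid to `value`. */
void fill_grid(float *grid, size_t nphi, size_t nr, size_t nmodels, float value)
{
    for (size_t k = 0; k < nmodels; k++) {
        for (size_t j = 0; j < nr; j++) {
            for (size_t i = 0; i < nphi; i++) {
                grid[grid_index(i, j, k, nphi, nr)] = value;
            }
        }
    }
}
```

Note that this ordering differs from the memory layout of `float mgridb[nphicells][nrcells][model_num]`, where the last index varies fastest. What matters is that the host and the kernel agree on one convention.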

Your code is actually strange looking, because you are creating an appropriately sized array h_mgridb using malloc and then copying that array to the device (into d_mgridb). It's not clear what purpose mgridb serves in your code. h_mgridb and mgridb are not the same.

1 Comment

This fixes the problem. I changed float mgridb[nphicells][nrcells][model_num] to float *mgridb = (float *)malloc(mem_size_matrices). Also, in the initialization "for" loop, I referenced it as a 1D array, so mgridb[i + (j*nphicells) + (k*nphicells*nrcells)] = 0. I can see what you're saying: it does not make much sense to have h_mgridb if I am already allocating mgridb.
