13

I have an application that requires processing multiple images in parallel in order to maintain real-time speed.

It is my understanding that I cannot call OpenCV's GPU functions in a multi-threaded fashion on a single CUDA device. I have tried an OpenMP code construct such as the following:

// One OpenMP thread per image; every thread issues cv::gpu calls on the same CUDA device
#pragma omp parallel for
for(int i=0; i<numImages; i++){
    for(int j=0; j<numChannels; j++){
        for(int k=0; k<pyramidDepth; k++){
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k]);
        }
    }
}

This seems to compile and execute correctly, but unfortunately the numImages threads appear to execute serially on the same CUDA device.

I should be able to execute multiple threads in parallel if I have multiple CUDA devices, correct? In order to get multiple CUDA devices, do I need multiple video cards?

Does anyone know if the nVidia GTX 690 dual-chip card works as two independent CUDA devices with OpenCV 2.4 or later? I found confirmation it can work as such with OpenCL, but no confirmation with regard to OpenCV.

1 Comment

Perhaps the answer is in the source code for OpenCV? Commented Jun 21, 2012 at 16:51

4 Answers

5

Just do the multiply by passing whole images to the cv::gpu::multiply() function.

OpenCV and CUDA will handle splitting and dividing the task in the best way. Generally each compute unit (i.e. core) in a GPU can run multiple threads (typically >=16 in CUDA). This is in addition to having cards that can appear as multiple GPUs, or putting multiple linked cards in one machine.

The whole point of cv::gpu is to save you from having to know anything about how the internals work.
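
For illustration, a minimal sketch of the whole-image approach with the OpenCV 2.4 gpu module might look like this (the Mat inputs img and weights are placeholders for your own data):

#include <opencv2/core/core.hpp>
#include <opencv2/gpu/gpu.hpp>

cv::Mat multiplyOnGpu(const cv::Mat& img, const cv::Mat& weights)
{
    cv::gpu::GpuMat d_img, d_weights, d_result;

    d_img.upload(img);          // host -> device copy
    d_weights.upload(weights);

    // One call for the whole image; OpenCV launches enough CUDA threads internally.
    cv::gpu::multiply(d_img, d_weights, d_result);

    cv::Mat result;
    d_result.download(result);  // device -> host copy
    return result;
}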


6 Comments

Yes, true. The multiply() function is written to take advantage of CUDA threading within the function itself. However, what I need is more than one multiply() function operating in parallel threads, and that does not seem to be possible without multiple GPUs. With more than one GPU you could execute a multiply() on each in parallel, for different images simultaneously.
@mmccullo - yes, cv::gpu uses low-level CUDA threading; you can call it in multiple user threads, but each will fully utilize the GPU until the other has finished. If you have a card with cuda2 it will use streams to do this async so your threads don't block
I am using CUDA v4.2. I am not sure what your reference to "cuda2" means exactly. It does not appear to necessarily block my OpenMP threads, but the execution time of my code above is only slightly better than executing in a single thread. It appears the execution of the multiple threads occurs serially on the single CUDA device -- otherwise the execution time should be much less than with a single thread on the same device. My test GPU is a Quadro2000M with 2GB and 192 CUDA cores. The images are 1280x960 RGB.
@mmccullo - compute capability >= 2 adds async streams
Ah, in fact my Quadro2000M is compute capability 2.1. I therefore did the following: cv::gpu::Stream stream[3]; for(int i=0; i<numImages; i++){ for(int j=0; j<numChannels; j++){ for(int k=0; k<pyramidDepth; k++){ cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k], stream[i]); } } } That appears to execute in parallel on the same CUDA device! Thanks.
4

The answer from Martin worked for me. The key is to make use of the gpu::Stream class if your CUDA device is listed as compute capability 2 or higher. I will restate it here because I could not post the code clip correctly in the comment mini editor.

cv::gpu::Stream stream[3];   // one stream per image (numImages = 3)

for(int i=0; i<numImages; i++){
    for(int j=0; j<numChannels; j++){
        for(int k=0; k<pyramidDepth; k++){
            // passing stream[i] enqueues the multiply asynchronously on that stream
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k], stream[i]);
        }
    }
}

The above code seems to execute the multiplies in parallel (numImages = 3 for my app). There are also Stream methods for uploading/downloading images to and from GPU memory, as well as methods to check the state of a stream, which help when synchronizing with other code.
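
For example, a minimal sketch of those Stream helpers (reusing the same multiply overload as above; the host Mats src[i], weight[i] and result[i] are placeholders for your own data):

cv::gpu::Stream stream[3];
cv::gpu::GpuMat d_src[3], d_weight[3], d_dst[3];
cv::Mat result[3];

for(int i = 0; i < 3; i++){
    stream[i].enqueueUpload(src[i], d_src[i]);       // asynchronous host -> device copy
    stream[i].enqueueUpload(weight[i], d_weight[i]);
    cv::gpu::multiply(d_src[i], d_weight[i], d_dst[i], stream[i]);
    stream[i].enqueueDownload(d_dst[i], result[i]);  // asynchronous device -> host copy
}

for(int i = 0; i < 3; i++)
    stream[i].waitForCompletion();                   // block until stream i has finished

Note that the transfers only overlap fully when the host memory is page-locked (cv::gpu::CudaMem); with plain cv::Mat buffers the copies may still block the calling thread.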

So... it apparently does not require multiple CUDA devices (i.e. GPU cards) in order to execute OpenCV GPU code in parallel!


0

I don't know anything about OpenCV's GPU functions, but if they are completely self-contained (i.e., create GPU context, transfer data to GPU, compute results, transfer results back to CPU), then it's not surprising that these functions appear serialized when using a single GPU.

If you have multiple GPUs, then there should be some way to tell the OpenCV function to target a specific GPU, and if you can target them effectively, I see no reason why the GPU function calls wouldn't be parallelized. According to the OpenCV wiki, the GPU functions target only a single GPU, but you can manually split up work yourself: http://opencv.willowgarage.com/wiki/OpenCV%20GPU%20FAQ#Can_I_use_two_or_more_GPUs.3F
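
As an illustration only, a rough sketch of that manual split might bind one OpenMP host thread to each device with cv::gpu::setDevice(); the containers hostImages and hostWeights are hypothetical stand-ins for your own data:

#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/gpu/gpu.hpp>

void multiplyAcrossGpus(std::vector<cv::Mat>& hostImages,
                        const std::vector<cv::Mat>& hostWeights)
{
    int numDevices = cv::gpu::getCudaEnabledDeviceCount();

    #pragma omp parallel for
    for(int dev = 0; dev < numDevices; dev++){
        cv::gpu::setDevice(dev);   // all cv::gpu calls on this thread now use device "dev"

        // naive split: image i is processed by device (i % numDevices)
        for(size_t i = dev; i < hostImages.size(); i += numDevices){
            cv::gpu::GpuMat d_img(hostImages[i]), d_w(hostWeights[i]), d_out;
            cv::gpu::multiply(d_img, d_w, d_out);
            d_out.download(hostImages[i]);   // overwrite the host image with the result
        }
    }
}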

Dual GPUs like the GTX 690 will appear as two distinct devices with their own memory as far as your GPU program is concerned. See here: http://forums.nvidia.com/index.php?showtopic=231726

Also, if you are going the dual-GPU route for compute applications, I would recommend against the GTX 690 because its compute performance is somewhat crippled compared to the GTX 590.

3 Comments

Interesting comment about the 690 vs. 590 performance. This nVidia page indicates a higher compute capability for the 690. Do you have any specifics on how the 690 is crippled?
"According to the OpenCV wiki, the GPU functions target only a single GPU, but you can manually split up work yourself" sadly the link is no more active. What does it mean manually split it up? You have to set the device Id before every gpu opencv call? Is there any official example supporting the statement.
Also does it mean that in SLI / CrossFire mode one should do the manual switch?
0

The GTX 690 behaves as 2 separate CUDA devices, regardless of which OpenCV version you use. You don't need multiple GPU cards to get multiple GPUs; you get two on a single card such as the GTX 690. But, from the CUDA programming perspective, there is not much difference between using the two GPUs on the 690 and using 2 GPUs on separately connected GPU cards. Many OpenCV users use the ArrayFire CUDA library to supplement OpenCV with more image processing features and easy multi-GPU scaling. Of course, my disclaimer is that I work on ArrayFire, but I really do think that it will help you in this case.
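
For what it's worth, a hypothetical sketch of that multi-GPU pattern in ArrayFire might look like the following; it assumes the modern C++ API (af::getDeviceCount / af::setDevice), which differs from the 2012-era function names, and uses random data as a stand-in for real images:

#include <arrayfire.h>

void multiplyOnEachGpu()
{
    int numDevices = af::getDeviceCount();

    for(int dev = 0; dev < numDevices; dev++){
        af::setDevice(dev);                         // subsequent arrays live on this GPU
        af::array img     = af::randu(960, 1280);   // stand-in for a real 1280x960 image
        af::array weights = af::randu(960, 1280);
        af::array result  = img * weights;          // element-wise multiply on device "dev"
        af::eval(result);                           // force the lazy computation to execute
    }
}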

