Parallelizing CUDA devices


gamesforonetn
I have an application that requires processing multiple images in parallel in order to maintain real-time speed.

It is my understanding that I cannot multi-thread OpenCV function calls on the same CUDA device. I have tried a construct like this using OpenMP:

#pragma omp parallel for
for(int i=0; i<numImages; i++){   // intended: one OpenMP thread per image
    for(int j=0; j<numChannels; j++){
        for(int k=0; k<pyramidDepth; k++){
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k]);
        }
    }
}

Unfortunately, although it compiles and produces correct results, the numImages threads apparently execute serially on a single CUDA device.

However, I should be able to send each OpenMP thread to a separate CUDA device and have them execute in parallel, correct? In order to get multiple CUDA devices, I suppose I need multiple video cards?
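
For concreteness, here is the kind of per-thread device selection I have in mind -- an untested sketch reusing the variables from the snippet above, and assuming each image's GpuMats were uploaded on the same device that ends up processing them:

#include <omp.h>
#include <opencv2/gpu/gpu.hpp>

// Untested sketch: bind each OpenMP thread to its own CUDA device.
int numDevices = cv::gpu::getCudaEnabledDeviceCount();

#pragma omp parallel for
for(int i=0; i<numImages; i++){
    // Select a device for this thread before it touches any GpuMat.
    cv::gpu::setDevice(omp_get_thread_num() % numDevices);

    for(int j=0; j<numChannels; j++){
        for(int k=0; k<pyramidDepth; k++){
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k]);
        }
    }
}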

Does anyone know whether the NVIDIA GTX 690 dual-GPU video card works as two independent CUDA devices in OpenCV 2.4 or later? I did find confirmation that it works as two OpenCL devices, but I have found nothing confirming this for OpenCV.
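
If nothing else, I suppose I can check by enumerating the devices OpenCV sees -- a quick sketch along these lines:

#include <iostream>
#include <opencv2/gpu/gpu.hpp>

int main()
{
    // List every CUDA device OpenCV can use; a dual-GPU card such as the
    // GTX 690 should presumably show up here twice if it works that way.
    int count = cv::gpu::getCudaEnabledDeviceCount();
    std::cout << "CUDA devices visible to OpenCV: " << count << std::endl;

    for(int id=0; id<count; id++){
        cv::gpu::DeviceInfo info(id);
        std::cout << "  " << id << ": " << info.name()
                  << " (compute capability " << info.majorVersion()
                  << "." << info.minorVersion() << ")" << std::endl;
    }
    return 0;
}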

Thanks,

Michael McCulloch



Re: Parallelizing CUDA devices

gamesforonetn
I found an answer to this on another forum. The key is to use the gpu::Stream class, provided your CUDA device is listed by NVIDIA as compute capability 2.0 or higher.

cv::gpu::Stream stream[3];   // one stream per image (numImages is at most 3 in my app)

for(int i=0; i<numImages; i++){
    for(int j=0; j<numChannels; j++){
        for(int k=0; k<pyramidDepth; k++){
            // All of image i's multiplies are queued on stream[i], so work
            // from different streams can overlap on the same device.
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k], stream[i]);
        }
    }
}

The above code seems to execute the multiplies in parallel (numImages is at most 3 in my app). There are also Stream methods for asynchronously uploading and downloading images to and from GPU memory, as well as methods for querying a stream's state, which help with synchronizing against other code.
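
For example, this is how I understand those methods fit together -- an untested sketch, with hostSrc/hostWeight/hostDst as placeholder host images; for the copies to be truly asynchronous the host buffers should be page-locked (cv::gpu::CudaMem):

cv::gpu::Stream stream[3];
cv::gpu::GpuMat d_src[3], d_weight[3];
cv::Mat hostSrc[3], hostWeight[3], hostDst[3];   // placeholder host-side images

for(int i=0; i<numImages; i++){
    // Queue the host->device copies, the multiply and the device->host copy
    // on image i's stream; these calls return without waiting for the GPU.
    stream[i].enqueueUpload(hostSrc[i], d_src[i]);
    stream[i].enqueueUpload(hostWeight[i], d_weight[i]);

    cv::gpu::multiply(d_src[i], d_weight[i], d_src[i], stream[i]);

    stream[i].enqueueDownload(d_src[i], hostDst[i]);
}

// Later, poll or block until each stream has finished its queued work.
for(int i=0; i<numImages; i++){
    if(!stream[i].queryIfComplete())
        stream[i].waitForCompletion();
}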

So... apparently it does not necessarily take multiple CUDA devices (i.e., multiple GPU cards) to execute OpenCV GPU code in parallel!

---
Michael Mc

