Questions regarding OpenCL implementation / Sync points and threading in OpenCV 3.x
My application currently makes use of the new Transparent API (T-API) UMat facilities in OpenCV 3.x.
My application is also highly concurrent, thanks to leveraging libdispatch queues and blocks (C blocks, effectively lambdas, running on different underlying threads).
I've read documentation stating that OpenCV's OpenCL implementation's setUseOpenCL is thread-local, meaning each thread must be individually set to use (or not use) an OpenCL device. Is there a way to programmatically set a default for all threads within the API runtime, covering future invocations? This is also a touch more subtle with libdispatch: my understanding is that, depending on circumstance, one queue may run blocks/lambdas on multiple different threads, either concurrently or serially, depending on demand, availability, and load. You have no guarantee that one serial queue is backed by one specific thread, only that your blocks execute serially. So an initial call on a serial dispatch queue may run setUseOpenCL on thread A, but subsequent calls to the same dispatch queue may land on thread A, B, or C.
I'd like to know if there is a way to run setUseOpenCL and configure CL devices in the OpenCV runtime so that OpenCL is enabled or disabled globally, for all possible threads. Is this possible? Currently, my solution is to re-enable OpenCL at the start of every block that uses OpenCV, on both concurrent and serial queues (without re-configuring the context or device).
Secondly, and more subtly, I understand there are very necessary sync points whenever one calls someUMat.getMat(). However, some T-API calls seem to sync implicitly as well. So:
• Is there a list of OpenCV's T-API accelerated calls that, when chained on UMat operands, are guaranteed to run on the configured OpenCL device without read-back to the CPU until the user requests it via .getMat()?
• Is there a way to 'double buffer' OpenCV's CL calls so that the CPU's wait on GPU read-back isn't synchronous, and applications with high-bandwidth streaming requirements don't spin waiting for CL to flush? Or is there a design pattern that effectively implements this, e.g. disabling OpenCV's implicit flush-on-finish for T-API/OpenCL calls and synchronizing manually via getMat() or some other mechanism that calls clFinish?
• Does a UMat that has been submitted to a CL command queue on one thread remain on the GPU if it is accessed or used on a different thread, or does that involve read-back and re-submission to the CL device?
• And finally: is concurrent read access of a UMat marked cv::ACCESS_READ from multiple threads safe? This might be more specific to OpenCL drivers, but I've had read-only cv::Mat access working from multiple simultaneous threads without issue. Converting to the T-API required me to serialize some calls that depended only on a single, effectively read-only input Mat and involved no inter-thread communication at all. I'd love to return to that model with OpenCL.
Thank you in advance for making it through all my questions, any information is highly appreciated!