Some operations support a "mask matrix", which selectively writes the result back to the destination according to the mask value. Not every element-wise matrix operation supports this option. Is this what you are looking for? Search for "mask array".
In addition, in OpenCV one can create a rectangular "sub-matrix" out of a parent matrix. Its pixels map to the same underlying memory addresses, so anything written to the sub-matrix is visible in the parent matrix, and vice versa. Search for "region of interest" (ROI). Below, I will explain why a rectangular ROI is much more useful than a per-pixel mask when the goal is speed.
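The shared-memory behaviour can be sketched in Python, where OpenCV images are NumPy arrays and a rectangular ROI is normally taken with a slice (in C++ the equivalent is constructing a `cv::Mat` from a parent and a `cv::Rect`). The sizes and coordinates below are arbitrary.

```python
import numpy as np

# Parent "image": 6 rows x 8 columns, single channel.
parent = np.zeros((6, 8), dtype=np.uint8)

# Rectangular ROI: rows 1..3, columns 2..5. This is a view, not a copy.
roi = parent[1:4, 2:6]

# Writing through the ROI changes the parent...
roi[:] = 255

# ...and writing to the parent is visible through the ROI.
parent[2, 3] = 7

print(np.shares_memory(parent, roi))
print(parent)
```

If an independent copy is wanted instead, `roi.copy()` detaches it from the parent.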
As for speed, note that OpenCV uses SIMD (short vector processing) on some CPU architectures such as x86 (SSE2) and possibly ARM (NEON).
With SIMD processing, per-pixel masking does not reduce the amount of computation: the vector instructions still process every pixel in fixed-size groups, and the mask only decides which results are written back. A rectangular region of interest, on the other hand, does reduce the work even with SIMD, because only the rows and columns inside the rectangle are processed at all.
If you approach this question from a vendor-selection perspective, I strongly encourage you to write and optimize some benchmarks for each library, and compare their performance in a scientific and unbiased way. A side-by-side feature comparison of the libraries is unlikely to be useful on its own.