A cross (a.k.a. joint) option would simply allow one to supply a second input image to bilateralFilter and have sigmaColor apply to that image, rather than to the image being filtered, when computing the Gaussian range weights. The spatial weights governed by sigmaSpace would be unchanged. One notable use of this filter is in the well-cited paper:
Zhuo, Shaojie, and Terence Sim. "Defocus map estimation from a single image." Pattern Recognition 44.9 (2011): 1852-1858.
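To make the request concrete, here is a minimal NumPy sketch of what a joint (cross) bilateral filter computes. It is not OpenCV code and not the paper's implementation: the function name `joint_bilateral_filter`, the `guide` and `radius` parameters, the single-channel restriction, and the edge-replicating border are all assumptions made for illustration. The only difference from an ordinary bilateral filter is that the color difference is measured in `guide` rather than in `src`.

```python
import numpy as np

def joint_bilateral_filter(src, guide, radius, sigma_color, sigma_space):
    """Smooth `src`, taking the range (color) weights from `guide`.

    Hypothetical helper for illustration; single-channel 2-D arrays only.
    """
    src = np.asarray(src, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    if src.ndim != 2 or src.shape != guide.shape:
        raise ValueError("src and guide must be 2-D arrays of the same shape")

    h, w = src.shape
    # Replicate border pixels so every window is fully defined (an assumption).
    src_p = np.pad(src, radius, mode="edge")
    guide_p = np.pad(guide, radius, mode="edge")

    num = np.zeros_like(src)  # weighted sum of neighbouring src values
    den = np.zeros_like(src)  # sum of weights, used for normalisation

    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Spatial Gaussian: depends only on the pixel offset.
            w_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            ys = slice(radius + dy, radius + dy + h)
            xs = slice(radius + dx, radius + dx + w)
            # Range Gaussian: sigma_color is applied to differences in the
            # guide image, not in the image being filtered.
            diff = guide_p[ys, xs] - guide
            w_color = np.exp(-(diff * diff) / (2.0 * sigma_color ** 2))
            weight = w_space * w_color
            num += weight * src_p[ys, xs]
            den += weight

    # The centre pixel always contributes weight 1, so den is never zero.
    return num / den
```

Passing the same array as both `src` and `guide` reduces this to the ordinary bilateral filter, which is why the feature could plausibly be exposed as an optional extra argument rather than a separate function.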