The difference in results between the CPU StereoSGBM and the GPU StereoBM_GPU methods does not surprise me. You are mixing up two very different stereo reconstruction approaches. StereoBM_GPU is (more or less) equivalent to the CPU StereoBM function: both perform winner-takes-all (WTA) stereo matching using the sum of absolute differences (SAD) between corresponding left and right image pixel values, so that for each pixel individually the disparity with the lowest cost is chosen.
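To make the WTA idea concrete, here is a minimal NumPy sketch of SAD block matching. It is not OpenCV's implementation (which, among other things, adds pre- and post-filtering); the function name `sad_wta_disparity` and the parameters `max_disp` and `half_win` are made up for illustration, and it assumes rectified grayscale images of equal size.

```python
import numpy as np

def sad_wta_disparity(left, right, max_disp, half_win=1):
    """Illustrative SAD block matching with a winner-takes-all decision.

    Returns (disparity map, cost volume of shape (max_disp, h, w)).
    """
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    h, w = left.shape
    k = 2 * half_win + 1
    # Positions where a disparity cannot be evaluated keep an infinite cost.
    cost = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        # Left pixel x is compared against right pixel x - d.
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # Sum the absolute differences over a k x k window.
        padded = np.pad(diff, half_win, mode="edge")
        sad = np.zeros_like(diff)
        for dy in range(k):
            for dx in range(k):
                sad += padded[dy:dy + h, dx:dx + diff.shape[1]]
        cost[d, :, d:] = sad
    # WTA: every pixel independently picks its cheapest disparity.
    return np.argmin(cost, axis=0), cost
```

Note that the final `argmin` looks at each pixel in isolation; nothing ties a pixel's choice to that of its neighbors, which is why the result can be noisy in weakly textured regions.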

StereoSGBM, on the other hand, is a so-called semi-global method: in addition to calculating the least-cost (locally optimal) disparity value for each pixel individually, it enforces smoothness constraints between neighboring pixels so that they take similar disparity values. In practice such semi-global methods tend to give better results, which is what you have just experienced.
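The smoothness constraint can be illustrated with a simplified, single-path version of semi-global cost aggregation. This is only a sketch under stated assumptions: real semi-global matching sums the aggregated costs over several path directions, while this version scans left to right only. The function name `aggregate_left_to_right` is hypothetical; `p1` and `p2` play the role of the small and large disparity-change penalties (StereoSGBM exposes parameters named P1 and P2 for this purpose).

```python
import numpy as np

def aggregate_left_to_right(cost, p1=1.0, p2=8.0):
    """Illustrative single-path aggregation of a finite cost volume.

    cost has shape (num_disp, h, w). A disparity change of 1 between
    horizontal neighbors costs p1, any larger change costs p2.
    """
    n_disp, h, w = cost.shape
    agg = np.empty_like(cost, dtype=np.float64)
    agg[:, :, 0] = cost[:, :, 0]
    for x in range(1, w):
        prev = agg[:, :, x - 1]          # aggregated costs of the left neighbor
        prev_min = prev.min(axis=0)      # its best cost, per row
        from_below = np.full_like(prev, np.inf)
        from_below[1:] = prev[:-1] + p1  # neighbor had disparity d - 1
        from_above = np.full_like(prev, np.inf)
        from_above[:-1] = prev[1:] + p1  # neighbor had disparity d + 1
        best = np.minimum(np.minimum(prev, from_below),
                          np.minimum(from_above, prev_min + p2))
        # Subtracting prev_min keeps the values from growing along the path.
        agg[:, :, x] = cost[:, :, x] + best - prev_min
    return agg

# Toy example: the true disparity is 2 everywhere, but at x = 3 noise makes
# disparity 0 look slightly cheaper than disparity 2.
cost = np.full((4, 1, 5), 10.0)
cost[2] = 0.0
cost[2, 0, 3] = 5.0
cost[0, 0, 3] = 4.0
wta = np.argmin(cost, axis=0)
smooth = np.argmin(aggregate_left_to_right(cost), axis=0)
```

In this toy case the per-pixel WTA choice at x = 3 jumps to disparity 0, whereas the aggregated cost keeps it at 2, because jumping away from the neighbors' disparity incurs the `p2` penalty.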