
Simple video processing, but very slow

Hi guys! I wrote a small program to record the moving parts of a video. I open a video, compute the optical flow, and threshold the norm of the flow vectors. When there is enough motion, I write the frame to a VideoWriter. The code does what I want, but I find it very slow: about 2 minutes to process a 2-minute video. I found a similar problem at this link, but I'm already using FFmpeg with OpenCV. I don't know if it is related, but at the beginning of the run an error occurs (without a crash).

OpenCV: FFMPEG: tag 0x34363258/'X264' is not supported with codec id 28 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x00000021/'!???'
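As far as I can tell, the tag in the message is just the four FOURCC characters packed into a 32-bit integer, lowest byte first. That would make 0x34363258 the characters 'X264', and the fallback 0x00000021 a '!' followed by three zero bytes (printed as '!???'). A minimal sketch without OpenCV; packFourcc, unpackFourcc and checkTagsFromLog are hypothetical helper names, and the byte layout is assumed to match the CV_FOURCC macro:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Pack four characters into a 32-bit tag, lowest byte first
// (assumed to be the same layout as OpenCV's CV_FOURCC macro).
std::uint32_t packFourcc(char c1, char c2, char c3, char c4)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(c1))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c4)) << 24);
}

// Unpack a 32-bit tag back into its four characters.
std::string unpackFourcc(std::uint32_t tag)
{
    std::string s(4, '\0');
    for (int i = 0; i < 4; ++i)
        s[i] = static_cast<char>((tag >> (8 * i)) & 0xFF);
    return s;
}

// Check the two tags that appear in the log message.
void checkTagsFromLog()
{
    assert(packFourcc('X', '2', '6', '4') == 0x34363258u);
    assert(unpackFourcc(0x00000021u)[0] == '!');
}
```

So the message seems to be about the tag that gets written into the MP4 container, not about decoding the input file.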

Because code is better than an explanation:

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

int main(int argc, const char** argv)
{
    VideoCapture cap("../Video/6880.mp4");
    int fourcc = CV_FOURCC('X','2','6','4');
    int fps = cap.get(CV_CAP_PROP_FPS);
    Size S = Size((int) cap.get(CAP_PROP_FRAME_WIDTH),    // Acquire input size
                  (int) cap.get(CAP_PROP_FRAME_HEIGHT));
    VideoWriter out("output.mp4", fourcc, fps, S, true);
    Mat flow;
    UMat flowUmat, prevgray;
    bool catchframe;
    double timeRejected = 0;  // declaration not shown in the post (assumed)
    for (;;)
    {
        bool Is = cap.grab();
        double millis = cap.get(CV_CAP_PROP_POS_MSEC); // capture time in frame
        if (Is == false)
        {
            cout << "Video Capture Fail" << endl;
            break;  // no more frames: leave the loop
        }
        else
        {
            Mat img, original;
            millis /= 1000;  // milliseconds -> seconds
            // capture frame from video file
            cap.retrieve(img, CV_CAP_OPENNI_BGR_IMAGE);
            resize(img, img, Size(640, 480));
            // save original for later
            img.copyTo(original);
            cvtColor(img, img, COLOR_BGR2GRAY);
            if (prevgray.empty() == false)
            {
                // calculate optical flow
                calcOpticalFlowFarneback(prevgray, img, flowUmat, 0.4, 1, 12, 2, 8, 1.2, 0);
                // copy Umat container to standard Mat
                flowUmat.copyTo(flow);
                catchframe = false;
                // By y += 5, x += 5 you can specify the grid
                for (int y = 0; y < original.rows; y += 5)
                {
                    for (int x = 0; x < original.cols; x += 5)
                    {
                        const Point2f flowatxy = flow.at<Point2f>(y, x) * 10;
                        // note: sqrt(s * s) equals s, so this is the squared magnitude
                        double norm = sqrt((flowatxy.x*flowatxy.x + flowatxy.y*flowatxy.y) * (flowatxy.x*flowatxy.x + flowatxy.y*flowatxy.y));
                        if (norm > 2000)
                            catchframe = true;
                    }
                }
                if (catchframe)
                {
                    timeRejected += millis;
                    putText(original, format("%4.3f", millis), Point(20,20), FONT_HERSHEY_PLAIN, 1.0, CV_RGB(0,255,0), 2.0);
                    resize(original, original, S);
                    out.write(original);  // write step not shown in the post (assumed from the description)
                }
                // fill previous image again
                img.copyTo(prevgray);
            }
            else
            {
                // fill previous image in case prevgray.empty() == true
                img.copyTo(prevgray);
            }
        }
    }
    return 0;
}
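To make the threshold in the inner loop easier to reason about: since sqrt(s * s) is just s, the test compares the squared magnitude of the flow scaled by 10 against 2000. That is 100 * |f|^2 > 2000, so |f| > sqrt(20), roughly 4.47 pixels. Below is a minimal sketch of the same grid test without OpenCV. Flow2f, hasMotion and postTest are hypothetical names, and Flow2f only stands in for cv::Point2f:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Flow2f { float x, y; };  // stand-in for cv::Point2f (assumption)

// Returns true if any grid sample (every `step` pixels) has a flow vector
// longer than `minPixels`. `flow` is a row-major field of rows x cols vectors.
bool hasMotion(const std::vector<Flow2f>& flow, int rows, int cols,
               int step, double minPixels)
{
    assert(step > 0);
    assert(flow.size() == static_cast<std::size_t>(rows) * cols);
    const double minSq = minPixels * minPixels;  // compare squares, no sqrt needed
    for (int y = 0; y < rows; y += step) {
        for (int x = 0; x < cols; x += step) {
            const Flow2f& f = flow[static_cast<std::size_t>(y) * cols + x];
            if (double(f.x) * f.x + double(f.y) * f.y > minSq)
                return true;  // stop at the first sample that moves enough
        }
    }
    return false;
}

// The test as written in the post: scale by 10, then sqrt(s * s) > 2000.
bool postTest(Flow2f f)
{
    const double fx = f.x * 10.0, fy = f.y * 10.0;
    const double s = fx * fx + fy * fy;
    return std::sqrt(s * s) > 2000.0;
}
```

With step = 5 and minPixels = std::sqrt(20.0), hasMotion should make the same decision as the loop in the post. Returning at the first hit skips the rest of the grid, but I have not measured whether that matters next to the optical flow computation itself.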

My question is: why is my code so slow on an Intel i7, and how can I resolve the error?
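To narrow down where the time goes, one option would be to time each stage (decoding, optical flow, writing) separately. A minimal sketch using only std::chrono; StageTimer is a hypothetical helper, not part of OpenCV:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: accumulates wall-clock time over repeated start/stop pairs.
class StageTimer {
public:
    void start() { begin_ = Clock::now(); running_ = true; }

    void stop()
    {
        assert(running_);               // stop() must follow start()
        total_ += Clock::now() - begin_;
        ++calls_;
        running_ = false;
    }

    // Total accumulated time in milliseconds.
    double totalMs() const
    {
        return std::chrono::duration<double, std::milli>(total_).count();
    }

    // Mean time per start/stop pair in milliseconds (0 if never used).
    double meanMs() const { return calls_ ? totalMs() / calls_ : 0.0; }

    int calls() const { return calls_; }

private:
    using Clock = std::chrono::steady_clock;
    Clock::time_point begin_{};
    Clock::duration total_{0};
    int calls_ = 0;
    bool running_ = false;
};
```

The idea would be to keep one timer per stage, for example calling flowTimer.start() before calcOpticalFlowFarneback and flowTimer.stop() after it, and to print meanMs() for each timer once the loop ends.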