example_tutorial_how_to_use_parallel_for :- comment
I have compiled and run the tutorial above on an Ubuntu x86 laptop and on an Odroid XU4 (8-core ARM, 4x A15 + 4x A7). The tutorial code ably demonstrates the acceleration gained by using all cores in parallel rather than a single core, usually a speed-up of around 4-5x. However, on the Odroid, once the parallel section has finished (around 8 secs) the code switches to the single-core sequential mode, and the choice of core is uncontrolled. Sometimes it picks an A15 (procs 4-7) and sometimes an A7 (procs 0-3), so the sequential run can vary from around 40 secs on an A15 to 96 secs on an A7, giving speed-ups of around 5 (A15) to 11 (A7).
This is merely a comment and not a criticism, but the variation may confuse anyone unfamiliar with parallel operation about the true speed-up. Running htop shows the per-processor usage quite effectively.
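
For anyone wanting a repeatable sequential timing on a board like this, one option on Linux is to pin the process to a chosen core before timing. The following is only a minimal sketch of that idea in Python, not part of the tutorial: the helper names (pin_to_cpu, timed_sequential, busy_loop) are hypothetical, the busy loop merely stands in for the tutorial's sequential section, and the core numbering is whatever the kernel reports (4-7 for the A15s and 0-3 for the A7s on the Odroid described above).

```python
import os
import time


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})


def timed_sequential(work, cpu):
    """Run work() pinned to `cpu`, then restore the previous affinity.

    Returns the elapsed wall-clock time in seconds.
    """
    previous = os.sched_getaffinity(0)
    try:
        pin_to_cpu(cpu)
        start = time.perf_counter()
        work()
        return time.perf_counter() - start
    finally:
        os.sched_setaffinity(0, previous)


def busy_loop(n=200_000):
    """Placeholder single-threaded workload (an assumption, not the tutorial code)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # Time the same workload on every CPU this process is allowed to use.
    for cpu in sorted(os.sched_getaffinity(0)):
        elapsed = timed_sequential(busy_loop, cpu)
        print(f"cpu {cpu}: {elapsed:.3f} s")
```

On a big.LITTLE board the per-core times printed by this loop should fall into two groups, which makes it clearer which sequential baseline a quoted speed-up refers to. The util-linux taskset command can apply the same kind of pinning to a compiled program from the shell.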