CUDA ran fine on Moo, but slower than OpenCL. A 2080 Ti should do around 2.2m easily with OpenCL.