Now this is odd!
Running 3 at a time on the TITAN:
Task run time = 6,900 seconds. That works out to ~243k/day, so 3 is better than 4. But how is 3 faster per task than 2? I think I may have run that 2-per set with CPU tasks also running. I will try re-running 2-per without CPU tasks to get an apples-to-apples comparison.
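For anyone who wants to check the arithmetic, here is a rough sketch of how the per-day figure follows from the run time and the number of tasks running at once. The function and variable names are made up for illustration, and the per-task value is not a measured number: it is simply backed out from the ~243k/day and 6,900 second figures above, assuming every task is worth the same amount.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrent: int, run_time_s: float) -> float:
    """Tasks finished per day when `concurrent` tasks run side by side
    and each one takes `run_time_s` seconds of wall-clock time."""
    return concurrent * SECONDS_PER_DAY / run_time_s


def output_per_day(concurrent: int, run_time_s: float, per_task: float) -> float:
    """Daily output, assuming every task is worth the same `per_task` amount."""
    return tasks_per_day(concurrent, run_time_s) * per_task


# Figures from the post: 3 at a time, 6,900 seconds per task, ~243k/day.
tasks = tasks_per_day(3, 6_900)

# Assumption: back out the per-task value implied by the ~243k/day figure.
implied_per_task = 243_000 / tasks

print(f"tasks/day: {tasks:.2f}")
print(f"implied value per task: {implied_per_task:.0f}")
```

Plugging the run time from the 2-per and 4-per runs into `output_per_day` with the same per-task value gives a like-for-like comparison, provided each run was done under the same conditions (for example, with no CPU tasks running).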