Running 2 at a time on the TITAN, this time without any CPU tasks running:
Task run time = 5,500 seconds. That works out to ~202k/day. So 3 at a time is still the best.
In comparison, running 2 GPU tasks with 1 full thread each AND filling up the remaining threads with CPU tasks came to 133k/day. That is a SIGNIFICANT hit to production, roughly a third lower. Now I need to run some more tests to see how many CPU tasks I can run before the slowdown kicks in.
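For anyone who wants to redo the math with their own numbers, here is a rough sketch of how run time turns into a per-day figure. The credit per task is not stated above, so the script backs it out from the 5,500 s and ~202k/day numbers; treat that value as an assumption, not a project figure. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def credits_per_day(concurrent_tasks, run_time_s, credit_per_task):
    """Daily credit when `concurrent_tasks` run side by side, each taking `run_time_s`."""
    tasks_per_day = concurrent_tasks * SECONDS_PER_DAY / run_time_s
    return tasks_per_day * credit_per_task


def implied_credit_per_task(daily_credit, concurrent_tasks, run_time_s):
    """Back out the credit per task from an observed daily figure (an assumption)."""
    return daily_credit * run_time_s / (concurrent_tasks * SECONDS_PER_DAY)


def relative_drop(baseline, observed):
    """Fraction of production lost relative to the baseline."""
    return (baseline - observed) / baseline


# Numbers from the post: 2 at a time, 5,500 s per task, ~202k/day.
credit = implied_credit_per_task(202_000, 2, 5_500)
print(f"implied credit per task: {credit:.0f}")

# Hit from filling the remaining threads with CPU tasks (133k/day vs ~202k/day).
print(f"production lost: {relative_drop(202_000, 133_000):.1%}")
```

With the figures above this gives about 6,400 credits per task and a drop of about 34%. Swap in your own run time and task count in `credits_per_day` to compare other setups.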