Yes, you should run more than one task per GPU. On a powerful GPU like a 7970, I run 2x AP at a time, reserving 0.5 of a CPU thread each, or 3x of the regular S@H tasks, reserving 0.333 of a CPU thread each. CUDA uses more CPU and requires reserving more CPU threads. In my experience, adding more tasks beyond that doesn't put any more load on the GPU. Take a look at your GPU load to see what works best for your model.
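If it helps, here is a rough sketch of how the 2x AP / 3x S@H split above could be turned into a BOINC app_config.xml. The `gpu_usage` value is just 1 divided by the number of tasks you want on the GPU at once, and `cpu_usage` is the CPU thread fraction reserved per task. The app names `astropulse_v6` and `setiathome_v7` are assumptions on my part, so check the names your client actually reports. If you run the optimized apps under an anonymous platform setup, the same values may need to go in app_info.xml instead.

```python
# Sketch only: generates app_config.xml text for running several tasks per GPU.
# The app names used below are assumptions; substitute the names your client reports.

def app_block(name, tasks_per_gpu, cpu_per_task):
    """Return one <app> entry. gpu_usage is the share of a GPU each task claims."""
    gpu_usage = 1 / tasks_per_gpu  # e.g. 2 tasks at a time -> 0.5 of the GPU each
    return (
        "  <app>\n"
        f"    <name>{name}</name>\n"
        "    <gpu_versions>\n"
        f"      <gpu_usage>{gpu_usage:.3f}</gpu_usage>\n"
        f"      <cpu_usage>{cpu_per_task:.3f}</cpu_usage>\n"
        "    </gpu_versions>\n"
        "  </app>\n"
    )

def build_app_config(apps):
    """apps: list of (app_name, tasks_per_gpu, cpu_per_task) tuples."""
    body = "".join(app_block(*entry) for entry in apps)
    return "<app_config>\n" + body + "</app_config>\n"

config_xml = build_app_config([
    ("astropulse_v6", 2, 0.5),    # 2x AP, 0.5 CPU thread each (assumed app name)
    ("setiathome_v7", 3, 0.333),  # 3x S@H, 0.333 CPU thread each (assumed app name)
])
print(config_xml)
```

Save the printed text as app_config.xml in the project's folder under the BOINC data directory, then have the client re-read its config files (or restart it). Watch GPU load afterwards and adjust the task counts for your card.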
Yes, there are optimized apps for Linux, both CPU and GPU:
http://lunatics.kwsn.net/index.php?m...wnloads;catd=1