From what I've observed, for OpenCL or mixed work the new apps keep one work unit per GPU, whereas for stream or Nvidia work the app still spreads the task across GPUs.