The driver never really had anything to do with running two WUs at a time on the same GPU. That came from the way the app was written. A GPU can only run one function from one app at a time. However, ATI allows a second app to be moving data in and out of GPU memory while the first one is actually running. While the apps Gipsel (a.k.a. Crystal Physik on MW) wrote were really good, a little more performance could be gained by running them two at a time. I think Gipsel turned over his code to MW and is no longer writing the apps. If that's the case, Travis and company probably have no clue what a mutex or semaphore is or why it is useful to have in the MW ATI app and just got rid of the code -- and with it the ability to run two WUs at a time.
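To show why the mutex matters, here is a minimal Python sketch of the idea. It is not Gipsel's actual code. It assumes each WU alternates between a transfer phase and a kernel phase, uses two threads in place of two separate app processes, and uses sleeps in place of real GPU work. The names gpu_compute, run_work_unit and run_two are made up for illustration.

```python
import threading
import time

# Hypothetical stand-in for the mutex shared by the two app instances.
# A real app would need a named, cross-process mutex or semaphore.
gpu_compute = threading.Lock()


def run_work_unit(name, chunks, log, transfer_s=0.01, kernel_s=0.01):
    """Simulate one WU: copy a chunk in, then run the kernel under the lock."""
    for i in range(chunks):
        # Transfer phase: not guarded, so it can overlap the other WU's kernel.
        time.sleep(transfer_s)
        # Kernel phase: only one WU may be in here at a time.
        with gpu_compute:
            log.append((name, i, "start"))
            time.sleep(kernel_s)
            log.append((name, i, "end"))


def run_two(chunks=3):
    """Run two simulated WUs side by side and return the kernel event log."""
    log = []
    threads = [
        threading.Thread(target=run_work_unit, args=(name, chunks, log))
        for name in ("WU-A", "WU-B")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for event in run_two():
        print(event)
```

Because only the kernel step sits behind the lock, one WU can be transferring while the other computes, and the log never shows two kernels running at once. Take the lock out and nothing stops both WUs from trying to use the GPU for compute at the same moment.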
Also, after Gipsel compiled the GPU code, he would tweak the GPU assembly language because he could get another 15-30% more performance than what the compiler could do. So, if Travis is making changes to the ATI code and Gipsel isn't helping, I would expect performance to drop and the ability to crunch 2 at a time to go away.
As far as OpenCL vs CAL, it won't be a choice for very long. ATI has announced they are dropping CAL and that OpenCL will be the ONLY option in the future. That sucks because, even if the app is written to have 99% utilization, it still runs 30-50% slower on OpenCL -- or at least the Collatz app does. nVidia's OpenCL version isn't any better, but since they have a much larger CUDA following, they haven't said anything about dropping CUDA.