M'kay, I believe I have it figured out. What I think they did was add a second vector float ALU pipeline that shares the same register space and execution control flow. Yes, you can say they have doubled the core count, but you should also consider that these are "less capable" cores. Similar to what AMD pulled off with the Bulldozer CPUs, and got sued over.

Basically, each core can do FP+INT at the same time, just like the 2000 series. But it can also now do FP+FP instead. However, when it does FP+INT at the same time, only one of the two pipelines is doing float work, so the FP throughput is cut to half of the advertised peak TFLOPS. *sigh* always with the shady gimmicks
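
To put rough numbers on the halving, here is a back-of-the-envelope sketch in Python. Everything in it is my assumption rather than anything from a spec sheet quoted above: the function name `peak_fp32_tflops` is made up, the 16-lanes-per-pipeline and 4-partitions-per-SM layout is just the model I'm guessing at, and the 68 SMs at 1.71 GHz are example figures for a 3080-like part. It only models peak issue rate, nothing about memory or scheduling.

```python
def peak_fp32_tflops(sm_count, clock_ghz, int_share=0.0,
                     lanes_per_pipe=16, partitions_per_sm=4):
    """Rough peak FP32 TFLOPS for a two-pipeline design (hypothetical model).

    Pipe A is assumed to do FP32 only; pipe B is assumed to do either FP32
    or INT32 on a given cycle. int_share is the fraction of cycles pipe B
    spends on INT32 work (0.0 = pure FP+FP, 1.0 = pure FP+INT).
    """
    if not 0.0 <= int_share <= 1.0:
        raise ValueError("int_share must be between 0 and 1")
    # Pipe A always contributes its lanes; pipe B only when not doing INT.
    fp_lanes = lanes_per_pipe * (1 + (1 - int_share))
    # One fused multiply-add per lane per clock counts as 2 FLOPs.
    flops_per_clock = sm_count * partitions_per_sm * fp_lanes * 2
    # FLOPs/clock times GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return flops_per_clock * clock_ghz / 1000


# Assumed example part: 68 SMs at 1.71 GHz.
fp_fp = peak_fp32_tflops(68, 1.71, int_share=0.0)   # both pipes doing FP
fp_int = peak_fp32_tflops(68, 1.71, int_share=1.0)  # second pipe stuck on INT

print(f"FP+FP : {fp_fp:.2f} TFLOPS")
print(f"FP+INT: {fp_int:.2f} TFLOPS")
```

Anything in between (say `int_share=0.3`) lands somewhere between those two figures, so the advertised number is only reachable when the workload is pure float.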