RSS
04-07-19, 12:00 PM
So there have been some new developments over the last week.  It's both good and bad.
First of all, some history. The reason I waited so long to develop a GPU app is because the calculation was heavily dependent on multi-precision libraries (gmp) and number theoretic libraries (pari/gp). Both of these use dynamically allocated memory which is a big no-no in GPUs. I found a multi-precision library online that I could use by hard coding the precision to the maximum required (about 750 bits), thereby removing the dependence on memory allocations. The next piece of the puzzle was to code up a polynomial discriminant function. After doing this, I could finally compile a kernel for the GPU. That is the history for the current GPU app. It is about 20 to 30 times faster than the current cpu version (depends on WU and cpu/gpu speeds).
But then I got thinking... my GPU polynomial discriminant algorithm is different from the one in the PARI library (theirs works for any degree and mine is specialized to degree 10). So to do a true apples-to-apples comparison, I replaced the PARI algorithm with mine in the cpu version of the code. I was shocked by what I found... the cpu version was now about 10x faster than it used to be. I never thought I was capable of writing an algorithm that would be 10x faster than a well established library function. WTF? Now I'm kicking myself in the butt for not having done this sooner!
This brings mixed emotions. On one side, it is great that I now have a cpu version that is 10x faster. But it also means that my GPU code is total crap. With all the horsepower in a present day GPU I would expect it to be at least 10x faster than the equivalent cpu version. Compared with the new cpu version, the gpu is only 2 to 3 times faster. That is unacceptable.
So the new plan is as follows:
1. Deploy new cpu executables. Since it's 10x faster, I will need to drop the credit by a factor of 10. (Credits/hour will remain the same for the cpu but will obviously drop for the GPU)
2. Develop new and improved GPU kernels.
I don't blame the GPU users for jumping ship at this point. Frankly, the inefficiency of the current GPU app just makes it not worth it (for them or the project).
For what it's worth, I did have openCL versions built. Nvidia version works perfectly. The AMD version is buggy for some reason, as is the windows version. Since I will be changing the kernels anyways, there is no point in debugging them yet.
More... (https://numberfields.asu.edu/NumberFields/forum_thread.php?id=366)
First of all, some history. The reason I waited so long to develop a GPU app is because the calculation was heavily dependent on multi-precision libraries (gmp) and number theoretic libraries (pari/gp). Both of these use dynamically allocated memory which is a big no-no in GPUs. I found a multi-precision library online that I could use by hard coding the precision to the maximum required (about 750 bits), thereby removing the dependence on memory allocations. The next piece of the puzzle was to code up a polynomial discriminant function. After doing this, I could finally compile a kernel for the GPU. That is the history for the current GPU app. It is about 20 to 30 times faster than the current cpu version (depends on WU and cpu/gpu speeds).
But then I got thinking... my GPU polynomial discriminant algorithm is different from the one in the PARI library (theirs works for any degree and mine is specialized to degree 10). So to do a true apples-to-apples comparison, I replaced the PARI algorithm with mine in the cpu version of the code. I was shocked by what I found... the cpu version was now about 10x faster than it used to be. I never thought I was capable of writing an algorithm that would be 10x faster than a well established library function. WTF? Now I'm kicking myself in the butt for not having done this sooner!
This brings mixed emotions. On one side, it is great that I now have a cpu version that is 10x faster. But it also means that my GPU code is total crap. With all the horsepower in a present day GPU I would expect it to be at least 10x faster than the equivalent cpu version. Compared with the new cpu version, the gpu is only 2 to 3 times faster. That is unacceptable.
So the new plan is as follows:
1. Deploy new cpu executables. Since it's 10x faster, I will need to drop the credit by a factor of 10. (Credits/hour will remain the same for the cpu but will obviously drop for the GPU)
2. Develop new and improved GPU kernels.
I don't blame the GPU users for jumping ship at this point. Frankly, the inefficiency of the current GPU app just makes it not worth it (for them or the project).
For what it's worth, I did have openCL versions built. Nvidia version works perfectly. The AMD version is buggy for some reason, as is the windows version. Since I will be changing the kernels anyways, there is no point in debugging them yet.
More... (https://numberfields.asu.edu/NumberFields/forum_thread.php?id=366)