MW WUs not using all of GPU power?



DrPop
11-07-11, 12:18 PM
It's been a while since I crunched any MW. Kat's rig is back online now and she wants to crunch MilkyWay, of course. :o So...I fired it up last night, and realized this morning that the WUs are taking way longer than I remember them taking. She's got the 2GB version of Sapphire's HD4870 clocked at 790MHz...but I can't figure something out. It's only loading the GPU at like 70%. It is crunching Collatz, FreeHal and WuProp on the CPU if that matters. Also, if I suspend MW and let Collatz crunch on the GPU, Collatz will load the GPU to 98 or 99% like it should.

Any ideas of what's going on with MW here and why it won't load up the GPU higher?:confused:

trigggl
11-07-11, 12:59 PM
In Linux with a 4870, most of my tasks are completing in just under 3 minutes. What are you seeing?

DAD
11-07-11, 01:01 PM
OpenCL sucks and is inefficient? ;)

Seriously, I dunno, I don't use ATI. But it could be that the MW OpenCL code isn't optimized well.

shralper
11-07-11, 01:04 PM
I just tried MilkyWay on my machine. It seems MilkyWay is only using around 80% of my 560 and 570, while PrimeGrid uses 99% of both of them. So something MW is doing...?

DrPop
11-07-11, 02:09 PM
This is weird. Did MW re-code and cave in to OpenCL then? It used to be "native" ATI code, and was very efficient. I could get loads of 98 - 99% all the time back when I crunched on both a 4870 and 5870. Hmmm...I will have to look into this. I wonder if I suspend the CPU work if that will make a difference? I'll try some different scenarios when I have time here between patients and report back.

Al
11-07-11, 03:15 PM
Not seeing any changes on my XP box; my 6950 runs at 98 - 99%, even while running Climate Prediction, Boincsimap and numberfields. Nvidia cards are another story: MW still isn't optimized for them, so they always run at 70 to 75% usage max.

Fire$torm
11-07-11, 04:47 PM
I just checked MW's app page and the ATI app is still listed as ATI14 which uses Stream code. Take a look at the MW wu's, are they tagged as ATI14?
When I first started crunching MW 2+ years ago, one could run two WUs simultaneously, but that was using CCC 10.10. When ATI/AMD started adding OpenCL to the drivers it fubared the ability to do this, at least for my 4850s. You may wish to try an app_info file like this one (if ATI14):



<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_separation_0.82_windows_x86_64__ati14.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>82</version_num>
    <flops>1.0e11</flops>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <plan_class>ati14ati</plan_class>
    <coproc>
      <type>ATI</type>
      <count>0.5</count>
    </coproc>
    <cmdline></cmdline>
    <file_ref>
      <file_name>milkyway_separation_0.82_windows_x86_64__ati14.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>


You might also have to change the exe reference in the config if it doesn't match the app in the project folder.
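The exe-name check mentioned above can be scripted. What follows is only a rough sketch, not part of BOINC: the function name check_app_info is made up for illustration, and it assumes the app_info.xml is well-formed XML laid out like the example in this post. It lists any file named in the config that is missing from the project folder, and reports how many tasks per GPU the <count> value implies (0.5 means two at a time).

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_app_info(app_info_path, project_dir):
    """Return (missing_files, tasks_per_gpu) for an app_info.xml file.

    Hypothetical helper for illustration; it is not a BOINC tool.
    """
    root = ET.parse(app_info_path).getroot()
    folder = Path(project_dir)

    # File names declared in <file_info> and referenced in <file_ref>.
    declared = {e.text.strip() for e in root.findall("file_info/name")}
    referenced = {
        e.text.strip()
        for e in root.findall("app_version/file_ref/file_name")
    }

    # Any name that does not exist in the project folder is a mismatch.
    missing = sorted(
        name for name in declared | referenced
        if not (folder / name).is_file()
    )

    # <count>0.5</count> means each task claims half a GPU, i.e. two per GPU.
    count = root.findtext("app_version/coproc/count")
    tasks_per_gpu = None if count is None else 1.0 / float(count)
    return missing, tasks_per_gpu
```

Call it with the path to app_info.xml and the path to the MW project folder; an empty list of missing files means the names in the config match what is on disk.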

You can also try the opti_apps written by Arkayn and Crunch3r (Link (http://www.arkayn.us/forum/index.php?action=downloads;cat=11)) although MW tends to roll the latest opti_app into the server for client DL'ing.

Hope this helps.

F$

DAD
11-07-11, 04:48 PM
I just grabbed some nv gpu MW wu... all of them are opencl.. lame... looks like they caved to it...

I mean it's EASIER for them to code this way so they don't have to code cuda and ati separate, but it makes it very inefficient on either brand of GPU

They are also putting a LOT lighter load on my GPUs vs PG... with PG I have to do everything I can to keep them from overheating... with these MW ones, the cards are staying nice and cool

DAD
11-07-11, 04:53 PM
Those MW GPU apps, at least for NV annoy the hell out of me.. they keep running even after I close boinc down... and refuse to suspend when boinc tells them to.

They also eat up an enormous amount of CPU. ONE GPU WU is taking 25% of my CPU time.. that's 2 full cores... IMO, their GPU WUs blow... I stick to their multi CPU WUs which are actually very nice imo

DrPop
11-07-11, 09:58 PM
Yeah, MW has never been that great for CUDA cards. ATI / AMD cards though - it was always good credits (when it was up). :p

OK, finally got a chance to look at this rig here. Busy day. Thanks for all the replies and suggestions.
I found the problem. Running Collatz, FreeHal, and WuProp on the 6 core CPU was hogging all the CPU and starving the GPU. Strange, since this never happened before, and especially with a "small" GPU like the 4870.
When I suspended Collatz on the CPU, the GPU % immediately jumped up to 97 or 98%. At 790MHz core clock, this is spitting out WUs in 2:23 now. Very nice, much like what I remember. For the record, they are the native ATI14 code WUs.

However, what did they change? So on Kat's rig, I will have to disable one CPU core from BOINC crunching just to feed a lowly 4870 GPU MW WUs?
That is lame. Anyone notice anything similar, and is there perhaps a setting somewhere that I can change to allow the GPU feeding to take precedence over the CPU WUs, or is this a new "feature" in MW that it uses higher % of the CPU now?

DAD
11-08-11, 12:32 AM
That's because the mw gpu apps use too much CPU. They should tell boinc they need 1-2 CPUs so boinc doesn't over use the CPU

DrPop
11-08-11, 01:37 AM
I guess I'm left wondering what changed? Because their native ATI apps, like the one I'm using on the 4870 used to be very efficient and used hardly any CPU at all. I think like 0.05 CPU is what they used to use. I wonder why it's different now, or is it something on my end?

DAD
11-08-11, 02:33 AM
It's still marked 0.05 but it's more like 2.0. I bet it has to do with opencl and something they screwed up
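To put rough numbers on the "marked 0.05 but more like 2.0" point, here is a small sketch. It assumes a deliberately simplified model in which BOINC fills the CPU according to the declared avg_ncpus while the real load follows the actual usage; the function names are made up for illustration, and the 2.0 figure is only the estimate quoted in this thread, not a measurement.

```python
import math


def cpu_load(logical_cpus, cpu_tasks, gpu_tasks, declared_ncpus, actual_ncpus):
    """Return (budgeted, actual, excess) CPU demand in units of cores.

    Simplified model: each CPU task uses one core, each GPU task uses
    declared_ncpus on paper and actual_ncpus in practice.
    """
    budgeted = cpu_tasks + gpu_tasks * declared_ncpus
    actual = cpu_tasks + gpu_tasks * actual_ncpus
    excess = max(0.0, actual - logical_cpus)
    return budgeted, actual, excess


def cpu_tasks_to_run(logical_cpus, gpu_tasks, actual_ncpus):
    """CPU tasks that fit once whole cores are set aside for the GPU feeders."""
    reserved = math.ceil(gpu_tasks * actual_ncpus)
    return max(0, logical_cpus - reserved)


# Illustrative figures: a 6-core CPU running 6 CPU tasks plus 1 GPU task.
budgeted, actual, excess = cpu_load(6, 6, 1, declared_ncpus=0.05, actual_ncpus=2.0)
print(budgeted, actual, excess)
```

Under those assumptions the scheduler sees the CPU as just about full, while the real demand is two cores over, which would be enough to starve the GPU feeder.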

Bryan
11-08-11, 10:24 AM
On my machines I reserve 1 thread for GPUs and the loading for both the Cuda and ATI stays near 98%. That includes the GTX 570 / 6990 box which in effect is running 3 GPUs. I do NOT have a problem with the CPU tasks reducing the GPU loading.

However, I STRONGLY recommend installing Process Lasso, which allows you to set the priority and affinity for all tasks. As I said, I reserve 1 thread in BOINC for GPU. Then using Process Lasso I set the priority for the GPU tasks to HIGH and the priority for the CPU tasks to NORMAL. I set the AFFINITY of the GPU programs to 0-7 (no restriction) and set affinity for the CPU tasks to 0-6. That keeps 1 thread free of CPU tasks completely.
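For anyone wondering what the 0-7 versus 0-6 affinity settings amount to: Windows represents CPU affinity as a bitmask with one bit per logical CPU. The sketch below is only an illustration of that arithmetic for an 8-thread machine; the helper names are made up, and it does not set anything on a running process.

```python
def affinity_mask(first, last):
    """Bitmask with bits first..last (inclusive) set, one bit per logical CPU."""
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    width = last - first + 1
    return ((1 << width) - 1) << first


def threads_in(mask):
    """List the logical CPU numbers whose bits are set in mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


gpu_mask = affinity_mask(0, 7)  # GPU feeder tasks: any of the 8 threads
cpu_mask = affinity_mask(0, 6)  # CPU tasks: threads 0-6 only

# Threads the GPU tasks may use that CPU tasks can never occupy.
reserved = threads_in(gpu_mask & ~cpu_mask)
print(hex(gpu_mask), hex(cpu_mask), reserved)
```

The one thread left over is the one that stays free of CPU tasks, so the GPU feeder always has somewhere to run.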

One problem that I have found running both Cuda and ATI in the same box and on the same project has to do with BOINC. The scheduler is so stupid it doesn't always ask for the correct type GPU work. The work scheduler works per computer and not per type of GPU. For example, if the project will allow 100 wu to be cached, BOINC may start asking for ATI wu when in reality it needs Cuda wu. Eventually it will fill the entire max wu allowed with one type or the other of GPU tasks. So one GPU will have all the work it can handle and the other will sit idle. I first found this on Collatz.

According to Slicker, it is a problem with the BOINC client. He said what MIGHT work is to make sure your cache size is low enough that you won't hit the maximum number of wu allowed by the project. Then it MIGHT leave enough room that it can get both types of wu. I haven't tried this but I'm not convinced it will work either. I've watched BOINC report a Cuda wu and then request ATI work. It does that over and over and over ...... Running Collatz as a backup project solves the problem - it will request the same type of work unit it is reporting. Of course having to wait to download a wu before you can crunch kills throughput.

DAD
11-08-11, 02:37 PM
Another option is to add SWAN_SYNC=0 as an environment variable in Windows. That will auto reserve 1 core per GPU WU. The performance gain has been debated. Some ppl claim a lot and some not so much.

Bryan
11-08-11, 03:17 PM
I think swan_sync is specific to GPUGrid but I could be mistaken.

DAD
11-08-11, 09:58 PM
I think u may be right. I'll add it and reboot and see if it does anything for MW.

DrPop
11-08-11, 11:42 PM
@Bryan, thanks for the detailed reply. I will look into that. I was just curious about this because we never had to do this before with MW. It must be something new, and I wish they didn't change whatever they changed. It is lame to have to reserve a whole core just to feed a lowly 4870! :p;)

Bryan
11-09-11, 12:00 AM
Agreed!!!!

DAD
11-09-11, 12:03 AM
If I run 2 MW GPU WUs on my 2 560 Tis, it takes FOUR CPU cores total. I think I'll email the project lead at MW and politely ask why 1 GPU WU needs to take up 25% (2 cores) on my machine.

Swan_sync had no effect on MW. Probably cuz it's doing something similar. It did reserve 1 CPU core per GPU WU for most other projects though.

Slicker
11-09-11, 10:42 AM
The driver never really had anything to do with running two WUs at a time on the same GPU. That was because of the way the app was written. A GPU can only run one function from one app at a time. However, ATI allows a second app to be loading and unloading data into GPU memory at the same time the first one is actually running. While the apps Gipsel (a.k.a. Cluster Physik on MW) wrote were really good, a little more performance could be gained by running them two at a time. I think Gipsel turned over his code to MW and is no longer writing the apps. If that's the case, Travis and company probably have no clue what a mutex or semaphore is or why it is useful to have in the MW ATI app and just got rid of the code -- and the ability to run two WUs at a time.

Also, after Gipsel would compile the GPU code, he would tweak the GPU assembly language because he could get another 15-30% more performance than what the compiler could do. So, if Travis is making changes to the ATI code and Gipsel isn't helping, I would expect both performance to drop and the ability to crunch 2 at a time to cease.

As far as OpenCL vs CAL, it won't be a choice for very long. ATI has announced they are dropping CAL and that OpenCL will be the ONLY option in the future. That sucks because, even if the app is written to have 99% utilization, it still runs 30-50% slower on OpenCL -- or at least the Collatz app does. nVidia's OpenCL version isn't any better but since they have a much larger CUDA following, they haven't said anything about dropping CUDA.
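"30-50% slower" can be read two ways, and they give different answers for daily output. The sketch below works through both. It assumes credit scales with the number of WUs completed, uses the 2:23 (143 s) per-WU time reported earlier in this thread as a baseline, and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86400


def wus_per_day(seconds_per_wu):
    """WUs finished in 24 hours at a fixed time per WU."""
    return SECONDS_PER_DAY / seconds_per_wu


def relative_output_if_runtime_longer(slowdown):
    """Reading 1: each WU takes (1 + slowdown) times as long."""
    return 1.0 / (1.0 + slowdown)


def relative_output_if_throughput_lower(slowdown):
    """Reading 2: WUs per day drop by the slowdown fraction."""
    return 1.0 - slowdown


baseline = wus_per_day(143)  # 2:23 per WU, as reported for the 4870
for slowdown in (0.30, 0.50):
    longer = baseline * relative_output_if_runtime_longer(slowdown)
    lower = baseline * relative_output_if_throughput_lower(slowdown)
    print(f"{slowdown:.0%}: {longer:.0f} or {lower:.0f} WUs/day vs {baseline:.0f}")
```

If the figure means 50% longer runtimes, output falls by about a third rather than by half; only the second reading gives a full 50% drop.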

DrPop
11-09-11, 11:31 AM
Wow. So that totally explains the drop in production when running MW. I thought I was going crazy, but I used to get at least 20K+ more credits / day on the 4870 there a year and a half ago.

Also on your last note - this is cold water for ATI/AMD GPU crunchers. Does that mean one should not invest in any more AMD GPUs for crunching? I mean, if all projects are forced to go either OpenCL or CUDA, then what is the future for awesome crunching cards like HD 5970? Do we expect a performance hit of up to 50% lower credits? :-o Doesn't AMD realize this is the death knell for their GPUs to ever compete in the high-end computing / server arena?

DAD
11-09-11, 02:39 PM
My guess is ATI just wants to concentrate on gaming, but if they stick to pure OpenCL, science etc. will flock to NV CUDA.

I've never been an ATI fan, but now I'd never buy one for BOINC. It would be a waste IMO as you'd get better bang for ur buck with CUDA.

I'm sure BOINC projects will support ATI, but only with crippled OpenCL.

Anything cross platform never does 100%