Poem new GPU app update



EmSti
03-18-14, 01:45 PM
From the News Thread a little while ago:

"I've just updated the POEM@HOME Binaries to Version 2.00. This is the first big step for our new GPU project. I'll have to adjust some parameters of the new work units, and will send them out at the end of this week.
I'm confident the errors which occurred during the test phase have been resolved. If you encounter any problems nevertheless, please report it in our message boards." - Thomas Koch

myshortpencil
03-18-14, 02:38 PM
Skip the techno-goop. When can I start crunching Poem on my Winders machines? :)

EmSti
03-18-14, 02:56 PM
Skip the techno-goop. When can I start crunching Poem on my Winders machines? :)

After you put your clothes back on.

myshortpencil
03-18-14, 03:56 PM
After you put your clothes back on.

I'll take that as a failed attempt at a joke. :)

DrPop
03-18-14, 04:06 PM
After you put your clothes back on.

Hahahaha!!! :)) oh man, I was rolling on that one. :D

Z-TAC
03-18-14, 04:44 PM
Hahahaha!!! :)) oh man, I was rolling on that one. :D

Pay no attention to these lusty imaginations. If they object to your avatar, they should send you a PM rather than taunt you in public. I haven't seen anyone making fun of zombie's avatars, but maybe that's because he has 2.5 billion credits! No offense zombie. :)>-

EmSti
03-18-14, 04:53 PM
No issues with his avatar, just having some fun. And I have made some fun comments at zombie's before. I think I fell in love with one of his, if I recall correctly.

Fire$torm
03-19-14, 11:11 PM
.... I think I fell in love with one of his, if I recall correctly.

Ditto here

Duke of Buckingham
03-20-14, 05:58 AM
After you put your clothes back on.


I'll take that as a failed attempt at a joke. :)


Hahahaha!!! :)) oh man, I was rolling on that one. :D

Don't you understand? It is myshortpencil's washing machine that is broken ...
http://www.adpic-images.com/data/picture/detail/Broken_washing_machine_347133.jpg
or maybe he used all his money on computers.

EmSti
03-21-14, 01:07 PM
Update, gpu wus released on POEM.

"I've now started some GPU work units. Unfortunately I can already see some errors.

Please make sure you run the new work units with poemcl 2.00. This should be done automatically; however I can see some failed work units using deprecated versions, so please check your config files if you have edited them manually.

You may also have to check the settings of installed anti virus software and allow poemcl to use your GPU.

If there are other errors, please report them here.


To the latest questions:
The new POEM Version is also running old tasks, but has new force field implementations which are used for the recently started simulations. All of our running projects are concerning Protein Optimization.

We are working on a release for the Mac, but it will still only support CPU jobs."

FourOh
03-21-14, 04:06 PM
Have you returned any work successfully? I have two errors after over two hours on my 7790; I have a 5850 running now. Looks like 90% GPU usage with 1 work unit, so I'm sticking with that for now to get a baseline.

EmSti
03-21-14, 04:15 PM
Nope. Just aborted all currently running ones, stopped all CPU tasks, and removed overclocks. Setting up for 1 WU per GPU with all CPUs free, then starting a clean run.

myshortpencil
03-21-14, 04:18 PM
Have you returned any work successfully? I have two errors after over two hours on my 7790; I have a 5850 running now. Looks like 90% GPU usage with 1 work unit, so I'm sticking with that for now to get a baseline.

I assume the old app_info files will not work. Are new ones available?

Old app_info example:

<app_info>
  <app>
    <name>poemcl</name>
    <user_friendly_name>POEM++ OpenCL version</user_friendly_name>
  </app>
  <file_info>
    <name>poemcl_1.3_windows_intelx86__opencl_ati_100</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>poemcl</app_name>
    <version_num>103</version_num>
    <avg_ncpus>0.75</avg_ncpus>
    <max_ncpus>0.75</max_ncpus>
    <flops>4.0e11</flops>
    <plan_class>opencl_ati_100</plan_class>
    <file_ref>
      <file_name>poemcl_1.3_windows_intelx86__opencl_ati_100</file_name>
      <main_program/>
    </file_ref>
    <coproc>
      <type>ATI</type>
      <count>0.33</count>
    </coproc>
    <gpu_ram>320.000000</gpu_ram>
  </app_version>
</app_info>

Al
03-21-14, 04:32 PM
I've got 1 wu running on each 7970 in a dual system. It's defaulting to 1 cpu and 1 ATI gpu per wu. 99% usage. Looks like they will run a lot longer than they used to. I'll report when they finish.
Edit: I tried it on a dual nv system also. It no longer loop restarts the task on the 2nd gpu. It does tie up the 2nd gpu and an additional core with zero gpu usage. Ugh! Obviously they didn't care to completely fix that problem.

EmSti
03-21-14, 06:33 PM
app_info file not needed. easier to use app_config.xml (they can even be dynamically loaded).

I am aborting all Poem work units and will try again another day. Even a basic run with no overclock failed. I wonder if they have a multi-GPU issue. I will be out of town for a week anyway, so no time to play with new toys.

EmSti
03-21-14, 06:35 PM
I've got 1 wu running on each 7970 in a dual system. It's defaulting to 1 cpu and 1 ATI gpu per wu. 99% usage. Looks like they will run a lot longer than they used to. I'll report when they finish.
Edit: I tried it on a dual nv system also. It no longer loop restarts the task on the 2nd gpu. It does tie up the 2nd gpu and an additional core with zero gpu usage. Ugh! Obviously they didn't care to completely fix that problem.

On previous version I got around the problem by using 2 clients. Tell each to ignore one NVidia GPU.
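
A minimal sketch of that two-client workaround, assuming each client instance has its own data directory with its own cc_config.xml. The `<ignore_nvidia_dev>` option is a standard BOINC client setting; the helper name `build_cc_config` and the device numbering (0 and 1) are only illustrative. Setting up the second client instance itself is not covered here.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ignored_nvidia_device: int) -> str:
    """Return cc_config.xml text telling one client to ignore one NVIDIA GPU."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Device numbers are the ones the client prints in its startup messages.
    ET.SubElement(options, "ignore_nvidia_dev").text = str(ignored_nvidia_device)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Hypothetical layout: client A keeps GPU 0, client B keeps GPU 1.
for client, ignored in (("client A", 1), ("client B", 0)):
    print(f"--- cc_config.xml for {client} ---")
    print(build_cc_config(ignored))
```

The client only reads this option at startup, so each instance needs a restart after the file is changed.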

Al
03-21-14, 06:47 PM
Both ati tasks errored out at 7100 seconds. "unknown error" is a bit vague.

zombie67
03-21-14, 07:19 PM
On my TITAN, ~65% load with a single task, 96% with two, 97 with three. Each task reserving a full thread. So I am running them two at a time, unless someone tells me that there is an advantage to running more than two, based on credits.

John P. Myers
03-21-14, 07:27 PM
Looks like they'll pay ~6500 credits if you can get them to validate.

zombie67
03-21-14, 07:33 PM
Both ati tasks errored out at 7100 seconds. "unknown error" is a bit vague.

If the tasks are all erring at the exact same time, then the project has set the max run time to that value. When the tasks hit that value, they self-abort. If that is the case, then only the project can fix it.

myshortpencil
03-21-14, 08:28 PM
app_info file not needed. easier to use app_config.xml (they can even be dynamically loaded).

Thanks. So is there a link to the current app_config.xml file?

Al
03-21-14, 08:31 PM
Well, it's not the project as the next one errored out at 5200 seconds. I just got home and have now aborted all the work I had in my cache. Detached and reattached to the project and downloaded more work. This seems to be more "normal"...1 wu at 65% usage, which is more like I'd expect. We'll see how these go.

myshortpencil
03-21-14, 08:41 PM
Thanks. So is there a link to the current app_config.xml file?

The Duke of Buckingham found it for me. :) It's here: http://www.setiusa.us/showthread.php?6095-Poem-app_config-xml

Thanks Duke ^:)^

zombie67
03-21-14, 08:46 PM
<app_config>
  <app>
    <name>poemcl</name>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Edit the GPU and CPU values as needed.
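
For anyone who wants to try other task counts, here is a rough sketch that generates the same file for N tasks per GPU. It relies on `gpu_usage` being the fraction of a GPU each task claims, so 0.5 means two tasks per card. The function name `build_app_config` is made up for illustration, and rounding the fraction down is only a precaution so that N tasks never add up to more than one GPU.

```python
import math
import xml.etree.ElementTree as ET


def build_app_config(tasks_per_gpu: int, cpu_per_task: float,
                     app_name: str = "poemcl") -> str:
    """Return app_config.xml text that runs `tasks_per_gpu` tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    # Round down to 3 decimals so N * gpu_usage never exceeds 1.0.
    gpu_usage = math.floor(1000 / tasks_per_gpu) / 1000

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{gpu_usage:g}"
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpu_per_task:g}"
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Three tasks per GPU, one full CPU thread reserved for each task.
print(build_app_config(tasks_per_gpu=3, cpu_per_task=1))
```

The output goes into app_config.xml in the POEM project folder; the client can re-read it without a restart.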

myshortpencil
03-21-14, 08:58 PM
Thanks, Z :)

zombie67
03-21-14, 09:58 PM
Finished two tasks (at a time) on my TITAN. Took 8400 seconds each. 6500 credits each. That works out to ~133k/day.

For one of my dual 7970 machines, I am also running two tasks per card. With the two tasks, it is pegged at 99% flat. Unlike the TITAN, I need to reserve only .5 CPU thread per task.
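
The per-day figure is simple arithmetic: tasks running at once, times credits per task, times 86,400 seconds in a day, divided by the run time of one task. A small sketch follows, assuming every task validates and pays the 6,500 credits reported above; the function name is only for illustration.

```python
SECONDS_PER_DAY = 86_400


def credits_per_day(concurrent_tasks: int, run_time_s: float,
                    credits_per_task: float = 6500.0) -> float:
    """Estimate daily credit for one GPU running `concurrent_tasks` tasks at once."""
    batches_per_day = SECONDS_PER_DAY / run_time_s
    return concurrent_tasks * credits_per_task * batches_per_day


# TITAN, two tasks at a time, 8400 seconds each.
print(f"{credits_per_day(2, 8400):,.0f} credits/day")
```

For the TITAN run above this comes out just under 134k, in line with the ~133k/day quoted.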

myshortpencil
03-21-14, 10:10 PM
Finished two tasks (at a time) on my TITAN. Took 8400 seconds each. 6500 credits each. That works out to ~133k/day.

For one of my dual 7970 machines, I am also running two tasks per card. With the two tasks, it is pegged at 99% flat. Unlike the TITAN, I need to reserve only .5 CPU thread per task.

So that means the credits for Poem GPUs are far less than for the retired version. :(

Shoot, they're less than DiRT and Collatz, now, aren't they?

Al
03-21-14, 10:28 PM
Yes, they are.

I did get 2 to validate on my 7970, running 2 at a time with .5 cpu per task

Completed and validated 6,865.27 2,103.69 6,500.00 POEM++ OpenCL version v2.00 (opencl_ati_100)
Completed and validated 6,846.81 2,088.01 6,500.00 POEM++ OpenCL version v2.00 (opencl_ati_100)

Looks to be about 163k/day on a 7970

zombie67
03-21-14, 10:38 PM
For the old app, I seem to recall that running more tasks at a time did not increase the load, but it did increase production significantly. So for someone with a 580 and an 8 thread CPU, the best thing to do would be to run like 6 tasks at a time, with no CPU tasks. If I recall correctly.

Edit: I'm going to try running a bunch over night on the TITAN and see what it does.

zombie67
03-21-14, 11:25 PM
Yes, they are.

I did get 2 to validate on my 7970, running 2 at a time with .5 cpu per task

Completed and validated 6,865.27 2,103.69 6,500.00 POEM++ OpenCL version v2.00 (opencl_ati_100)
Completed and validated 6,846.81 2,088.01 6,500.00 POEM++ OpenCL version v2.00 (opencl_ati_100)

Looks to be about 163k/day on a 7970

One of the two in the machine is factory OC at 1050. It did a pair in 7700 seconds each, or 146k/day. The other card is slightly OC by me at 975, and did the pair in 8300 seconds, or 135k/day.

zombie67
03-22-14, 11:56 AM
Finished two tasks (at a time) on my TITAN. Took 8400 seconds each. 6500 credits each. That works out to ~133k/day.

Running 6 at a time on the TITAN:

Task run time = 15,500 seconds. That works out to ~218k/day. So 6 is better than 2. I will drop it to see what 4 does.

zombie67
03-22-14, 12:04 PM
One of the two in the machine is factory OC at 1050. It did a pair in 7700 seconds each, or 146k/day. The other card is slightly OC by me at 975, and did the pair in 8300 seconds, or 135k/day.

(machines with dual 7970s)

The stock machine returned nothing but errors running 3.

The other machine ran 3 at a time in 12,100 seconds. That translates to 139k/day. No advantage to more than two at a time on 7970s, it seems.

FourOh
03-22-14, 02:11 PM
Ran a single Poem OpenCL work unit on a 5850, results are not promising for that card: 16,531 seconds for a possible 34,000/day. I'll run one more to confirm, but it doesn't seem like a good use of GPU time. It also appears Poem is suffering from the same issue as before: limited GPU work.

zombie67
03-22-14, 02:51 PM
Try running two at a time, just to be sure. Also, BOINC Project Updater is the tool I used for POEM, to make sure I always had work.

FourOh
03-22-14, 03:00 PM
Try running two at a time, just to be sure. Also, BOINC Project Updater is the tool I used for POEM, to make sure I always had work.

Thanks, I'll look into the Project Updater. I'll do some more testing on Poem next week when I get the 7950 I ordered and get to a good milestone on GPUGrid. I'm also hoping they get whatever is causing errors worked out over the next week or two.

Ran a single successful Poem work unit on the 7790, completed in 9690 seconds for a potential 80k/day+.

myshortpencil
03-22-14, 04:24 PM
I removed and reinstalled Poem in a Win7, x64, AMD A8-3870k, XFX 7790 system with app_config set to 1 CPU for every 0.5 GPU. 1 WU ran for 8600 seconds, reported and errored out. :(

zombie67
03-22-14, 04:48 PM
Finished two tasks (at a time) on my TITAN. Took 8400 seconds each. 6500 credits each. That works out to ~133k/day.

Running 6 at a time on the TITAN:

Task run time = 15,500 seconds. That works out to ~218k/day. So 6 is better than 2. I will drop it to see what 4 does.

Running 4 at a time on the TITAN:

Task run time = 10,500 seconds. That works out to ~214k/day. So 4 is the same as 6, and no added value running more than 4. I will drop it to see what 3 does.

zombie67
03-22-14, 04:56 PM
I removed and reinstalled Poem in a Win7, x64, AMD A8-3870k, XFX 7790 system with app_config set to 1 CPU for every 0.5 GPU. 1 WU ran for 8600 seconds, reported and errored out. :(

I wonder if maybe this project requires DP? Has anyone gotten a SP-only card to return valid results? All my cards are DP, so I can't tell.

FourOh
03-22-14, 06:40 PM
I removed and reinstalled Poem in a Win7, x64, AMD A8-3870k, XFX 7790 system with app_config set to 1 CPU for every 0.5 GPU. 1 WU ran for 8600 seconds, reported and errored out. :(

Hey I have an A8-3870k/7790 system too! I've errored a few and completed a couple. I'm backing off for now until the project team gets the erroring under control.
http://boinc.fzk.de/poem/results.php?hostid=153998

zombie67
03-22-14, 07:12 PM
Finished two tasks (at a time) on my TITAN. Took 8400 seconds each. 6500 credits each. That works out to ~133k/day.

Running 6 at a time on the TITAN:

Task run time = 15,500 seconds. That works out to ~218k/day. So 6 is better than 2. I will drop it to see what 4 does.

Running 4 at a time on the TITAN:

Task run time = 10,500 seconds. That works out to ~214k/day. So 4 is the same as 6, and no added value running more than 4. I will drop it to see what 3 does.

Now this is odd!

Running 3 at a time on the TITAN:

Task run time = 6,900 seconds. That works out to ~243k/day. So 3 is better than 4. But how is 3 faster per task than 2??? I think I may have run that 2-per set with CPU tasks also running. I think I will try re-running 2-per w/o CPU tasks, to get an apples to apples comparison.

myshortpencil
03-22-14, 09:08 PM
I removed and reinstalled Poem in a Win7, x64, AMD A8-3870k, XFX 6870 system with app_config set to 1 CPU for every 0.5 GPU. 1 WU ran for 18,000 seconds, completed and validated. 31,200 credits a day compared to over 250,000 a day on the retired project. (The computer was running other CPU projects plus being used for Internet surfing and youtube watching).

FYI, the 6870 is a single precision card and the 7790 is double precision.

zombie67
03-22-14, 09:10 PM
Now this is odd!

Running 3 at a time on the TITAN:

Task run time = 6,900 seconds. That works out to ~243k/day. So 3 is better than 4. But how is 3 faster per task than 2??? I think I may have run that 2-per set with CPU tasks also running. I think I will try re-running 2-per w/o CPU tasks, to get an apples to apples comparison.

Running 2 at a time on the TITAN, this time without any CPU tasks running:

Task run time = 5,500 seconds. That works out to ~202k/day. So 3 is the best.

In comparison, running with 2 GPU tasks 1 full thread each AND filling up the remaining threads with CPU tasks was 133k/day. That is a SIGNIFICANT hit to production. Now I need to run some more tests to see if I can run some number of CPU tasks before the slow down kicks in.
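
To put the TITAN runs side by side, here is a rough sketch that recomputes credits per day from the run times reported in this thread and ranks them. It assumes 6,500 credits per validated task and that every task validates. The recomputed numbers can differ from the rounded figures quoted in the posts by a thousand or two.

```python
SECONDS_PER_DAY = 86_400
CREDITS_PER_TASK = 6500  # credit per validated task, as reported in the thread

# (tasks at a time, seconds per task) for each TITAN run reported in the thread
titan_runs = {
    "2 per GPU, CPU tasks running": (2, 8400),
    "6 per GPU": (6, 15500),
    "4 per GPU": (4, 10500),
    "3 per GPU": (3, 6900),
    "2 per GPU, no CPU tasks": (2, 5500),
}


def credits_per_day(concurrent_tasks: int, run_time_s: float) -> float:
    """Daily credit for one GPU running `concurrent_tasks` tasks at once."""
    return concurrent_tasks * CREDITS_PER_TASK * SECONDS_PER_DAY / run_time_s


rates = {label: credits_per_day(*run) for label, run in titan_runs.items()}
best = max(rates, key=rates.get)

# Print the runs from most to least productive.
for label, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:30s} {rate / 1000:6.1f}k/day")
print("Best setting:", best)
```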

zombie67
03-22-14, 09:21 PM
Hey I have an A8-3870k/7790 system too! I've errored a few and completed a couple. I'm backing off for now until the project team gets the erroring under control.
http://boinc.fzk.de/poem/results.php?hostid=153998

I think I may know what's going on. Are your tasks being suspended at all? Like maybe they are suspending when you use the keyboard? Or running benchmarks, or rebooting? These tasks seem to be very sensitive to that kind of stuff. Mine sometimes error out when just changing the number of GPUs in the app_config.xml. Maybe by not changing anything, or doing anything that causes the tasks to suspend at all, you can get them to complete?

John P. Myers
03-22-14, 09:22 PM
Running 2 at a time on the TITAN, this time without any CPU tasks running:

Task run time = 5,500 seconds. That works out to ~202k/day. So 3 is the best.

In comparison, running with 2 GPU tasks 1 full thread each AND filling up the remaining threads with CPU tasks was 133k/day. That is a SIGNIFICANT hit to production. Now I need to run some more tests to see if I can run some number of CPU tasks before the slow down kicks in.
Just a guess, but i'm gonna say the slowdown begins at the point where the number of cores used by the GPUs, plus the number of cores used by CPU WUs exceeds the number of cores the CPU has, either pushing the GPU tasks to a HT thread, or trying to cram it together on the same core/thread as a CPU WU.
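
If that guess is right, the number of CPU tasks that can run before the slowdown starts is just the thread count minus what the GPU tasks reserve. A small sketch under that assumption; the 8-thread CPU in the example is hypothetical, and the function name is made up for illustration.

```python
import math


def free_cpu_task_slots(logical_cpus: int, gpu_tasks: int,
                        cpu_per_gpu_task: float) -> int:
    """CPU tasks that fit without oversubscribing the threads the GPU tasks need."""
    # Round the reservation up: a partly used thread is not free for a CPU task.
    reserved = math.ceil(gpu_tasks * cpu_per_gpu_task)
    return max(logical_cpus - reserved, 0)


# Hypothetical 8-thread host, 3 GPU tasks reserving a full thread each.
print(free_cpu_task_slots(8, 3, 1.0))
# Hypothetical 8-thread host, dual 7970s with 2 tasks per card at .5 thread each.
print(free_cpu_task_slots(8, 4, 0.5))
```

This only counts threads; it does not model the hyper-threading effect mentioned above, where a GPU task ends up sharing a physical core with a CPU task.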

myshortpencil
03-22-14, 09:26 PM
I think I may know what's going on. Are your tasks being suspended at all? Like maybe they are suspending when you use the keyboard? Or running benchmarks, or rebooting? These tasks seem to be very sensitive to that kind of stuff. Mine sometimes error out when just changing the number of GPUs in the app_config.xml. Maybe by not changing anything, or doing anything that causes the tasks to suspend at all, you can get them to complete?

My A8-3870k/7790 system is a dedicated cruncher. Crunched straight through without interruption and errored. Two more guesses left. :)

BUT WAIT!!!!! I have Collatz GPU tasks on that machine, too, and Boinc may have switched between POEM and Collatz. I'm running another POEM on the same machine and suspended Collatz. Please stay tuned.

(Leave applications in memory while suspended is enabled).

I also note that my A8-3870k/6870 had Collatz and POEM GPU tasks, and it was being used for surfing, youtubing, Excelling, NotePadding, and Doc-ing, all while POEM was running and the WU completed and validated. It was also running Ibercivis at the same time.

myshortpencil
03-22-14, 11:00 PM
My A8-3870k/7790 WU erred again. No interruption of crunch. It was only 65% through the crunch at 7160 seconds. Using BM 7.2.33. Will try 7.2.42

myshortpencil
03-23-14, 12:07 PM
My A8-3870k/7790 WU erred a third time. No interruption of crunch. Using BM 7.2.42. No success on this card. It has OpenCL but no CAL.

conf
03-23-14, 12:55 PM
tried this one ?

http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/

They used those things years ago and I read something recently that it could be useful for the new ones too.

c303a
03-23-14, 12:56 PM
I tried running the Poem app on my GTX 570 and every one errored out. Went back to GPUGrid.

myshortpencil
03-23-14, 01:11 PM
tried this one ?

http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/

They used those things years ago and I read something recently that it could be useful for the new ones too.

Thanks, conf. Love your new avatar. Maybe Four-Oh will give this a try. I just try things and if they don't work, I move on. :) Lots of projects do work well without fussing.

conf
03-23-14, 03:54 PM
Is not so bad, just install it.
The sculpture is by my master, not by me; I have only cleaned it several times. He has Parkinson's, is dying these days, and has gone nearly blind. I wanted to honor him this way, though it makes me sad.

zombie67
03-24-14, 01:05 AM
So, I have two machines with dual 7970s. One crunches the new POEM stuff great, the other nothing but errors. The successful one is running 14.3 beta drivers. The other was 13.12, I think. Anyway, I did a scrubbed uninstall and then installed 14.3, and it is now crunching POEM successfully. No idea if that was just a coincidence or because of the driver upgrade. But for those with failing AMD tasks, try it.

zombie67
03-24-14, 09:30 AM
So, I have two machines with dual 7970s. One crunches the new POEM stuff great, the other nothing but errors. The successful one is running 14.3 beta drivers. The other was 13.12, I think. Anyway, I did a scrubbed uninstall and then installed 14.3, and it is now crunching POEM successfully. No idea if that was just a coincidence or because of the driver upgrade. But for those with failing AMD tasks, try it.

False alarm. The first few tasks validated. All the rest errored out.

zombie67
03-24-14, 03:10 PM
Yep. This always happens when projects launch a new app on Fridays.



Message 9540 (http://boinc.fzk.de/poem/forum_thread.php?id=1034&postid=9540) - Posted: 24 Mar 2014, 18:33:03 UTC
Hi everybody,

I'm afraid the release turned out to be a mess. There are much more problems than we have seen on the test server.

The worst thing was my own fault: I thought turning off the GPU sorter would suffice to get work units out of the queue, but in fact all failed units were sent out again. Sorry for that, which I should have known.
It should now be fixed.

I'll grant credits for jobs which failed after taking CPU time as soon as possible.

We do now know possible causes for the errors, and are fixing them. There will be a second test phase, but I can't give you a schedule yet.

Thomas

Fire$torm
03-25-14, 03:14 PM
Don't you understand? It is myshortpencil's washing machine that is broken ...
http://www.adpic-images.com/data/picture/detail/Broken_washing_machine_347133.jpg
or maybe he used all his money on computers.

So MSP is one of Maxwell's relatives.......?

EmSti
06-07-14, 02:54 PM
If anyone cares, Poem is testing GPU WUs again. They have a separate test server you need to connect to, I think. Check their site for details.

Bryan
06-07-14, 04:07 PM
If anyone cares, Poem is testing GPU WUs again. They have a separate test server you need to connect to, I think. Check their site for details.

You don't get any credits but the rest of us will supply you with atta boys :D

EmSti
07-08-14, 05:59 PM
Notes on the new GPU app so far:
Project updater needed to get wus flowing
Only running on the R9 295x2, increased from 1 wu per r9 290x gpu to 2 wus per gpu and it took no more time. Double the credit same amount of time. Keeping about 4 cpu threads free for the 4 wus. Completing 4 wus every 2.25 hours (11,555 credits an hour, on collatz the card produces ~130,400/hr).
Usage is still less than 100% on the GPUs, I haven't tried 3 or more wus.
Power usage and heat are a lot less than collatz (not hard to do).

Maxwell
07-08-14, 08:01 PM
Notes on the new GPU app so far:
Project updater needed to get wus flowing
Only running on the R9 295x2, increased from 1 wu per r9 290x gpu to 2 wus per gpu and it took no more time. Double the credit same amount of time. Keeping about 4 cpu threads free for the 4 wus. Completing 4 wus every 2.25 hours (11,555 credits an hour, on collatz the card produces ~130,400/hr).
Usage is still less than 100% on the GPUs, I haven't tried 3 or more wus.
Power usage and heat are a lot less than collatz (not hard to do).
Thanks for the heads-up on this. I need to get back to Poem, and this'll be useful...

FourOh
07-08-14, 09:32 PM
Notes on the new GPU app so far:
Project updater needed to get wus flowing
Only running on the R9 295x2, increased from 1 wu per r9 290x gpu to 2 wus per gpu and it took no more time. Double the credit same amount of time. Keeping about 4 cpu threads free for the 4 wus. Completing 4 wus every 2.25 hours (11,555 credits an hour, on collatz the card produces ~130,400/hr).
Usage is still less than 100% on the GPUs, I haven't tried 3 or more wus.
Power usage and heat are a lot less than collatz (not hard to do).


Thanks for the heads-up on this. I need to get back to Poem, and this'll be useful...

We have another thread on the new app with run times, etc here:
http://www.setiusa.us/showthread.php?7154-POEM-GPU-Project-Status

EmSti
07-08-14, 11:09 PM
Thanks for the tip

Continuing my notes on the other thread.