New BOINC feature to easily run multiple GPU tasks



zombie67
12-12-12, 08:52 PM
There is now such a thing as app_config.xml, starting with 7.0.42 (actually .40, but anything before .42 is broken). I think this could be very good! Now we can more easily run multiple GPU tasks, without the complexity of a full-blown app_info.xml! I am going to post about this in a separate thread, so that people don't overlook this, thinking it is something unique to WCG.

The link for this new thing is here:

http://boinc.berkeley.edu/trac/wiki/ClientAppConfig

Currently it says this:


A volunteer can configure a project's apps by putting a file app_config.xml in the project's directory. This file has the following form:


<app_config>
<app>
<name>name</name>
<max_concurrent>N</max_concurrent>
<gpu_versions>
<gpu_usage>.25</gpu_usage>
<cpu_usage>.1</cpu_usage>
</gpu_versions>
</app>
</app_config>

<name>: the short name of an application.

<max_concurrent>: a limit on the number of concurrent jobs for that application.

<gpu_versions>: the <gpu_usage> and <cpu_usage> elements specify the number of device instances used by GPU app versions. Note: there is no provision for specifying this per GPU type or per device.

Note2: The placement of the app_config.xml is in its respective project folder, e.g. for WorldCommunityGrid it goes into .\projects\www.worldcommunitygrid.org
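
To make the gpu_usage arithmetic concrete: if each task claims 0.25 of a device, up to four such tasks should fit on one GPU, and each of them also budgets 0.1 of a CPU. The Python sketch below only illustrates that arithmetic and the file layout quoted above. The helper names (tasks_per_gpu, build_app_config) are made up for this example and are not part of BOINC, and the "fits floor(1 / gpu_usage) tasks" rule is an assumption about how the client packs tasks, not something the wiki text states.

```python
import math
import xml.etree.ElementTree as ET


def tasks_per_gpu(gpu_usage: float) -> int:
    """Assumed rule: how many tasks fit on one GPU if each claims `gpu_usage`."""
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage is expected to be in (0, 1]")
    # The small epsilon guards against floating-point noise in the division.
    return math.floor(1.0 / gpu_usage + 1e-9)


def build_app_config(name, gpu_usage, cpu_usage, max_concurrent=None) -> str:
    """Build the text of an app_config.xml for a single app (hypothetical helper)."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = name
    if max_concurrent is not None:
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tasks_per_gpu(0.25))  # tasks that fit on one GPU at .25 each
    print(build_app_config("name", 0.25, 0.1, max_concurrent=4))
```

The generated text would still have to be saved as app_config.xml in the project's own folder, as described in Note2.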

nanoprobe
12-15-12, 07:25 PM
I've converted 3 of my machines over to 7.0.42 with the new app_config file. It works great, and it now lets me run multiple GPU tasks on my lone surviving 32-bit XP machine. I couldn't get that to work with an app_info, not even on 32-bit Win7. It's so much easier to configure, and you can move it out of and back into the project folder without losing all the tasks in your cache. The real bonus for me is that you can run individual CPU cores on separate projects if you'd like.
This is the config file I'm using on my XP box to run 2 GPU tasks and 2 SN2S tasks on the CPUs.
I modified the GPU part of it by removing the max_concurrent line to see if the gpu_usage line alone would control the number of tasks running, and it did. Leaving the max_concurrent line in place also worked, but I like to experiment. *-:)

<app_config>
<app>
<name>hcc1</name>
<gpu_versions>
<gpu_usage>.50</gpu_usage>
<cpu_usage>1.0</cpu_usage>
</gpu_versions>
</app>
<app>
<name>sn2s</name>
<max_concurrent>2</max_concurrent>
</app>
</app_config>
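
For anyone who wants to double-check what a file like this asks the client to do, here is a rough Python sketch that reads an app_config.xml and reports, per app, the max_concurrent cap and how many tasks one GPU would be shared between. It assumes that with max_concurrent left out, gpu_usage alone decides the GPU task count (as observed above), and that the count is floor(1 / gpu_usage). The function name summarize and the app names gpu_app and cpu_app are placeholders for this example, not real project apps.

```python
import xml.etree.ElementTree as ET

# Placeholder config with the same shape as the one above.
SAMPLE = """
<app_config>
  <app>
    <name>gpu_app</name>
    <gpu_versions>
      <gpu_usage>.50</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>cpu_app</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
"""


def summarize(xml_text: str) -> dict:
    """Map each app name to its max_concurrent cap and assumed tasks per GPU."""
    summary = {}
    for app in ET.fromstring(xml_text).findall("app"):
        name = app.findtext("name")
        max_conc = app.findtext("max_concurrent")          # None if the line is absent
        gpu_usage = app.findtext("gpu_versions/gpu_usage")  # None for CPU-only apps
        summary[name] = {
            "max_concurrent": int(max_conc) if max_conc else None,
            # Assumed packing rule: floor(1 / gpu_usage), with a float-noise guard.
            "tasks_per_gpu": int(1.0 / float(gpu_usage) + 1e-9) if gpu_usage else None,
        }
    return summary


if __name__ == "__main__":
    for app_name, limits in summarize(SAMPLE).items():
        print(app_name, limits)
```

With the sample above, the GPU app has no max_concurrent cap and two tasks per GPU, while the CPU app is capped at two running tasks.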

Fire$torm
12-17-12, 07:12 PM
Looks like I'll have to give this a try. Thanks guys for the info. **==

kmanley57
12-17-12, 09:42 PM
Upgrading to 7.0.42 got my Sabayon box to see and use the Graphics card in it. Now I can start working on getting it to crunch GPU stuff!!! =P~

Implemented this on all three of my Windows machines and running Poem now. :cool:

Al
12-18-12, 09:56 PM
I'm using 7.0.42 and the app_config on the MW challenge right now. I'm running mixed systems and doing MW on both nvidia and ati. I have Poem (with its old app_info) set as backup. I recently found that one of the boxes had only one nvidia MW wu left. Interestingly, it was running Poem on .5 gpu and MW on .5 gpu, on the same nvidia gpu. I've never seen one gpu run 2 different projects at the same time. Both finished and validated, BTW. Has anyone else seen this?

Also, maybe I read it wrong, but I thought app_info wasn't recognized by 7.0.42...obviously it is according to the event manager.

12/18/2012 3:12:41 PM | Poem@Home | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | Moo! Wrapper | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | SETI@home | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | Milkyway@Home | Found app_config.xml
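
If you run a lot of projects, a quick way to see which ones picked up which file is to scan the event log for these "Found ..." messages. The Python sketch below is a minimal example that assumes the log lines have the "timestamp | project | message" shape shown above. The function name config_files_by_project is invented for this example, and the sample text is just the four lines from this post.

```python
import re

LOG = """\
12/18/2012 3:12:41 PM | Poem@Home | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | Moo! Wrapper | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | SETI@home | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | Milkyway@Home | Found app_config.xml
"""

# Matches the two startup messages quoted in the post.
FOUND = re.compile(r"Found (app_info\.xml|app_config\.xml)")


def config_files_by_project(log_text: str) -> dict:
    """Return {project name: set of config file names the client reported finding}."""
    result = {}
    for line in log_text.splitlines():
        # Split into at most three fields: timestamp, project, message.
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not a "timestamp | project | message" line
        _, project, message = parts
        match = FOUND.search(message)
        if match:
            result.setdefault(project, set()).add(match.group(1))
    return result


if __name__ == "__main__":
    for project, files in config_files_by_project(LOG).items():
        print(project, sorted(files))
```

On the lines above this reports app_info.xml for Poem@Home, Moo! Wrapper and SETI@home, and app_config.xml for Milkyway@Home, which matches the point that 7.0.42 still recognizes both files.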

somanyroads
12-19-12, 01:03 PM
This thread is about GPU tasks, but 7.0.42 and app_config.xml certainly help control the number of running cpu tasks also. However, there seems to be no change in the ability of 7.0.42 to get work; still have to manually suspend a cpu task or two to download cpu work on another task even though a core is sitting idle. Even Project Updater won't get work for the cpu task that has no work until some other cpu task is suspended. If left alone, will 7.0.42 "learn" to get work for all running tasks?

Slicker
12-19-12, 03:39 PM
Maybe. Maybe not. I've had projects sit idle for days even with all other projects suspended. When they got rid of short and long term debts and replaced them with fancier algorithms, BOINC got worse -- as usual. When it comes to BOINC features that get added, it is often the tail wagging the dog. One project wants a change and all others suffer because of it. So, rather than require the project admins to use realistic estimates on the flops for a WU, they decided to make the BOINC client smarter which, as is often the case, created a slew of problems that will be around for a number of versions. 7.0.33 seems to get work OK, but 7.0.38 and later have a variety of problems. About the time they get to X.X.50 or higher, they get the kinks worked out. Then they start on a new major release and it starts all over again.

kmanley57
12-21-12, 06:03 PM
Came in handy today! Put a second GPU (one 560 Ti and one 660 Ti) in one of my machines and it quit running Moo Wrapper until I put an app_config in for it. Now it's back to running a WU on both cards. :confused:

nanoprobe
12-23-12, 05:58 PM
Also, maybe I read it wrong, but I thought app_info wasn't recognized by 7.0.42...obviously it is according to the event manager.

12/18/2012 3:12:41 PM | Poem@Home | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | Moo! Wrapper | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | SETI@home | Found app_info.xml; using anonymous platform
12/18/2012 3:12:41 PM | Milkyway@Home | Found app_config.xml

I thought WCG was the only project so far that will not let you run on an anonymous platform with an app_info file on 7.0.42. Is MW@Home now doing the same thing?
I get the same Found app_info.xml; using anonymous platform in the event viewer for other projects including MW@Home.

Al
12-23-12, 06:15 PM
No, I chose to use app_config on mw so I could easily run multiple wus on both nv and ati in the same box. Worked great. I thought the .42 ver didn't recognize app_info anymore. My mistake.

somanyroads
01-01-13, 11:52 PM
Maybe. Maybe not. I've had projects sit idle for days even with all other projects suspended. When they got rid of short and long term debts and replaced them with fancier algorithms, BOINC got worse -- as usual.

Does anyone know if it is the intent of BOINC to allow "<max_concurrent>N</max_concurrent>" in app_config.xml to override the default settings to fetch work? A project like Neurona that runs one task at a time and is 1-out-1-in doesn’t stand a chance of getting tasks especially if the other cpu tasks keep changing and their caches are always full. Moving to Numberfields@home for a short challenge seems to have upset BOINC and it cannot remember what it was doing the day before.

Rattledagger
01-02-13, 09:02 AM
There are some as-yet-unspecified plans to extend the functionality further later on, but at least for now there's no connection between <max_concurrent> and cache size.

While not optimal, setting all projects except Neurona to zero resource-share should keep one running Neurona-task except for the short periods of uploading, asking for more Neurona-work and downloading next task.

somanyroads
01-03-13, 10:31 AM
Neurona has been at 10000 resource share, all others at 0 for a couple of days; no work for Neurona. Had 5 days backlog for Numberfields so set to no new tasks. A day later have 7-10 days cache. Guess some features of 7.0.42 are turned off or are not functioning.

Uninstalled 7.0.42, deleted 2 client_state.xml files, deleted app_config.xml files, installed 7.0.28, Neurona at 10000, all others at 0 resource share, preferences set to 0 days and 0 days of work. Worked fine for a few hours, now it’s back to no work for Neurona.

Edit: 2 days later everything seems to be working as I would like it to (using those last settings), 1-out-1-in across the board.

zombie67
01-17-13, 06:39 PM
I thought I read somewhere on the mail lists that the BOINC client would be updated, so that a quit/start cycle would not be required to see changes to the app_config.xml file. Like Advanced -> Read config file menu, for the cc_config.xml. But I couldn't find where I read that.

So I asked Rom, who forwarded me to DA. His answer is, "Probably not in the near future."

Just FYI.

Draconian
01-19-13, 04:42 AM
Neurona has been at 10000 resource share, all others at 0 for a couple of days; no work for Neurona. Had 5 days backlog for Numberfields so set to no new tasks. A day later have 7-10 days cache. Guess some features of 7.0.42 are turned off or are not functioning.

Uninstalled 7.0.42, deleted 2 client_state.xml files, deleted app_config.xml files, installed 7.0.28, Neurona at 10000, all others at 0 resource share, preferences set to 0 days and 0 days of work. Worked fine for a few hours, now it’s back to no work for Neurona.

Edit: 2 days later everything seems to be working as I would like it to (using those last settings), 1-out-1-in across the board.

I tried working with 7.0.42 - and the bugs made me revert, like yourself. They have some good ideas with the new client - but some stuff to hammer out before it is ready for use.