Just started GPU's on Einstein [Archive]

rgathright

07-11-11, 04:55 PM

Does anyone have some advice for proper setup of the NVIDIA video cards in my machine on Einstein@home?

Fire$torm

07-11-11, 06:42 PM

I have not run the Einstein Cuda app but......

I do not believe the Cuda app is multi-threaded, so you should see one wu crunching on each GPU. For overclocking your nVidia, start at stock clocks and run a wu while checking GPU temp. When the wu is finished, pause GPU work, up the GPU clock 20~25 MHz and crunch another wu, again while monitoring GPU temp. Keep doing this until you reach the highest temp you are comfortable with or until you hit that card's OC limit, whichever comes first. Then you are set to crunch to your heart's content.

:-bd

Edit: I would suggest leaving the nVidia Mem Clocks at stock. Upping the mem clock just adds a lot of heat without a significant increase in crunching performance.
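
The stepping procedure above can be sketched as a simple loop. This is only an illustration in Python: `temp_at_clock` is a hypothetical stand-in for "crunch one wu at this clock and read the peak temperature", and the thermal numbers in the usage lines are made up.

```python
from typing import Callable


def find_highest_safe_clock(
    stock_mhz: int,
    oc_limit_mhz: int,
    max_temp_c: float,
    temp_at_clock: Callable[[int], float],
    step_mhz: int = 25,
) -> int:
    """Raise the clock one step at a time until temperature or the OC limit stops us."""
    clock = stock_mhz
    while clock + step_mhz <= oc_limit_mhz:
        candidate = clock + step_mhz
        # Hypothetical measurement: crunch one wu at `candidate`, report peak temp.
        if temp_at_clock(candidate) > max_temp_c:
            break  # too hot, so keep the previous clock
        clock = candidate
    return clock


def fake_temp(mhz: int) -> float:
    """Made-up thermal model: 60 C at 700 MHz, plus 1 C per extra 10 MHz."""
    return 60.0 + (mhz - 700) / 10.0


print(find_highest_safe_clock(700, 900, 75.0, fake_temp))
```

In practice the "measurement" is you watching the temperature while a wu runs; the loop just makes the stopping rule explicit.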

Bryan

07-11-11, 11:15 PM

Check the forum for an app_info. Several of the top computers show as "anonymous platform" and it appears they are running 3 wu at a time. A GTX 570 appears to get about 32k per day. This is a pure guess on my part!

Fire$torm

07-11-11, 11:35 PM

Check the forum for an app_info. Several of the top computers show as "anonymous platform" and it appears they are running 3 wu at a time. A GTX 570 appears to get about 32k per day. This is a pure guess on my part!

Yeah, what he said....

rgathright

07-12-11, 08:43 AM

I have yet to reach full utilization of my GPU's in the computer.

Over at PrimeGrid, I was routinely hitting 192F on my 3 GPU's, now I am lucky to hit 160F at Einstein.

Should I tweak my Boinc settings to dedicate more cpu resources to the management of these Einstein CUDA WU's?

As always, THANK YOU for your support!

Bryan

07-12-11, 03:20 PM

Before you mess with your BOINC settings do this:

1. Use GPU-Z to monitor your GPU loading.
2. Go into BOINC Projects and suspend all CPU projects.
3. If the loading jumps up then the problem is with your CPU usage and if it doesn't then the problem is with the Einstein program.

I think this is why folks are running an app_info (anonymous platform) and crunching multiple wu at a time per card.
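
The three-step check above boils down to one comparison. A minimal Python sketch, assuming you note the GPU load (in percent) from GPU-Z before and after suspending the CPU projects; the 10-point `jump_threshold` is an arbitrary assumption, not a BOINC or GPU-Z value.

```python
def diagnose_low_gpu_load(load_with_cpu_work: float,
                          load_cpu_suspended: float,
                          jump_threshold: float = 10.0) -> str:
    """Compare GPU load (percent) before and after suspending CPU projects."""
    if load_cpu_suspended - load_with_cpu_work >= jump_threshold:
        # Load jumped once the CPU was freed: the CPU projects were starving the GPU app.
        return "cpu contention"
    # Load stayed about the same: the limit is in the GPU application itself.
    return "application limited"


print(diagnose_low_gpu_load(55.0, 90.0))
print(diagnose_low_gpu_load(60.0, 62.0))
```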

rgathright

07-12-11, 05:02 PM

I have very bad luck with app_info...

I move my CPU's between sub-projects and projects often.

Would changing my App_info affect this?

Can you give me a link that will explain what exactly I need to modify in the XML file?

THANKS!:D

Bryan

07-12-11, 07:04 PM

The app_info only affects the individual project. It is installed in the Boinc/projects/einstein directory.

Do a search on the Einstein forum for app_info and if you find one then copy it to "Wordpad" and then save it to the Einstein project folder as a text file BUT put the extension .XML. So it will be app_info.xml ..... You can try Notepad which is preferable but sometimes when copying from a forum page it screws up the formatting.

Usually in that file you will find one of the commands that says something like <count>.5</count>
That tells BOINC to have the GPU crunch 2 wu at a time.

Once that is in the project folder then STOP and RESTART BOINC. You will lose all the existing wu in your cache but it should download new ones and then start crunching 2 or if you use .33 instead of .5 it will start working on 3 of them.
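
The relation between `<count>` and the number of wu per GPU is just a reciprocal. Here is a small Python sketch using only the standard library; the function names are made up for this example, and it assumes the only `<count>` you care about sits inside a `<coproc>` block, as in the files posted in this thread.

```python
import xml.etree.ElementTree as ET


def tasks_per_gpu(count: float) -> int:
    """How many wu fit on one GPU when each claims `count` of it (0.5 -> 2, 0.33 -> 3)."""
    return int(1.0 / count + 1e-9)


def set_tasks_per_gpu(app_info_xml: str, tasks: int) -> str:
    """Return the XML with every <coproc><count> set to 1/tasks."""
    root = ET.fromstring(app_info_xml)
    for coproc in root.iter("coproc"):
        count = coproc.find("count")
        if count is not None:
            count.text = f"{1.0 / tasks:.6f}"
    return ET.tostring(root, encoding="unicode")


sample = (
    "<app_info><app_version><coproc>"
    "<type>CUDA</type><count>0.500000</count>"
    "</coproc></app_version></app_info>"
)
print(tasks_per_gpu(0.5), tasks_per_gpu(0.33))
print(set_tasks_per_gpu(sample, 3))
```

Editing the number by hand in Notepad does the same job; the sketch is only there to show the arithmetic.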

Fire$torm

07-12-11, 08:22 PM

I am working on one right now. So far it keeps crapping out. Debugging is a PITA.

I'll post results soon.

Bryan

07-12-11, 09:40 PM

F$, I did a search on their forums for app_info and a couple actually showed up in the thread listings. I didn't look at them but it may give you a starting point.

zombie67

07-12-11, 09:53 PM

If I remember correctly, the GPU app still uses a LOT of CPU, and the GPU is underutilized. You will probably be better off crunching with just the CPU and using your GPU somewhere more productive.

Fire$torm

07-12-11, 10:53 PM

F$, I did a search on their forums for app_info and a couple actually showed up in the thread listings. I didn't look at them but it may give you a starting point.

Yeah I did see those. I tried them and they failed. So I have been trying to debug them since. Total crash and burn. :mad: :mad: :mad: Now I cannot connect to the server. I think the server is blocking me because it looks like DOS attack....... Sometimes this hobby just..... sucks.

Fire$torm

07-12-11, 11:06 PM

OK, this was my original app_info file, based on the BOINC wiki for anonymous platform:

<app_info>
<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>107</version_num>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>EULA.txt</file_name>
<open_name>EULA.txt</open_name>
</file_ref>
<file_ref>
<file_name>db.dev.win.3d35195e</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.3d35195e</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<platform>windows_intelx86</platform>
<plan_class>BRP3cuda32</plan_class>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>24320145631.746910</flops>
<coproc>
<type>CUDA</type>
<count>0.500000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>
</app_info>

I do not remember the error messages but it didn't work.
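
One visible difference between this file and the other app_info files in this thread is that it has no `<app>` or `<file_info>` sections. Whether that is what made it fail is a guess, since the error messages were not kept. A quick way to spot that kind of mismatch is to compare the names; this Python sketch uses only the standard library, and `missing_file_info` is a name made up here.

```python
import xml.etree.ElementTree as ET
from typing import List


def missing_file_info(app_info_xml: str) -> List[str]:
    """List file names used in <file_ref> that have no matching <file_info>."""
    root = ET.fromstring(app_info_xml)
    declared = {fi.findtext("name", "").strip() for fi in root.iter("file_info")}
    missing: List[str] = []
    for ref in root.iter("file_ref"):
        name = ref.findtext("file_name", "").strip()
        if name and name not in declared and name not in missing:
            missing.append(name)
    return missing


sample = """<app_info>
<file_info><name>cudart_xp32_32_16.dll</name></file_info>
<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<file_ref><file_name>cudart_xp32_32_16.dll</file_name></file_ref>
<file_ref><file_name>cufft_xp32_32_16.dll</file_name></file_ref>
</app_version>
</app_info>"""
print(missing_file_info(sample))
```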

Now here is the app_info file from the Einstein forum edited for the latest CUDA app version 1.07

<app_info>
<app>
<name>einstein_S5GC1HF</name>
<user_friendly_name>Global Correlations S5 HF search #1</user_friendly_name>
</app>
<app>
<name>einsteinbinary_BRP3</name>
<user_friendly_name>Binary Radio Pulsar Search</user_friendly_name>
</app>
<file_info>
<name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</name>
<main_program/>
</file_info>
<file_info>
<name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>dbhs.dev.win.3d35195e</name>
</file_info>
<file_info>
<name>dbhs.dev.win.3d35195e</name>
</file_info>
<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>107</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.3d35195e</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.3d35195e</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>0.500000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>
<app_version>
<app_name>einstein_S5GC1HF</app_name>
<version_num>306</version_num>
<platform>windows_intelx86</platform>
<plan_class>S5GCESSE2</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
</app_version>
</app_info>

This one generated URL errors

7/12/2011 10:08:56 PM [error] No URL for file transfer of einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe
7/12/2011 10:08:56 PM [error] No URL for file transfer of einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe
7/12/2011 10:08:56 PM [error] No URL for file transfer of cudart_xp32_32_16.dll
7/12/2011 10:08:56 PM [error] No URL for file transfer of cufft_xp32_32_16.dll
7/12/2011 10:08:56 PM [error] No URL for file transfer of dbhs.dev.win.3d35195e

Then I added URL references to the app_info file like I found in an app_info file posted on E@H only to discover BOINC does not allow URL references in an app_info file.
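
A "No URL for file transfer" message most likely means BOINC could not find the named file in the project folder and had nowhere to download it from (that reading is an assumption, nothing in this thread confirms it). If so, the thing to check is whether every file the log complains about is really sitting in the Einstein project directory under exactly that name. A rough Python sketch, with a temporary folder standing in for the real project directory:

```python
import re
import tempfile
from pathlib import Path
from typing import List

NO_URL = re.compile(r"No URL for file transfer of (.+)$")


def files_missing_from(log_text: str, project_dir: Path) -> List[str]:
    """Return the files named in 'No URL' log lines that are absent from project_dir."""
    missing: List[str] = []
    for line in log_text.splitlines():
        match = NO_URL.search(line.strip())
        if match:
            name = match.group(1).strip()
            if not (project_dir / name).is_file():
                missing.append(name)
    return missing


log = (
    "7/12/2011 10:08:56 PM [error] No URL for file transfer of cudart_xp32_32_16.dll\n"
    "7/12/2011 10:08:56 PM [error] No URL for file transfer of cufft_xp32_32_16.dll\n"
)

# Stand-in project folder that contains only one of the two DLLs.
with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp)
    (project_dir / "cudart_xp32_32_16.dll").write_bytes(b"")
    still_missing = files_missing_from(log, project_dir)

print(still_missing)
```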

Gad Zooks Batman. Why can't projects have a section on their forums listing WORKING app_info files? It can't be THAT F'n hard. After all THEY did write the apps.

Fire$torm

07-12-11, 11:09 PM

If I remember correctly, the GPU app still uses a LOT of CPU, and the GPU is underutilized. You will probably be better off crunching with just the CPU and using your GPU somewhere more productive.

OK, you might be right, though according to an E@H forum post by the project admin(s), the CUDA app is 20x faster than the CPU app for the sub-project.

Bryan

07-12-11, 11:18 PM

The easiest way might be to look at the project's "Top Users", find one whose tasks show as anonymous platform, and just PM them to ask if they would share their app_info :))

Fire$torm

07-12-11, 11:37 PM

The easiest way might be to look at the project's "Top Users", find one whose tasks show as anonymous platform, and just PM them to ask if they would share their app_info :))

Well... DUH!!!! Why didn't you think of that? Oh, wait... you did. :P

OK, smart aleck, will do.

Fire$torm

07-13-11, 12:41 AM

The easiest way might be to look at the project's "Top Users", find one whose tasks show as anonymous platform, and just PM them to ask if they would share their app_info :))

Well... DUH!!!! Why didn't you think of that? Oh, wait... you did. :P

OK, smart aleck, will do.

UPDATE.

Well, while looking for posts from one of the top users I found this thread 3WU BRP3cuda on a single GPU. (http://einstein.phys.uwm.edu/forum_thread.php?id=8652&nowrap=true#109236) Within that thread is message 110450. (http://einstein.phys.uwm.edu/forum_thread.php?id=8652&nowrap=true#110450) That post is the first time I have seen someone really SHOW you which sections do what. He posted an update to run version 1.07 of the CUDA app so I will post it here (I have removed the reference to version 1.05).

This is just for the CUDA app.

<app_info>
<app>
<name>einsteinbinary_BRP3</name>
<user_friendly_name>Binary Radio Pulsar Search</user_friendly_name>
</app>
<file_info>
<name>einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>db.dev.win.3d35195e</name>
</file_info>
<file_info>
<name>dbhs.dev.win.3d35195e</name>
</file_info>
<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>105</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>db.dev.win.3d35195e</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.3d35195e</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>0.500000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>
</app_info>

Sorry but I cannot test it now since the Einstein server is blocking my connections. I'll have to wait until the firewall times out my supposed "attack".

Mad Matt

07-13-11, 06:10 PM

OK, you might be right, though according to an E@H forum post by the project admin(s), the CUDA app is 20x faster than the CPU app for the sub-project.

If that's right they tuned down GPU credits artificially, but I have some doubts about that. I just made some short efforts with a GTX 295 quite a while ago, and it hardly tops three Quads/i7s, so indeed return is more than lousy compared to usual GPU projects. For me it's still a CPU project and within the 'normal' range it has a quite competitive return for CPUs.

Fire$torm

07-13-11, 08:43 PM

If that's right they tuned down GPU credits artificially, but I have some doubts about that. I just made some short efforts with a GTX 295 quite a while ago, and it hardly tops three Quads/i7s, so indeed return is more than lousy compared to usual GPU projects. For me it's still a CPU project and within the 'normal' range it has a quite competitive return for CPUs.

Ahhhhh. OK. I see. Ya know I have a new term for bad GPU credits ---> The DA Effect <----

It's my thought and I'm sticking to it.........

Bryan

07-14-11, 12:10 AM

Ahhhhh. OK. I see. Ya know I have a new term for bad GPU credits ---> The DA Effect <----

It's my thought and I'm sticking to it.........

Sounds quite intelligent to me! :(

spingadus

12-30-11, 07:43 AM

<app_info>
<app>
<name>einsteinbinary_BRP4</name>
<user_friendly_name>Binary Radio Pulsar Search (Arecibo)</user_friendly_name>
</app>
<file_info>
<name>einsteinbinary_BRP4_1.00_windows_intelx86__BRP3cuda32.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>db.dev.win.3d35195e</name>
</file_info>
<file_info>
<name>dbhs.dev.win.3d35195e</name>
</file_info>
<app_version>
<app_name>einsteinbinary_BRP4</app_name>
<version_num>100</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>32706701893.376610</flops>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP4_1.00_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>db.dev.win.3d35195e</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.3d35195e</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>0.500000</count>
</coproc>
<gpu_ram>314572800.000000</gpu_ram>
</app_version>
</app_info>

I created an updated "GPU ONLY" app_info for Einstein. I tested it and it works fine. Just change the gpu_ram line to suit your card.
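
For anyone changing the gpu_ram line: the values in these files look like plain byte counts (an assumption based on the numbers, which come out to exactly 210 MiB and 300 MiB). A tiny Python helper, with a made-up name, to produce the line for a given amount of memory:

```python
MIB = 1024 * 1024  # bytes per mebibyte


def gpu_ram_line(mebibytes: int) -> str:
    """Format a <gpu_ram> line in the same style as the files in this thread."""
    return f"<gpu_ram>{mebibytes * MIB:.6f}</gpu_ram>"


print(220200960 / MIB)    # value used in the BRP3 files above
print(314572800 / MIB)    # value used in the BRP4 file above
print(gpu_ram_line(300))
```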

I did this because I wanted to get my gpu usage % maxed out as it was only around 60%. Seems like a waste of gpu power to run it like that. The app_info will run 2 wu's per gpu. So far, it appears that it is still running at around 60%, so even though I have 2 tasks running at a time, it will now take 2 times as long to finish, so it's a wash.
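
The "it's a wash" arithmetic can be written out directly. A short Python sketch; the runtimes are made-up round numbers, not measurements from this thread.

```python
def tasks_per_day(concurrent_tasks: int, seconds_per_task: float) -> float:
    """Completed wu per day when `concurrent_tasks` run side by side on one GPU."""
    return concurrent_tasks * 86400.0 / seconds_per_task


one_at_a_time = tasks_per_day(1, 3600.0)   # hypothetical: 1 wu per hour
two_at_a_time = tasks_per_day(2, 7200.0)   # 2 wu at once, each taking twice as long
print(one_at_a_time, two_at_a_time)

# Running 2 at once only pays off if each wu takes less than double the time.
print(tasks_per_day(2, 5400.0))
```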

I have 2 threads free and overall cpu usage is at around 90%, so I have room to spare.

I may fiddle with the following to see if increasing the values helps.
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>

Anyone successful in getting Einstein@home to use all of your GPU?

Al

12-30-11, 08:01 AM

I've gotten it up in the mid 80's but it still took twice as long, so I gave up on it.

Slicker

12-30-11, 10:50 AM

One of two things is going on here. Either the Einstein developers are college kids who haven't learned how to code for performance yet, or the app really doesn't lend itself to running on a GPU. You have a choice when programming for the GPU to do either blocking or non-blocking calls. Either way, the CPU has no clue what the GPU is doing while it is doing it. It only knows when it finishes. So, the question is whether the CPU does something else while it waits or whether it blocks all other programs from running while waiting. Choose wrong and the CPU wastes cycles that could have been spent elsewhere. By calculating how long it takes to run a GPU kernel (e.g. run 10 of them and take the average time), one can tell the CPU to "sleep" for that length of time so that it uses virtually no CPU at all. This is even easier through the use of events in OpenCL programming: while the event has not completed (checked with clGetEventInfo, since clWaitForEvents itself blocks), sleep(milliseconds).
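
The sleep-instead-of-spin idea can be shown without a GPU. This is a Python stand-in, not OpenCL: a timer thread plays the part of the kernel, a `threading.Event` plays the part of the OpenCL event, and `wait_politely` is a name invented for the example.

```python
import threading
import time


def wait_politely(done: threading.Event, expected_seconds: float,
                  poll_seconds: float = 0.002) -> int:
    """Sleep for most of the expected run time, then poll with short sleeps."""
    # First nap: nearly the whole average kernel time, costing almost no CPU.
    time.sleep(expected_seconds * 0.9)
    polls = 0
    # Then check the event, sleeping briefly between checks instead of spinning.
    while not done.is_set():
        polls += 1
        time.sleep(poll_seconds)
    return polls


# Stand-in for a GPU kernel: a timer that sets the event after about 50 ms.
kernel_done = threading.Event()
threading.Timer(0.05, kernel_done.set).start()
polls = wait_politely(kernel_done, expected_seconds=0.05)
print("kernel finished after", polls, "short polls")
```

The trade-off is latency: the longer each nap, the less CPU is burned, but the later the host notices that the kernel has finished.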

GPUs are great at parallel tasks. But, every time one of the stream processors has to do something different than all the others (e.g. loop one extra time) then all the others have to wait for that one stream processor to finish. Only when all of them are finished can it move on to the next task. When 40% of the stream processors have to run extra instructions and the other 60% sit idle, you will see the app running at 40% GPU utilization. To fix that, one needs to break the kernels into smaller parts so that for each, all stream processors are 100% utilized. That, however, can't always be done.
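
The utilization figure in that example follows from the lockstep rule. A minimal Python model, assuming every lane (stream processor) in a group must wait for the slowest one; the iteration counts are invented for illustration.

```python
from typing import Sequence


def lockstep_utilization(iterations_per_lane: Sequence[int]) -> float:
    """Fraction of lane-time doing useful work when all lanes wait for the slowest."""
    longest = max(iterations_per_lane)
    if longest == 0:
        return 1.0  # nothing to do, so nothing is wasted
    useful = sum(iterations_per_lane)
    return useful / (longest * len(iterations_per_lane))


# Every lane does the same work: fully utilized.
print(lockstep_utilization([8] * 10))

# During the "extra" step, 4 lanes of 10 work while the other 6 sit idle.
print(lockstep_utilization([1, 1, 1, 1, 0, 0, 0, 0, 0, 0]))
```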

One way to look at parallel processing is like this: You have 1000 cars driving from New York to California and all are supposed to arrive at exactly the same time. There is no such thing as a 1000 lane highway. So, it is impossible for all of them to arrive at exactly the same time since, assuming the road is two lanes wide, the cars would be stacked up 500 deep in line to cross the California border. You can attempt to use multiple roads so that there are fewer cars in line, but the logistics of getting all the cars on all the roads to cross at the same time is much more difficult than having to monitor a single road. So, in GPU programming, any time the application goes through a "Do command. OK, but...." that means it has to stop and wait because all streams run the same commands at the same time and none are allowed to jump ahead. If the math problem being solved doesn't lend itself to doing that, the GPU app won't run efficiently.

As I've said more than once before, just because you can use a table knife as a screwdriver, it doesn't mean you should. Just because a GPU can run an app doesn't make it the best use for the GPU. With the heavy CPU utilization and the limited GPU utilization, Einstein is fitting a square peg into a round hole. With a big enough hammer, you can get it to work. The real question is, should they even try?

Fire$torm

12-30-11, 05:03 PM

Thank you Slicker. GPU Programming 101: Introduction. Good stuff Professor Slicker. :-bd

spingadus

12-31-11, 12:07 AM

Yes, thanks Slicker. I enjoyed reading that and feel more educated as well :)

Makes me want to learn a programming language.

Any ideas what would be the ideal language these days to start on if you wanted to eventually create a BOINC project and code for both cpu and gpu?

Slicker

01-03-12, 10:55 AM

Yes, thanks Slicker. I enjoyed reading that and feel more educated as well :)

Makes me want to learn a programming language.

Any ideas what would be the ideal language these days to start on if you wanted to eventually create a BOINC project and code for both cpu and gpu?

For BOINC, you need to know C for both the apps and the server. You really don't need to know php since the web pages are plug and play. For GPU, OpenCL is your best bet as it is really just a subset of C.

spingadus

01-03-12, 01:13 PM

For BOINC, you need to know C for both the apps and the server. You really don't need to know php since the web pages are plug and play. For GPU, OpenCL is your best bet as it is really just a subset of C.

Thanks, when you say 'C', do you mean C++ as well?

Slicker

01-04-12, 10:21 AM

Thanks, when you say 'C', do you mean C++ as well?

90% C code, 10% C++. Just the basics of C++ though. Coprocessors (e.g. AMD, nVidia, OpenCL) use a base class of COPROC. There are lots of pointers to structures and the use of vectors to iterate through lists of things (WUs, coprocs, etc.).

spingadus

01-04-12, 04:52 PM

90% C code, 10% C++. Just the basics of C++ though. Coprocessors (e.g. AMD, nVidia, OpenCL) use a base class of COPROC. There are lots of pointers to structures and the use of vectors to iterate through lists of things (WUs, coprocs, etc.).

Thanks, I'm going to start brushing up on some C and see where it goes. It'll probably be a while before I gain any type of proficiency, but here's to another hobby.

Fire$torm

01-04-12, 10:16 PM

Thanks, I'm going to start brushing up on some C and see where it goes. It'll probably be a while before I gain any type of proficiency, but here's to another hobby.

Cheers.