WuProp build finished



Bryan
07-25-16, 10:14 PM
I've finished my WuProp machine. It should do pretty good on hours and produce a few credits while it's at it :D

[Attachment 2474: screenshot]

2x E5-2695 V4
36 cores/ 72 threads
2.3GHz
128G ECC RAM

Performance - CSG and SRBase

Credits identical to four I7-3930K machines @ 4.2G (both std and AVX)
Equal to 2.6 24-thread X5675 Xeon machines @ 3.06G (CSG)
Equal to 5.3 24-thread X5675 Xeon machines on an AVX project (SRBase)

This boy is a happy camper :))

BTW, a lesson learned. Windows only supports 64 threads because of the new memory standard NUMA. Linux will run all 72 threads so this box is destined to spend a lot of its life on Linux.

zombie67
07-25-16, 11:55 PM
Sweeeeeeeeet!!!!

About the 64 thread on win. Got a link about that? I need to learn more.

Edit: Which version of windows?

Bryan
07-26-16, 12:08 AM
All versions ... maybe not on server OS. Just do a search on "Windows NUMA". If you were generating code you could set up each processor as a different node and have 64-thread support for each node. Then you could theoretically write the BOINC code to assign threads to the different nodes. Applications can be written to be NUMA aware, and those overcome the 64-thread limitation.

zombie67
07-26-16, 12:11 AM
Hrm. According to this, win7 supports up to 256:

https://support.microsoft.com/en-us/help/10737/windows-7-system-requirements



PCs with multi-core processors:

Windows 7 was designed to work with today's multi-core processors. All 32-bit versions of Windows 7 can support up to 32 processor cores, while 64‑bit versions can support up to 256 processor cores.

I assume win10 supports at least this many.

Edit: I think NUMA is a "nice to have". Otherwise it drops back to normal SMP mode. Have you tried just running it in windows to see if it sees all threads?

Bryan
07-26-16, 12:33 AM
It sees all 72 threads Z, count the little boxes in the 1st post. If you are running BTasks and start up 64 threads each and every one of the little suckers will show 100% CPU usage. If you enable the 65th thread then you will see a couple drop from 100% utilization. Go to 72 WU and more of them will drop below 100%. Windows assigns the extras (above 64) to the same 64 ... it doesn't add more threads it just starts sharing the 64. The CPU loading changes very little from 64 to 72 threads. The machine will never get to 100% loading.

On Linux all 72 threads will show 100% CPU usage.

Versions of Windows prior to Win7 do not support NUMA. W10 also supports 64 threads. There is probably no reason to do more since I imagine only applications running on servers would be written to be NUMA aware. How many people use desktop PCs w/ more than 64 threads?
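
A quick way to compare what the OS reports with what a single process is actually allowed to run on is a few lines of Python. This is only a minimal sketch: `cpu_summary` is a made-up helper name, and the affinity call it uses exists on Linux but not on Windows, so on Windows it simply assumes there is no restriction.

```python
import os

def cpu_summary():
    """Return (logical CPUs the OS reports, CPUs this process may run on)."""
    total = os.cpu_count() or 1  # cpu_count() can return None
    if hasattr(os, "sched_getaffinity"):  # present on Linux, absent on Windows
        usable = len(os.sched_getaffinity(0))  # 0 means "this process"
    else:
        usable = total  # assumption: no affinity API here, so no restriction
    return total, usable

if __name__ == "__main__":
    total, usable = cpu_summary()
    print(f"OS reports {total} logical CPUs; this process may use {usable}")
```

If the two numbers differ, something is limiting where the process can be scheduled, which is a separate question from how busy each thread is.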

Bryan
07-26-16, 01:00 AM
@Z, I'm going to have to do more investigation. What I told you is what I was told before I put together this machine. I did see the BTasks behavior that I was told about. But I started thinking, and in the high-definition screenshot in the 1st post I can see the processor is showing 98% utilization and all 72 threads are active. That was taken running Universe.

It may be that when I saw BTasks start sharing threads I was running 72 threads of VDW (1.5G/thread). Maybe when you go beyond a certain memory threshold then it starts cutting back.

Anyway something isn't adding up! If I get a chance I'll play with it tomorrow and let you know.

EDIT: all 72 threads are active but NOT all of them are pegged at 100% utilization as I would expect them to be.

zombie67
07-26-16, 01:09 AM
I wonder if the type of task makes a difference. Is there a very low memory-usage task to try? Maybe something from PG? wuprop stats will show a good app to choose.

Bryan
07-26-16, 01:16 AM
That could well be the key, Z. I'll do some playing with it when I get a chance. I've just been content to run it on Linux, but I do need to know the answer for when something comes along and I have to run Windows. I'll see what it does with an itty bitty one and then switch to a memory hog like VDW. I'll also compare what Task Manager is showing versus what BTasks is showing.

BTW, it appears my old stand-by ProLasso only supports 36 threads, as do Core Temp and VBox. Of course on VBox you can always run two 36-thread VMs, so it certainly doesn't create any limitation.

zombie67
07-26-16, 01:22 AM
The memory usage in BT is not reliable, IMO. For example, it never shows the actual usage when tasks run in VMs. Again, just my opinion based on my observations.

I will be very interested in the results of your future experiments!

Oh! Also, if you are using two DIMMs per channel, memory speed slows down. That could be another factor in under-utilized CPUs.

John P. Myers
07-26-16, 01:31 AM
The 64th thread is what matters. Go past that and NUMA kicks in

Bryan
07-26-16, 01:49 AM
The 64th thread is what matters. Go past that and NUMA kicks in

Well I'm sure glad I didn't build one of the new 44C/88T machines. Besides having to hock my soul to pay for it I would have really been PO'd when I couldn't use all of those suckers =))

@Z, I used 8x16G memory sticks so there is 1 per memory channel.

Since JPM weighed in then that is the definitive answer. I won't argue with that boy :D

zombie67
07-26-16, 02:15 AM
The 64th thread is what matters. Go past that and NUMA kicks in

Okay. But what does that mean exactly? You need to configure DIMMs differently? Run dual VMs of windows somehow? I read the wiki about NUMA, but it really doesn't explain the part about how you implement it, or work around it, or turn it off.

zombie67
07-26-16, 02:27 AM
NUMA is not limited to Windows.

http://man7.org/linux/man-pages/man7/numa.7.html

John P. Myers
07-26-16, 02:30 AM
When NUMA kicks in, it assigns groups of 64 threads to a node (by default). You can have more than 1 node going, but the way it works is that a program that launches another program can only launch that other program into the same node it resides in, meaning BOINC cannot use more than 1 node. It can see all threads over 64, but can't assign WUs to them.
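
The grouping can be sketched as simple arithmetic. This is a simplified model and not how Windows is actually implemented: it assumes a cap of 64 logical processors per group and that a NUMA node is never split across groups. `group_sizes` and `GROUP_LIMIT` are made-up names for illustration.

```python
GROUP_LIMIT = 64  # assumed cap on logical processors in one group

def group_sizes(logical_per_node, limit=GROUP_LIMIT):
    """Pack NUMA nodes into groups, in order, without splitting a node.

    Nodes are added to the current group while the running total stays
    within `limit`; otherwise a new group is started.
    """
    groups = []
    current = 0
    for n in logical_per_node:
        if n > limit:
            raise ValueError("this sketch does not model a node larger than a group")
        if current and current + n > limit:
            groups.append(current)  # close the full group
            current = 0
        current += n
    if current:
        groups.append(current)
    return groups

print(group_sizes([36, 36]))  # dual 18C/36T sockets -> [36, 36]
print(group_sizes([12, 12]))  # dual 6C/12T sockets  -> [24]
```

Under these assumptions a 72-thread dual-socket box comes out as two groups of 36 rather than one group of 64 plus leftovers, so the real layout is worth checking on the machine itself.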

If using Windows, a VM would be the way to go. Or you could turn HT off. Or figure out how to run 2 instances of BOINC at the same time, assigning 1 to node 1 while the default remains in node 0. Or beg the BOINC devs to code for more than 1 node.

Or just use Linux ;)
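
For the "2 instances of BOINC" idea, here is a rough sketch of what the launch commands might look like, built as argument lists in Python and not executed. Everything in it is an assumption to check against your own install: the helper names are made up, the data directory and port are placeholders, the BOINC options (`--allow_multiple_clients`, `--dir`, `--gui_rpc_port`) and the `start /NODE` and `numactl` switches should be verified against their own documentation, and nothing here has been tested on a 72-thread box.

```python
def second_boinc_client(data_dir, rpc_port, exe="boinc"):
    """Arguments for an extra BOINC client with its own data dir and RPC port."""
    return [exe, "--allow_multiple_clients",
            "--dir", data_dir,
            "--gui_rpc_port", str(rpc_port)]

def windows_start_on_node(node, program_args):
    """Wrap a command in cmd.exe's `start /NODE n` (the empty string is the window title)."""
    return ["cmd", "/c", "start", "", "/NODE", str(node), *program_args]

def linux_bind_to_node(node, program_args):
    """Wrap a command in numactl so CPU and memory stay on one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *program_args]

# Hypothetical paths and port, for illustration only.
client = second_boinc_client("/var/lib/boinc-node1", 31417)
print(linux_bind_to_node(1, client))
print(windows_start_on_node(1, second_boinc_client(r"C:\boinc_node1", 31417, exe="boinc.exe")))
```

Each list could be handed to `subprocess.Popen` once the options are confirmed. The default client stays where it is, and the second one gets pinned to node 1.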