zombie67
11-07-11, 07:25 PM
...at least in the short term...whatever that means.
Here are some messages from the boinc-dev mail list. Some of the project admins are not happy.
From: Richard Haselgrove
Subject: Re: [boinc_dev] APR, DCF and non-deterministic projects
Date: November 7, 2011 10:25:04 AM PST
Part 2 of this research: Credit
Because the NumberFields tasks are so variable (from 2 to 300,000 seconds), I've been looking at the rate at which credit has been awarded - Credits per hour (of runtime) gives a nice human-scale value.
Here are the results for my four hosts at NumberFields. All four were attached at the same time (note the consecutive HostIDs): in particular, the two Q6600 hosts have identical hardware.
http://img46.imageshack.us/img46/826/numberfieldscreditperho.png
The problem is that the rate at which credit is granted depends critically on the host APR value. With a non-deterministic project, especially in the early days after attachment, APR is heavily influenced by the random processing time of the early tasks. The credit/hour figures for all hosts were tightly grouped for the first 10 tasks, when APR is effectively ignored, but thereafter they diverge spectacularly. Hosts 1288 and 1289 happened to get short tasks first, so APR was artificially high when it was first used for credit calculation; hosts 1290 and 1291 happened to draw longer-running tasks.
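[Editorial sketch] To illustrate why early short tasks can inflate APR, here is a deliberately simplified model. It is not the BOINC server code: it assumes APR behaves like a plain mean of (estimated fpops / elapsed seconds) over completed tasks, and `EST_FPOPS`, `apr_after`, and both lists of runtimes are hypothetical values made up for the illustration.

```python
# Simplified illustration only, NOT the actual CreditNew/BOINC implementation.
# Assumption: APR is modelled as the mean of (estimated_fpops / elapsed_seconds)
# over the tasks a host has completed so far.

EST_FPOPS = 3.6e12  # hypothetical fixed per-task fpops estimate


def apr_after(elapsed_times):
    """Mean apparent processing rate (fpops/sec) over the given runtimes."""
    rates = [EST_FPOPS / t for t in elapsed_times]
    return sum(rates) / len(rates)


# Hypothetical first tasks drawn by two identical hosts (seconds of runtime).
lucky_host = [20, 50, 100, 200, 400]                   # short tasks first
unlucky_host = [5_000, 20_000, 43_000, 100_000, 300_000]  # long tasks first

apr_lucky = apr_after(lucky_host)
apr_unlucky = apr_after(unlucky_host)

print(f"APR (short tasks first): {apr_lucky:.3e} fpops/sec")
print(f"APR (long tasks first):  {apr_unlucky:.3e} fpops/sec")
```

Under this model the two hosts have the same hardware, yet the one that drew short tasks first ends up with a much higher apparent rate, because the estimate is fixed while the runtime is not.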
It was host 1290 which received the 300,000 second task (http://numberfields.asu.edu/NumberFields/result.php?resultid=292439), and was awarded 4,500 credits (in round figures). At very much the same time, the identical host 1291 returned http://numberfields.asu.edu/NumberFields/result.php?resultid=317447, getting almost the same credit for just 43,000 seconds of work.
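[Editorial sketch] The credit/hour gap between those two results can be worked out directly. This assumes both tasks earned exactly 4,500 credits (the post says "round figures" and "almost the same credit"), so the outputs are approximate; `credits_per_hour` is a helper name introduced here for illustration.

```python
# Credits per hour for the two results mentioned above.
# Assumption: both tasks are treated as earning 4,500 credits.


def credits_per_hour(credit, runtime_seconds):
    """Credit granted divided by runtime expressed in hours."""
    return credit / (runtime_seconds / 3600.0)


host_1290 = credits_per_hour(4500, 300_000)  # the 300,000 second task
host_1291 = credits_per_hour(4500, 43_000)   # the 43,000 second task
ratio = host_1291 / host_1290

print(f"host 1290: {host_1290:.1f} credits/hour")
print(f"host 1291: {host_1291:.1f} credits/hour")
print(f"ratio:     {ratio:.2f}x")
```

With equal credit, the ratio reduces to the ratio of the runtimes (300,000 / 43,000), which is roughly a factor of seven between two identical machines.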
It's discrepancies like that which lead users, and project administrators, to distrust CreditNew. I think it needs more work, especially if BOINC is going to continue to support non-deterministic projects.
From: David Anderson
Subject: Re: [boinc_dev] APR, DCF and non-deterministic projects
Date: November 7, 2011 11:12:12 AM PST
The goals of CreditNew involve long-term averages.
It makes no promises about individual jobs or about credit/hour.
If a project has highly variable jobs,
this translates into highly variable credit for individual jobs.
But the long-term average stuff should still hold.
If anyone has a specific suggestion for how to make credit
less variable on the short term, while still preserving the
long-term goals, let me know.
BTW: the server maintains the variance of elapsed time
and turnaround on a (host, app version) basis.
For everything else it maintains only the mean.
Variance isn't currently used for anything.
-- David
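[Editorial sketch] For readers wondering what maintaining a mean and variance per (host, app version) might look like, here is a minimal sketch using Welford's online algorithm. This is a swapped-in, textbook technique; the thread does not say how the BOINC server actually computes these statistics. The class name, the dictionary key, and the sample runtimes are all hypothetical.

```python
# Running mean and variance via Welford's online algorithm.
# Assumption: this is a generic technique, not the BOINC server's own code.


class RunningStats:
    """Tracks count, mean, and sample variance without storing samples."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two samples are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# One statistics object per (host_id, app_version_id) pair; ids are made up.
stats_by_key = {}
key = (1290, 1)
stats = stats_by_key.setdefault(key, RunningStats())
for elapsed in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(elapsed)

print(f"n={stats.n} mean={stats.mean:.3f} variance={stats.variance:.3f}")
```

A variance tracked this way could in principle be used to flag hosts whose runtimes are too scattered for the mean to be trusted yet, though nothing in the thread says that is planned.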
From: Travis Desell
Subject: Re: [boinc_dev] APR, DCF and non-deterministic projects
Date: November 7, 2011 10:53:51 AM PST
We've had similar issues with the n-body simulations we're doing on milkyway@home. The runtime of the workunits is fairly dependent on the initial random distribution of bodies, and the runtimes can vary from a couple hours to a couple days.
It would be nice to get away from having to specify the RSC_FPOPS_EST for each workunit, especially as in our case it can cause workunits to be terminated prematurely by the clients. I don't suppose you've run into this problem as well?
From: Janus Kristensen
Subject: Re: [boinc_dev] APR, DCF and non-deterministic projects
Date: November 7, 2011 2:36:54 PM PST
Joining in to say that the BURP based projects sometimes experience this
exact issue as well. The setup is similar to the other projects
mentioned in this thread:
1) fpops estimate cannot be given in advance but has to be fixed to
something (in our case this is roughly equal to 1 CPU hour since that is
a reasonable average)
2) Workunit run time varies randomly in unpredictable ways for the same
app and even for a single series of workunits (in our case typically
from 5 secs to 5 days)
3) Credit is granted in a way that - to the users at least - seems to be
based on luck. Sometimes it is off by a factor of around 10 or more.
> If a project has highly variable jobs, this translates into highly variable credit
And this is probably what Richard is pointing out as the core of the
problem. People see this as a flaw in the credit system - and I have to
agree to some extent that credit consistency within a single project at
any given time is a very important trait in a credit system.
Long term and even cross-project credit stability is desirable, but if
it comes at the cost of short term credit stability then we have to
consciously weigh them against each other and figure out what is more
important to us.
People perceive instability (be that short or long term) as "unfair".
> If anyone has a specific suggestion for how to make credit less
> variable on the short term, while still preserving the long-term goals,
> let me know.
What if it turns out that these two things are inherently opposites? Can
we ignore the problem?
I'm really feeling lost on this issue. The very old "fpops*cpu_secs +/-
correction"-scheme had the interesting property that apart from the
difficulty in measuring the fpops capability of hosts it was at least
somewhat stable for each project.
I'm afraid I cannot provide the magical solution, though. Maybe it is
somewhere in the middle between what we have now and what we had back in
the very beginning.
-- Janus
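[Editorial sketch] The stability Janus attributes to the old "fpops*cpu_secs" scheme can be seen in a toy version of it. This is an illustration under stated assumptions, not the historical BOINC formula: the "+/- correction" term is omitted, and `SCALE` is a hypothetical constant chosen so that a host benchmarked at 1 GFLOPS earns 200 credits per day of CPU time.

```python
# Toy version of a benchmark-based credit scheme: credit is proportional to
# (benchmarked fpops/sec) * (CPU seconds). The correction term is left out.
# Assumption: SCALE is hypothetical, set so 1 GFLOPS for one day = 200 credits.

SECONDS_PER_DAY = 86_400
SCALE = 200 / (1e9 * SECONDS_PER_DAY)


def benchmark_credit(benchmark_flops, cpu_seconds):
    """Credit under the simplified fpops * cpu_secs model."""
    return benchmark_flops * cpu_seconds * SCALE


# Two tasks of very different length on the same 1 GFLOPS host.
long_task = benchmark_credit(1e9, 300_000)
short_task = benchmark_credit(1e9, 43_000)

print(f"300,000 s task: {long_task:.1f} credits")
print(f" 43,000 s task: {short_task:.1f} credits")
print(f"credit/hour, long:  {long_task / (300_000 / 3600):.2f}")
print(f"credit/hour, short: {short_task / (43_000 / 3600):.2f}")
```

In this model credit per hour depends only on the host's benchmark, so it is the same for both tasks regardless of how long each one happens to run. The cost, as the message notes, is that everything hinges on how well the benchmark measures the host.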