Published December 24, 2023 (author: 安卓编程开发工具)
Definition
Formally, we begin by considering some family of distributions for a
random variable X, indexed by some θ.
More intuitively, we can think of X as our "data", perhaps
$X = (X_1, \ldots, X_n)$, where the $X_i$ are i.i.d. draws from a
distribution indexed by θ. X is the set of things the decision rule
will be making decisions on. There are several possible ways to model
our data X, which our decision function can use to make decisions. For
a finite number of models, we can thus think of θ as the index to this
family of probability models. For an infinite family of models, it is a
set of parameters to the family of distributions.
On a more practical note, it is important to understand that, while it
is tempting to think of loss functions as necessarily parametric (since
they seem to take θ as a "parameter"), θ need not be
finite-dimensional, which is incompatible with this notion; for
example, if the family of probability functions is uncountably infinite,
θ indexes an uncountably infinite space.
From here, given a set A of possible actions, a decision rule is a
function δ : 𝒳 → A, where 𝒳 is the set of possible values of X.
A loss function is a real lower-bounded function L on Θ × A. For
θ ∈ Θ, the value L(θ, δ(X)) is the cost of action δ(X) under
parameter θ.[1]
Decision rules
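A decision rule makes a choice using an optimality criterion. Some
commonly used criteria are:
Minimax: Choose the decision rule with the lowest worst loss; that
is, minimize the worst-case (maximum possible) loss:
$$\underset{\delta}{\operatorname{arg\,min}} \; \max_{\theta \in \Theta} \; R(\theta, \delta),$$
where R(θ, δ) is the expected loss (risk) of rule δ under parameter θ.
Invariance: Choose the optimal decision rule which satisfies an
invariance requirement.
Choose the decision rule with the lowest average loss (i.e. minimize
the expected value of the loss function):
$$\underset{\delta}{\operatorname{arg\,min}} \; \operatorname{E}_{\theta \in \Theta}\left[ R(\theta, \delta) \right] = \underset{\delta}{\operatorname{arg\,min}} \int_{\theta \in \Theta} R(\theta, \delta) \, p(\theta) \, d\theta,$$
where p(θ) is a weighting (prior) density over Θ.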
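The difference between the minimax and average-loss criteria can be seen on a small table of risks. The following is a minimal sketch, not a general implementation: the rule names, state names, risk values and prior weights are all hypothetical and chosen only for illustration.

```python
# Hypothetical risk table: risk[rule][state] plays the role of R(theta, delta).
risk = {
    "delta_1": {"theta_1": 1.0, "theta_2": 4.0},
    "delta_2": {"theta_1": 3.0, "theta_2": 3.0},
    "delta_3": {"theta_1": 0.0, "theta_2": 6.0},
}

# Assumed weights over the states, used only by the average-loss criterion.
prior = {"theta_1": 0.8, "theta_2": 0.2}


def minimax_rule(risk):
    """Return the rule whose worst-case (maximum) risk is smallest."""
    return min(risk, key=lambda rule: max(risk[rule].values()))


def average_loss_rule(risk, prior):
    """Return the rule whose prior-weighted average risk is smallest."""
    def average(rule):
        return sum(prior[state] * r for state, r in risk[rule].items())
    return min(risk, key=average)


print(minimax_rule(risk))              # worst cases are 4.0, 3.0, 6.0
print(average_loss_rule(risk, prior))  # averages are 1.6, 3.0, 1.2
```

With these numbers the two criteria disagree: the minimax rule guards against the worst state, while the average-loss rule accepts a bad outcome in a state that the assumed weights make unlikely.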
Expected loss
The value of the loss function itself is a random quantity because it
depends on the outcome of a random variable X. Both frequentist and
Bayesian statistical theory involve making a decision based on the
expected value of the loss function; however, this quantity is defined
differently under the two paradigms.
Frequentist risk
The expected loss in the frequentist context is obtained by taking the
expected value with respect to the probability distribution, P_θ, of
the observed data, X. This is also referred to as the risk function[2]
of the decision rule δ and the parameter θ. Here the decision rule
depends on the outcome of X. The risk function is given by
$$R(\theta, \delta) = \operatorname{E}_\theta\left[ L(\theta, \delta(X)) \right],$$
where the expectation is taken over X distributed according to P_θ.
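As an illustration of the frequentist risk, the expectation over the data can be computed exactly when X takes finitely many values. The sketch below assumes a binomial model, a squared-error loss and the sample proportion as the decision rule; none of these choices come from the text above, and the function names are made up for the example.

```python
from math import comb


def squared_error(theta, action):
    """Squared-error loss L(theta, a) = (theta - a)^2."""
    return (theta - action) ** 2


def binomial_risk(theta, rule, n, loss=squared_error):
    """Exact risk: sum over x of L(theta, rule(x)) * P_theta(X = x),
    assuming X ~ Binomial(n, theta)."""
    total = 0.0
    for x in range(n + 1):
        p_x = comb(n, x) * theta**x * (1 - theta) ** (n - x)
        total += loss(theta, rule(x)) * p_x
    return total


n = 10


def sample_proportion(x):
    """Decision rule delta(x) = x / n, an estimate of theta."""
    return x / n


r = binomial_risk(0.3, sample_proportion, n)
# For this rule the risk equals the variance of x / n, theta * (1 - theta) / n.
print(r, 0.3 * 0.7 / n)
```

The risk is a function of θ for a fixed rule; evaluating `binomial_risk` over a grid of θ values would trace out that function.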
Bayesian expected loss
In a Bayesian approach, the expectation is calculated using the posterior
distribution π* of the parameter θ:
$$\rho(\pi^*, a) = \int_\Theta L(\theta, a) \, d\pi^*(\theta).$$
One then should choose the action a* which minimises the expected loss.
Although this will result in choosing the same action as would be chosen
using the Bayes risk, the emphasis of the Bayesian approach is that one
is only interested in choosing the optimal action under the actual
observed data, whereas choosing the actual Bayes optimal decision rule,
which is a function of all possible observations, is a much more difficult
problem.
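Choosing an action for the observed data only can be sketched with a discrete posterior, where the integral reduces to a weighted sum. The posterior weights, the action grid and the squared-error loss below are assumptions made for the example, not values from the text.

```python
def posterior_expected_loss(action, posterior, loss):
    """Sum of L(theta, action) weighted by the posterior probability of theta."""
    return sum(p * loss(theta, action) for theta, p in posterior.items())


def bayes_action(actions, posterior, loss):
    """Return the candidate action with the smallest posterior expected loss."""
    return min(actions, key=lambda a: posterior_expected_loss(a, posterior, loss))


def squared_error(theta, action):
    return (theta - action) ** 2


# Hypothetical posterior over three parameter values, given the observed data.
posterior = {0.2: 0.5, 0.5: 0.3, 0.8: 0.2}

# Candidate actions: a grid of point estimates between 0 and 1.
actions = [i / 100 for i in range(101)]

best = bayes_action(actions, posterior, squared_error)
print(best)  # under squared error this lands on the posterior mean, 0.41
```

Only one posterior is needed here, which is the point made above: the full Bayes decision rule would require repeating this minimisation for every possible observation.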
Selecting a loss function
Sound statistical practice requires selecting an estimator consistent
with the actual loss experienced in the context of a particular applied
problem. Thus, in the applied use of loss functions, selecting which
statistical method to use to model an applied problem depends on knowing
the losses that will be experienced from being wrong under the problem's
particular circumstances, which results in the introduction of an element
of teleology into problems of scientific decision-making.
A common example involves estimating "location." Under typical
statistical assumptions, the mean or average is the statistic for
estimating location that minimizes the expected loss experienced under
the Taguchi or squared-error loss function, while the median is the
estimator that minimizes expected loss experienced under the
absolute-difference loss function. Still different estimators would be
optimal under other, less common circumstances.
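The link between the loss function and the resulting location estimate can be checked numerically. This is a small sketch on made-up data, using a brute-force search over a grid of candidate estimates rather than any closed-form result.

```python
from statistics import mean, median

# Hypothetical sample with one large value.
data = [1.0, 2.0, 3.0, 4.0, 10.0]


def total_loss(estimate, data, loss):
    """Sum of the loss applied to each error x - estimate."""
    return sum(loss(x - estimate) for x in data)


def squared(error):
    return error * error


# Candidate location estimates 0.0, 0.1, ..., 10.0.
candidates = [i / 10 for i in range(101)]

best_squared = min(candidates, key=lambda c: total_loss(c, data, squared))
best_absolute = min(candidates, key=lambda c: total_loss(c, data, abs))

print(best_squared, mean(data))     # both 4.0
print(best_absolute, median(data))  # both 3.0
```

The large observation pulls the squared-error minimiser (the mean) upward, while the absolute-error minimiser (the median) stays with the bulk of the data.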
In economics, when an agent is risk neutral, the loss function is simply
expressed in monetary terms, such as profit, income, or end-of-period
wealth.
For risk-averse (or risk-loving) agents, however, loss is measured as the
negative of a utility function, which represents satisfaction and is
usually interpreted in ordinal terms rather than in cardinal (absolute)
terms.
Other measures of cost are possible, for example mortality or morbidity
in the field of public health or safety engineering.
For most optimization algorithms, it is desirable to have a loss function
that is globally continuous and differentiable.
Two very commonly used loss functions are the squared loss,
$L(a) = a^2$, and the absolute loss, $L(a) = |a|$. However, the absolute
loss has the disadvantage that it is not differentiable at $a = 0$. The
squared loss has the disadvantage that it tends to be dominated by
outliers: when summing over a set of a's (as in
$\sum_{i=1}^{n} L(a_i)$), the final sum tends to be the result of a few
particularly large a-values, rather than an expression of the average
a-value.
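The sensitivity of the squared loss to outliers can be made concrete by asking what share of the summed loss comes from the single largest error. The error values below are invented for the illustration.

```python
# Hypothetical errors: four small ones and one outlier.
errors = [0.5, -1.0, 0.8, -0.3, 10.0]


def share_of_largest(errors, loss):
    """Fraction of the total loss contributed by the largest single term."""
    losses = [loss(a) for a in errors]
    return max(losses) / sum(losses)


def squared(a):
    return a * a


squared_share = share_of_largest(errors, squared)
absolute_share = share_of_largest(errors, abs)

print(round(squared_share, 3))   # about 0.981
print(round(absolute_share, 3))  # about 0.794
```

Under the squared loss the outlier accounts for nearly the whole sum, so the total says little about the typical error; under the absolute loss it still dominates, but less severely.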
Loss functions in Bayesian statistics
One consequence of Bayesian inference is that the loss function, even
together with the experimental data, does not in itself wholly determine
a decision. What is important is the relationship between the loss
function and the prior probability. So it is possible to have two
different loss functions which lead to the same decision when the prior
probability distributions associated with each compensate for the
details of each loss function.[citation needed]
Combining the three elements of the prior probability, the data, and the
loss function then allows decisions to be based on maximizing the
subjective expected utility, a concept introduced by Leonard J.
Savage.[citation needed]
Regret
Savage also argued that, when using non-Bayesian methods such as
minimax, the loss function should be based on the idea of regret, i.e.,
the loss associated with a decision should be the difference between the
consequences of the best decision that could have been taken had the
underlying circumstances been known and the decision that was in fact
taken before they were known.
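Regret can be computed from a table of losses by subtracting, for each state, the loss of the best action for that state. The following sketch uses a hypothetical loss table with three actions and two states; the names and numbers are illustrative only.

```python
# Hypothetical losses: loss[action][state].
loss = {
    "a_1": {"s_1": 10.0, "s_2": 4.0},
    "a_2": {"s_1": 12.0, "s_2": 0.0},
    "a_3": {"s_1": 11.0, "s_2": 3.0},
}


def regret_table(loss):
    """Regret = loss of the action minus the best achievable loss in that state."""
    states = list(next(iter(loss.values())))
    best = {s: min(loss[a][s] for a in loss) for s in states}
    return {a: {s: loss[a][s] - best[s] for s in states} for a in loss}


def minimax_action(table):
    """Return the action whose worst-case entry in the table is smallest."""
    return min(table, key=lambda a: max(table[a].values()))


regret = regret_table(loss)
print(regret["a_1"])            # {'s_1': 0.0, 's_2': 4.0}
print(minimax_action(loss))     # minimax on raw losses
print(minimax_action(regret))   # minimax on regrets
```

In this table state `s_1` is costly whatever is done, so minimax on raw losses is driven by that state alone and picks `a_1`; minimax on regrets removes that unavoidable cost and picks `a_2`.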
Quadratic loss function
The use of a quadratic loss function is common, for example when using
least squares techniques or Taguchi methods. It is often more
mathematically tractable than other loss functions because of the
properties of variances, as well as being symmetric: an error above the
target causes the same loss as the same magnitude of error below the target.
If the target is t, then a quadratic loss function is
$$\lambda(x) = C (t - x)^2$$
for some constant C; the value of the constant makes no difference to a
decision, and can be ignored by setting it equal to 1.
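The tractability attributed to the properties of variances can be seen in a short worked equation. As an assumption for this sketch, let X be a random outcome with mean μ and variance σ² (symbols introduced here, not in the text above), and let C > 0. Expanding the square around μ, the cross term has zero expectation:

```latex
\begin{aligned}
\operatorname{E}\left[ C (t - X)^2 \right]
  &= C \, \operatorname{E}\left[ (X - \mu)^2 + 2 (X - \mu)(\mu - t) + (\mu - t)^2 \right] \\
  &= C \left( \sigma^2 + (\mu - t)^2 \right)
\end{aligned}
```

The expected quadratic loss therefore splits into a variance term and a squared offset from the target, and a positive constant C rescales both terms without changing which choice minimises the sum.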
Many common statistics, including t-tests, regression models, design of
experiments, and much else, use least squares methods applied using linear
regression theory, which is based on the quadratic loss function.
The quadratic loss function is also used in linear-quadratic optimal
control problems.
0-1 loss function
In statistics and decision theory, a frequently used loss function is the
0-1 loss function
$$L(\hat{y}, y) = I(\hat{y} \ne y),$$
where I is the indicator notation.
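A minimal sketch of the 0-1 loss, together with its average over a set of predictions (the misclassification rate), might look as follows. The function names and the sample labels are hypothetical.

```python
def zero_one_loss(y_hat, y):
    """Return 1 if the prediction differs from the true value, else 0."""
    return int(y_hat != y)


def misclassification_rate(predictions, labels):
    """Average 0-1 loss over paired predictions and true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    total = sum(zero_one_loss(p, t) for p, t in zip(predictions, labels))
    return total / len(labels)


print(zero_one_loss("cat", "cat"))                           # 0
print(zero_one_loss(1, 0))                                   # 1
print(misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0]))    # 0.5
```

Every error costs the same under this loss regardless of its size, which is why it suits classification better than the estimation of a continuous quantity.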