Bayesian Estimation & Minimax Estimation

Loss Functions

As mentioned before, our goal is to find a good point estimator of $\eta(\theta)$, and we used $MSE(\delta, \theta)\triangleq E_\theta[(\delta(\utilde{X})-\eta(\theta))^2]$ to measure how good a point estimator is. Different settings and needs, however, may call for different loss functions, so we need a more general definition.

Definition

Define a loss function $L(\delta,\theta)$ for $\eta(\theta)$ with

  1. $L(\delta(\utilde{x}),\theta)\ge 0, \quad \forall \utilde{x}, \theta$
  2. $L(\eta(\theta), \theta)=0, \quad \forall\theta$

E.g.

  • $L(\delta, \theta)=(\delta-\theta)^2$
  • $L(\delta, \theta)=|\delta-\theta|$
  • $L(\delta, \theta)=w(\theta)|\delta-\eta(\theta)|^k$ for any $k>0$ and $w(\theta)>0$, where $w(\theta)$ is a weight function
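As a quick numeric sketch of these definitions (taking $\eta(\theta)=\theta$; the particular weight function below is an illustrative choice, not one fixed by the notes):

```python
# Example loss functions from above, with eta(theta) = theta.
def squared_loss(d, theta):
    # L(delta, theta) = (delta - theta)^2
    return (d - theta) ** 2

def absolute_loss(d, theta):
    # L(delta, theta) = |delta - theta|
    return abs(d - theta)

def weighted_loss(d, theta, k=2, w=lambda th: 1.0 / (th * (1 - th))):
    # L(delta, theta) = w(theta) * |delta - eta(theta)|^k ;
    # this w is an arbitrary illustrative choice with w(theta) > 0 on (0, 1).
    return w(theta) * abs(d - theta) ** k

# Both defining properties: L >= 0 everywhere, and L = 0 when delta = eta(theta).
print(squared_loss(0.7, 0.7), absolute_loss(0.2, 0.5), weighted_loss(0.6, 0.5))
```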

A smaller loss means a better estimator. But the losses $L(\delta(\utilde{X}),\theta)$ themselves cannot be compared, because the observed data differ from sample to sample; instead we compute the expected loss, which describes the estimator's average performance.

Risk Function

Definition

The risk function of $\delta$ is defined as

$$R(\delta, \theta)\triangleq E_\theta[L(\delta(\utilde{X}), \theta)]$$

A good estimator therefore has a small risk function. But, as with the $MSE$, there is in general no estimator that minimizes $R(\delta, \theta)$ uniformly in $\theta$.

When comparing risk functions, which one is better may change with $\theta$, so we want to summarize $R(\delta, \theta)$ by a single number; the estimator with the smallest number is then the best one.

Q: How can we summarize $R(\delta, \theta)$?

  1. Compare the maximum values of all the risk functions, and take as $\delta^*$ the estimator whose maximum is smallest,

    i.e. $\sup_{\theta\in\Omega}R(\delta^*, \theta)=\inf_\delta[\sup_{\theta\in\Omega}R(\delta, \theta)]$

    We call this estimator the minimax estimator (minimax est.).

  2. In some settings $\theta$ is much more likely to fall in certain regions and rarely occurs elsewhere. There we want an estimator that performs especially well on those regions, so we weight the risk over them. This is the core idea of Bayesian estimation.
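The two ways of summarizing a risk function can be seen numerically. A sketch, assuming a Binomial model and two hypothetical estimators of $\theta$ (the sample mean and a shrinkage estimator $(T+1)/(n+2)$; neither choice is prescribed by the notes):

```python
from math import comb

def risk_sq(delta, n, theta):
    # Exact risk E_theta[(delta(T) - theta)^2] for T ~ Binomial(n, theta).
    return sum(comb(n, t) * theta**t * (1 - theta)**(n - t) * (delta(t) - theta)**2
               for t in range(n + 1))

n = 10
d_mean = lambda t: t / n                 # the sample mean X-bar = T/n
d_shrink = lambda t: (t + 1) / (n + 2)   # a shrinkage estimator (illustrative choice)

grid = [i / 200 for i in range(1, 200)]  # theta values in (0, 1)
R1 = [risk_sq(d_mean, n, th) for th in grid]
R2 = [risk_sq(d_shrink, n, th) for th in grid]

# Summary 1: worst-case risk (the minimax criterion)
print(max(R1), max(R2))
# Summary 2: average risk under a uniform weight on (0, 1) (a Bayes risk)
print(sum(R1) / len(R1), sum(R2) / len(R2))
```

For this particular pair, the shrinkage estimator wins under both summaries; in general the two criteria can rank estimators differently, which is exactly why the choice of summary matters.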

Bayes estimator

Prior Distribution

Definition

Bayes estimator:

$\pi(\theta)$: a prior distribution on $\Omega$, given as a pdf.

$\delta_\pi(\utilde{X})$ is called the Bayes estimator of $\eta(\theta)$ iff $r_\pi(\delta_\pi)\le r_\pi(\delta), \forall \delta$, where

$$r_\pi(\delta)\triangleq \int_\Omega R(\delta, \theta)\pi(\theta)d\theta$$

is called the Bayes risk of $\delta$.

Definition

$\delta_0(\utilde{X})$ is admissible $\iff \nexists\,\delta$ s.t. $\delta$ dominates $\delta_0$, i.e. there is no $\delta$ with

$$\begin{align*} &R(\delta, \theta) \le R(\delta_0, \theta) \quad \forall \theta \\ &R(\delta, \theta) < R(\delta_0, \theta) \quad \text{for some } \theta \end{align*}$$

($\delta$ is nowhere worse than $\delta_0$, and strictly better for some $\theta$.)
Theorem

Any unique Bayes estimator is admissible.

Proof: Let $\delta_\pi$ be the unique Bayes estimator with respect to the prior $\pi$.

Suppose that $\delta_\pi$ is not admissible, i.e. $\exists\delta$ s.t.

$$\begin{align*} &R(\delta, \theta) \le R(\delta_\pi, \theta) \quad \forall \theta \\ &R(\delta, \theta) < R(\delta_\pi, \theta) \quad \text{for some } \theta \end{align*}$$

$\because \delta_\pi$ is a Bayes estimator $\implies \int_\Omega R(\delta_\pi, \theta)\pi(\theta)d\theta \le \int_\Omega R(\delta, \theta)\pi(\theta)d\theta \iff r_\pi(\delta_\pi)\le r_\pi(\delta)$

$\because \delta$ dominates $\delta_\pi \implies r_\pi(\delta)\le r_\pi(\delta_\pi)$; together with the previous inequality this gives $r_\pi(\delta)=r_\pi(\delta_\pi)$, i.e. $\delta$ is also a Bayes estimator and $\delta \neq \delta_\pi$.

But $\delta_\pi$ is unique, a contradiction.


EX: Let $\pi(\theta)=P(\theta=\theta_c)=1$, with $\theta_c\in\Omega$ given, i.e. we believe $\theta$ is certain to equal $\theta_c$.

    $\implies$ Bayes risk $r_\pi(\delta)\triangleq\int_\Omega R(\delta, \theta)\pi(\theta)d\theta=R(\delta, \theta_c)$

    $\implies$ the Bayes estimator w.r.t. $\pi$ minimizes $R(\delta, \theta_c)\iff \min E_{\theta_c}[L(\delta(\utilde{X}), \theta_c)]$

    $\implies \delta_\pi(\utilde{X})=\eta(\theta_c)$, which is admissible. Since we believe $\theta$ is certain to equal $\theta_c$, we simply use $\eta(\theta_c)$ as the estimate of $\eta(\theta)$.


Now let $L(\delta, \theta)=w(\theta)(\delta(\utilde{X})-\eta(\theta))^2$ and suppose a prior distribution (pdf) $\pi(\theta)$ is given.

Q: How do we compute the Bayes estimator of $\eta(\theta)$ with respect to $\pi(\theta)$?

i.e. how do we minimize $r_\pi(\delta)$?

$$\begin{align*} r_\pi(\delta) &= \int_\Omega R(\delta, \theta)\pi(\theta) d\theta\\ &=\int_\Omega E_\theta[L(\delta(\utilde{X}), \theta)]\pi(\theta)d\theta\\ &=\int_\Omega\int_{\R^n} L(\delta(\utilde{x}), \theta)f(\utilde{x};\theta)d\utilde{x}\cdot\pi(\theta)d\theta\\ &\xlongequal{\text{Fubini}}\int_{\R^n}\int_\Omega w(\theta)[\delta(\utilde{x})-\eta(\theta)]^2f(\utilde{x};\theta)\pi(\theta)d\theta\, d\utilde{x} \end{align*}$$

To minimize $r_\pi(\delta)$ it suffices to minimize the inner integral separately for each fixed $\utilde{x}$, so we differentiate with respect to $\delta(\utilde{x})$ and set the derivative to 0:

$$\frac{\partial}{\partial\delta(\utilde{x})} \int_\Omega w(\theta)[\delta(\utilde{x})-\eta(\theta)]^2f(\utilde{x};\theta)\pi(\theta)d\theta = 0$$

$$2\int_\Omega w(\theta)[\delta(\utilde{x})-\eta(\theta)]f(\utilde{x};\theta)\pi(\theta)d\theta = 0$$

$$\delta(\utilde{x})\int_\Omega w(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta = \int_\Omega w(\theta)\eta(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta$$

$$\delta(\utilde{x}) = \frac{\int_\Omega w(\theta)\eta(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta}{\int_\Omega w(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta}$$

Therefore the Bayes estimator of $\eta(\theta)$ is:

$$\delta_\pi(\utilde{X}) = \frac{\int_\Omega w(\theta)\eta(\theta)f(\utilde{X};\theta)\pi(\theta)d\theta}{\int_\Omega w(\theta)f(\utilde{X};\theta)\pi(\theta)d\theta}$$
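As a numerical sanity check of this formula (a sketch, assuming a Bernoulli$(\theta)$ model with a $Beta(2,2)$ prior, $w(\theta)=1$, $\eta(\theta)=\theta$, and made-up data), a plain Riemann sum over $\theta$ reproduces the known posterior-mean closed form $(t+\alpha)/(n+\alpha+\beta)$:

```python
# Hypothetical data: n Bernoulli trials with observed sum t; Beta(2, 2) prior.
n, t = 12, 5
alpha, beta = 2.0, 2.0

def prior(th):
    # Beta(2, 2) density: 6 * theta * (1 - theta)
    return 6.0 * th * (1.0 - th)

def lik(th):
    # f(x; theta) for iid Bernoulli data depends on x only through t
    return th**t * (1.0 - th)**(n - t)

# Riemann sums for the numerator and denominator integrals in the formula
m = 200_000
num = den = 0.0
for i in range(1, m):
    th = i / m
    num += th * lik(th) * prior(th) / m    # eta(theta) = theta in the numerator
    den += lik(th) * prior(th) / m

delta_numeric = num / den
delta_closed = (t + alpha) / (n + alpha + beta)   # Beta-Binomial posterior mean
print(delta_numeric, delta_closed)
```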

Posterior Distribution

Earlier we wrote $\utilde{X}\sim f(\utilde{x};\theta)$, which is really the pdf of $\utilde{X}$ given $\theta$. Since we now treat $\theta$ itself as a random variable, the precise notation is $\utilde{X}|_\theta\sim f(\utilde{x};\theta)$.

If $\pi(\theta)$ is a prior distribution of $\theta$ on $\Omega$, i.e. a subjective guess about $\theta$ made before seeing the data, then $f(\utilde{x};\theta)\pi(\theta)$ is the joint distribution of $\utilde{X}$ and $\theta$.

The marginal density of $\utilde{X}$ is:

$$m_\pi(\utilde{x})\triangleq\int_\Omega f(\utilde{x};\theta)\pi(\theta)d\theta$$

From this we can compute the conditional distribution of $\theta$ given the data $\utilde{X}=\utilde{x}$:

$$\pi(\theta|\utilde{x}) = \frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})}$$

This is our improved belief about $\theta$ after seeing the data: the posterior distribution.

Definition

$X_1,\cdots, X_n \stackrel{\text{iid}}{\sim} f(x;\theta)$, and $\pi(\theta)$ is the prior distribution of $\theta$.

The posterior distribution of $\theta$ given $\utilde{X}=\utilde{x}$ is

$$\pi(\theta|\utilde{x}) = \frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})}$$

where $m_\pi(\utilde{x})\triangleq\int_\Omega f(\utilde{x};\theta)\pi(\theta)d\theta$ is the marginal distribution of $\utilde{x}$.


Note: Suppose that $T=T(\utilde{X})$ is a sufficient statistic for $\theta$. By the factorization theorem, $f(\utilde{x};\theta)=g(t;\theta)h(\utilde{x})$, and by moving constants between the two factors we may take $g(t;\theta)$ to be a pdf. Then:

$$\pi(\theta|\utilde{x}) = \frac{f(\utilde{x};\theta)\pi(\theta)}{\int_\Omega f(\utilde{x};\theta)\pi(\theta)d\theta}=\frac{h(\utilde{x})g(t;\theta)\pi(\theta)}{h(\utilde{x})\int_\Omega g(t;\theta)\pi(\theta)d\theta}=\frac{g(t;\theta)\pi(\theta)}{\int_\Omega g(t;\theta)\pi(\theta)d\theta}=\pi(\theta|t)$$

Under the loss function $L(\delta, \theta)=w(\theta)(\delta(\utilde{X})-\eta(\theta))^2$, once the data $\utilde{X}=\utilde{x}$ are observed we obtain:

$$\begin{align*} \delta_\pi(\utilde{x})&=\frac{\int_\Omega w(\theta)\eta(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta}{\int_\Omega w(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta}\\ &=\frac{\int_\Omega w(\theta)\eta(\theta)\frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})}d\theta}{\int_\Omega w(\theta)\frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})}d\theta}\\ &=\frac{\int_\Omega w(\theta)\eta(\theta)\pi(\theta|\utilde{x})d\theta}{\int_\Omega w(\theta)\pi(\theta|\utilde{x})d\theta}=\frac{E[w(\theta)\eta(\theta)|\utilde{x}]}{E[w(\theta)|\utilde{x}]}\\ &=\frac{E[w(\theta)\eta(\theta)|t]}{E[w(\theta)|t]} \qquad \text{where } t=T(\utilde{x}) \text{ is sufficient for } \theta \end{align*}$$

That is:

$$\delta_\pi(\utilde{X})=\frac{E[w(\theta)\eta(\theta)|\utilde{X}]}{E[w(\theta)|\utilde{X}]}=\frac{E[w(\theta)\eta(\theta)|T]}{E[w(\theta)|T]}\xlongequal{w(\theta)=1}E[\eta(\theta)|T]$$

Hence, when MSE is used as the risk function, the Bayes estimator cannot be improved any further via the Rao-Blackwell Theorem, because it is already a function of a sufficient statistic.

Theorem
  1. Under the loss function $L(\delta, \theta)=w(\theta)(\delta(\utilde{X})-\eta(\theta))^2$, the unique Bayes estimator with respect to the prior $\pi$ is $\delta_\pi(\utilde{X})=\frac{E[w(\theta)\eta(\theta)|\utilde{X}]}{E[w(\theta)|\utilde{X}]}=\frac{E[w(\theta)\eta(\theta)|T]}{E[w(\theta)|T]}$
  2. Under the loss function $L(\delta, \theta)=(\delta(\utilde{X})-\eta(\theta))^2$, the unique Bayes estimator with respect to the prior $\pi$ is $\delta_\pi(\utilde{X})=E[\eta(\theta)|\utilde{X}]=E[\eta(\theta)|T]$

Minimax Estimation

Bayes estimators can help us find minimax estimators.

Theorem

Under a given loss function $L(\delta, \theta)$, if $\delta_\pi(\utilde{X})$ satisfies

  1. $\delta_\pi(\utilde{X})$ is the Bayes estimator of $\eta(\theta)$ with respect to some prior distribution $\pi(\theta)$

  2. $R(\delta_\pi, \theta)$ is constant (does not depend on $\theta$)

then $\delta_\pi(\utilde{X})$ is a minimax estimator of $\eta(\theta)$.

If, in addition, $\delta_\pi(\utilde{X})$ is the unique Bayes estimator, then $\delta_\pi(\utilde{X})$ is also the unique minimax estimator of $\eta(\theta)$.

Proof: Let $\delta_\pi$ be a Bayes estimator of $\eta(\theta)$ such that $R(\delta_\pi, \theta)$ is constant.

$$\begin{align*} r_\pi(\delta_\pi) &\triangleq \int_\Omega R(\delta_\pi, \theta)\pi(\theta)d\theta\\ &= R(\delta_\pi, \theta)\int_\Omega\pi(\theta)d\theta \qquad \because R(\delta_\pi, \theta) \text{ is constant}\\ &= \sup_{\theta\in\Omega}R(\delta_\pi, \theta) \qquad \because \pi(\theta) \text{ is a pdf and the risk is constant} \end{align*}$$

For every $\delta \neq \delta_\pi$:

$$\begin{align*} r_\pi(\delta) &\triangleq \int_\Omega R(\delta, \theta)\pi(\theta)d\theta\\ &\le \int_\Omega \sup_{\theta\in\Omega}R(\delta, \theta)\,\pi(\theta)d\theta\\ &= \sup_{\theta\in\Omega}R(\delta, \theta)\int_\Omega\pi(\theta)d\theta\\ &= \sup_{\theta\in\Omega}R(\delta, \theta) \qquad \because \pi(\theta) \text{ is a pdf} \end{align*}$$

Since $\delta_\pi$ is a Bayes estimator:

$$\sup_{\theta\in\Omega}R(\delta_\pi, \theta) = r_\pi(\delta_\pi) \stackrel{(<) \text{ if unique}}{\le} r_\pi(\delta) \le \sup_{\theta\in\Omega}R(\delta, \theta)$$

Since $\delta$ is arbitrary, $\sup_{\theta\in\Omega}R(\delta_\pi, \theta) \le \inf_\delta\sup_{\theta\in\Omega}R(\delta, \theta) \implies \delta_\pi(\utilde{X})$ is a minimax estimator of $\eta(\theta)$.

Remark

Since the minimax criterion does not involve any prior distribution on $\theta$, to obtain a minimax estimator of $\eta$ from a Bayes estimator we only need to find one prior under which the risk function is constant.

Remark

If $f$ and $g$ are both pdfs and $f\propto g$, then $f=g$.

EX: $X_1, \cdots, X_n \stackrel{\text{iid}}{\sim} B(1,\theta)$ with $\theta\sim Beta(\alpha, \beta)$

i.e. $\theta\in\Omega=(0,1)$ and

$$\begin{align*} \pi(\theta)&=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}\\ &\propto \theta^{\alpha-1}(1-\theta)^{\beta-1} \end{align*}$$

e.g. $\alpha=1=\beta \implies \pi(\theta)=1, \theta\in(0,1)$, i.e. $\theta\sim U(0,1)$: the user expresses no prior opinion.

$$\begin{align*} \implies \pi(\theta|\utilde{x}) &= \frac{f(\utilde{x};\theta)\pi(\theta)}{\int_\Omega f(\utilde{x};\theta)\pi(\theta)d\theta}\\ &\propto f(\utilde{x};\theta)\pi(\theta) \quad \because \theta \text{ is integrated out of the denominator and } \utilde{x} \text{ is fixed} \\ &\propto \theta^{t+\alpha-1}(1-\theta)^{n-t+\beta-1} \qquad \text{with } t=\sum x_i \\ &\propto \text{pdf of } Beta(t+\alpha, n-t+\beta) \end{align*}$$

$$\implies \theta|_{\utilde{x}} \sim Beta(t+\alpha, n-t+\beta)$$
  1. Under the loss function $L(\delta, \theta)=[\delta(\utilde{X})-\theta]^2$, we get $\delta_\pi(\utilde{X})=E[\theta|\utilde{X}]=\frac{T+\alpha}{n+\alpha+\beta}$

    $\delta_\pi(\utilde{X})=\frac{T+\alpha}{n+\alpha+\beta} = \frac{\alpha+\beta}{n+\alpha+\beta}\cdot\frac{\alpha}{\alpha+\beta}+\frac{n}{n+\alpha+\beta}\cdot\frac{T}{n}=\lambda\frac{\alpha}{\alpha+\beta}+(1-\lambda)\bar{X}, \quad \lambda=\frac{\alpha+\beta}{n+\alpha+\beta}$

    a weighted average of the prior mean and the sample mean. To find a minimax estimator, we look for a pair $(\alpha, \beta)$ that makes the risk function constant.

    $$\begin{align*} R(\delta_\pi, \theta) &= E_\theta[(\delta_\pi(\utilde{X})-\theta)^2]\\ &= Var_\theta(\delta_\pi(\utilde{X}))+Bias^2(\delta_\pi(\utilde{X})) \qquad \text{(decomposition of the MSE)}\\ &= Var_\theta\left(\frac{T+\alpha}{n+\alpha+\beta}\right)+\left[E_\theta\left(\frac{T+\alpha}{n+\alpha+\beta}\right)-\theta\right]^2\\ &= \frac{n\theta(1-\theta)}{(n+\alpha+\beta)^2}+\left[\frac{n\theta+\alpha}{n+\alpha+\beta}-\theta\right]^2 \qquad \text{where } T=\sum_{i=1}^n X_i\sim B(n, \theta)\\ &= \frac{1}{(n+\alpha+\beta)^2}\left[((\alpha+\beta)^2-n)\theta^2+(n-2\alpha(\alpha+\beta))\theta+\alpha^2\right] \end{align*}$$

    To make the coefficients of $\theta^2$ and $\theta$ vanish:

    $$\left\{ \begin{align*} (\alpha+\beta)^2-n &=0\\ n-2\alpha(\alpha+\beta) &=0 \end{align*} \right.$$

        $\implies \alpha=\beta=\frac{\sqrt n}{2}$, and $\delta_\pi(\utilde{X})=\frac{\frac{\sqrt n}{2}+T}{n+\sqrt{n}}$ is the unique minimax estimator.

    This differs from the conventional choice of estimating $\theta$ by $\bar{X}$.

  2. Now take the loss function $L(\delta, \theta)=\frac{[\delta(\utilde{X})-\theta]^2}{\theta(1-\theta)}$, i.e. $w(\theta)=\frac{1}{\theta(1-\theta)}=\frac{1}{Var_\theta(X_1)}$. This makes the loss unit-free: the squared error is measured relative to the variance.

    $\delta_\pi(\utilde{X})=\frac{E[\frac{1}{1-\theta}|\utilde{X}]}{E[\frac{1}{\theta(1-\theta)}|\utilde{X}]}$

    Since we already know $\theta|_{\utilde{x}} \sim Beta(\alpha+t, n-t+\beta)$, both expectations can be computed:

        $\implies \delta_\pi(\utilde{X})=\frac{\alpha+T-1}{n+\alpha+\beta-2}\xlongequal[U(0,1)]{\alpha=\beta=1} \frac{T}{n}= \bar{X}$

    That is, when the prior of $\theta$ is $U(0,1)$, $\bar{X}$ is the unique Bayes estimator of $\theta$.

    Moreover its risk function

    $R(\bar{X}, \theta)=E_\theta\left[\frac{(\bar{X}-\theta)^2}{\theta(1-\theta)}\right]=\frac{Var_\theta(\bar{X})}{\theta(1-\theta)}=\frac{1}{n}$ is constant,

    so $\bar{X}$ is a minimax estimator of $\theta$ under this loss.
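Both constant-risk claims above can be checked with exact Binomial sums. A sketch, with an assumed sample size $n=16$ (chosen so that $\sqrt n = 4$ is exact in floating point):

```python
from math import comb, sqrt

# Check the two constant-risk claims with exact Binomial sums:
#  (1) delta(T) = (sqrt(n)/2 + T)/(n + sqrt(n)) has constant squared-error risk;
#  (2) X-bar = T/n has constant risk 1/n under L = (delta - theta)^2 / (theta(1-theta)).
n = 16

def pmf(t, theta):
    return comb(n, t) * theta**t * (1 - theta)**(n - t)

def mse_minimax(theta):
    return sum(pmf(t, theta) * ((sqrt(n) / 2 + t) / (n + sqrt(n)) - theta)**2
               for t in range(n + 1))

def weighted_risk_xbar(theta):
    return sum(pmf(t, theta) * (t / n - theta)**2
               for t in range(n + 1)) / (theta * (1 - theta))

print([mse_minimax(th) for th in (0.1, 0.5, 0.9)])         # constant: 1/(4(sqrt(n)+1)^2) = 0.01
print([weighted_risk_xbar(th) for th in (0.1, 0.5, 0.9)])  # constant: 1/n = 0.0625
```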

Remark

If an estimator $\delta$ is a Bayes estimator under some prior and its risk function is constant, then $\delta$ is a minimax estimator.

But if $\delta$ cannot be a Bayes estimator under any prior, this method cannot produce a minimax estimator.

In that case we find the minimax estimator by approximating it with Bayes estimators.

Theorem

Under a given loss function, suppose $\delta_0$ satisfies $\sup_\theta R(\delta_0,\theta)=r$.

Suppose $(\pi_m)$ is a sequence of prior distributions whose corresponding Bayes estimators $(\delta_{\pi_m})$ have Bayes risks $(r_{\pi_m}(\delta_{\pi_m}))$ with $\lim_{m\to\infty}r_{\pi_m}(\delta_{\pi_m})=r$.

Then $\delta_0$ is a minimax estimator of $\eta(\theta)$.

Proof: For every $\delta$ we have

$$\begin{align*} \sup_\theta R(\delta, \theta) \ge \int_\Omega R(\delta, \theta)\pi_m(\theta)d\theta &= r_{\pi_m}(\delta) \\ (\because \delta_{\pi_m} \text{ is the Bayes estimator w.r.t. } \pi_m) \quad &\ge r_{\pi_m}(\delta_{\pi_m}) = r_m \xrightarrow[m\to\infty]{} r = \sup_\theta R(\delta_0, \theta) \end{align*}$$

i.e. $\forall \delta, \sup_\theta R(\delta, \theta) \ge \sup_\theta R(\delta_0, \theta)$, so $\delta_0$ is a minimax estimator.


EX: $X_1, \cdots, X_n \overset{\text{iid}}{\sim} P(\theta)$ (Poisson) with loss function $L(\delta, \theta)=\frac{1}{\theta}(\delta(\utilde{X})-\theta)^2$. Show that $\delta(\utilde{X})=\bar{X}$ is a minimax estimator of $\theta$.

Consider the prior $\theta\sim Exp(\tau)$ with rate $\tau$, i.e. $\pi(\theta)=\tau e^{-\tau\theta}$, so $E[\theta]=\frac{1}{\tau}$.

$$\begin{align*} \implies & \pi(\theta|\utilde{x}) \propto f(\utilde{x};\theta)\pi(\theta) \propto \theta^t e^{-(n+\tau)\theta} \quad \text{where } t=\sum x_i\\ \implies & \theta|_{\utilde{x}} \sim Gamma(t+1, \tfrac{1}{n+\tau})\\ \implies & \delta_\pi(\utilde{X}) = \frac{1}{E[\frac{1}{\theta}|\utilde{X}]} = \frac{T}{n+\tau} \xrightarrow[\tau\to 0^+]{} \frac{T}{n} = \bar{X} \end{align*}$$

$$\begin{align*} R(\delta_\pi, \theta) &=E_\theta\left[\frac{(\delta_\pi(\utilde{X})-\theta)^2}{\theta}\right]\\ &=\frac{1}{\theta}E_\theta\left[\left(\frac{T}{n+\tau}-\theta\right)^2\right] \quad \text{where } T=\sum X_i\sim P(n\theta)\\ &=\frac{1}{\theta}\left[Var_\theta\left(\frac{T}{n+\tau}\right) + \left(E_\theta\left(\frac{T}{n+\tau}\right)-\theta \right)^2 \right]\\ &=\frac{1}{\theta}\cdot\frac{n\theta}{(n+\tau)^2} + \frac{1}{\theta}\left(\frac{n\theta}{n+\tau}-\theta\right)^2\\ &=\frac{n}{(n+\tau)^2} + \frac{\tau^2\theta}{(n+\tau)^2} \end{align*}$$

$$\begin{align*} \implies r_\pi(\delta_\pi) &= \int_\Omega R(\delta_\pi, \theta)\pi(\theta)d\theta\\ &=\int_0^\infty \left[\frac{n}{(n+\tau)^2} + \frac{\tau^2\theta}{(n+\tau)^2}\right]\pi(\theta)d\theta\\ &=\frac{n}{(n+\tau)^2} + \frac{\tau^2}{(n+\tau)^2}E[\theta] \qquad \text{where } \theta\sim Exp(\tau) \overset{d}{=} Gamma(1, \tfrac{1}{\tau})\\ &=\frac{n}{(n+\tau)^2} + \frac{\tau}{(n+\tau)^2} = \frac{1}{n+\tau} \xrightarrow[\tau\to 0^+]{} \frac{1}{n} = r \end{align*}$$

Also $R(\bar{X}, \theta)=\frac{1}{\theta}Var_\theta(\bar{X})=\frac{1}{\theta}\cdot\frac{\theta}{n}=\frac{1}{n}$ for all $\theta$, so $\sup_\theta R(\bar{X}, \theta) = \frac{1}{n} = r$, and by the theorem $\bar{X}$ is a minimax estimator of $\theta$.
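The risk formula derived in this example can be checked numerically. A sketch with assumed values $n=8$, $\tau=3$, truncating the Poisson sum (safe for these parameters):

```python
from math import exp

# Check numerically that for T ~ Poisson(n*theta) and delta(T) = T/(n + tau),
# the weighted risk E_theta[(delta - theta)^2 / theta] equals
# n/(n+tau)^2 + tau^2 * theta / (n+tau)^2, as derived above.
n, tau = 8, 3.0

def risk(theta, tmax=200):
    lam = n * theta
    p = exp(-lam)               # P(T = 0)
    total = 0.0
    for t in range(tmax):
        total += p * (t / (n + tau) - theta) ** 2
        p *= lam / (t + 1)      # Poisson recursion: P(T = t+1) = P(T = t) * lam/(t+1)
    return total / theta

for theta in (0.5, 1.0, 2.0):
    closed = n / (n + tau) ** 2 + tau ** 2 * theta / (n + tau) ** 2
    print(theta, risk(theta), closed)
```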


EX: $X_1, \cdots, X_n\overset{\text{iid}}{\sim} N(\theta, \sigma^2_0)$

    $\implies T=\bar{X}$ is sufficient for $\theta$, and $T\sim N(\theta, \frac{\sigma^2_0}{n}) = N(\theta, \sigma^2)$ where $\sigma^2=\frac{\sigma^2_0}{n}$

Remark

Under the loss function $L(\delta, \theta)=(\delta(\utilde{X})-\theta)^2$, no unbiased estimator can be a Bayes estimator of $\theta$ (unless its Bayes risk is 0), so $\bar{X}$ cannot be shown minimax directly; we approximate it with a sequence of priors instead.

Let the prior $\pi$ be such that $\theta\sim N(\mu, \tau^2)$

$$\implies \pi(\theta|\utilde{x}) \propto e^{-\frac{(t-\theta)^2}{2\sigma^2}}\,e^{-\frac{(\theta-\mu)^2}{2\tau^2}}$$