Bayesian Estimation & Minimax Estimation
Loss Functions
As mentioned before, our goal is to find a good point estimator of $\eta(\theta)$, and we measured the quality of a point estimator by $MSE(\delta, \theta)\triangleq E_\theta[(\delta(\utilde{X})-\eta(\theta))^2]$. But different scenarios and needs may call for different loss functions, so we need a more general definition.
Define a loss function $L(\delta,\theta)$ for $\eta(\theta)$ with
$L(\delta(\utilde{x}),\theta)\ge 0, \ \forall \utilde{x}, \theta$
$L(\eta(\theta), \theta)=0, \ \forall\theta$
E.g.
$L(\delta, \theta)=(\delta-\theta)^2$ (squared-error loss)
$L(\delta, \theta)=|\delta-\theta|$ (absolute-error loss)
$L(\delta, \theta)=w(\theta)|\delta-\eta(\theta)|^k$ for any $k>0$ and $w(\theta)>0$, where $w(\theta)$ is a weight function
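As a concrete illustration, these three losses can be written as plain Python callables; the function names and defaults below are our own, not part of the notes.

```python
# A minimal sketch of the three example losses above; the names, defaults,
# and the identity choice of eta are illustrative assumptions.
def squared_error(d, theta):
    return (d - theta) ** 2

def absolute_error(d, theta):
    return abs(d - theta)

def weighted_power_loss(d, theta, eta=lambda t: t, w=lambda t: 1.0, k=2):
    # w(theta) * |d - eta(theta)|^k with w(theta) > 0 and k > 0
    return w(theta) * abs(d - eta(theta)) ** k

print(squared_error(0.4, 0.5))        # ~0.01
print(absolute_error(0.4, 0.5))       # ~0.1
print(weighted_power_loss(0.4, 0.5))  # reduces to squared error by default
```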
The smaller the loss an estimation method incurs, the better the method. But values of $L(\delta(\utilde{X}),\theta)$ cannot be compared directly, because the data we obtain differ every time; we therefore take the expected value of the loss function to measure how the method performs on average.
Risk Functions
The risk function of $\delta$ is defined as
$$R(\delta, \theta)\triangleq E_\theta[L(\delta(\utilde{X}), \theta)]$$
A good estimation method therefore yields a small risk function. But just as with the $MSE$, there is no estimation method that minimizes $R(\delta, \theta)$ for every $\theta$.
When comparing risk functions, which one is better may depend on $\theta$, so we want to summarize $R(\delta, \theta)$ by a single number; the method with the smallest number is then the best one.
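A quick Monte Carlo sketch makes this concrete. Assuming Bernoulli data and squared-error loss (our choice of setup, not the notes'), the risk curves of two reasonable estimators cross, so neither is uniformly better:

```python
import random

def risk(estimator, theta, n=20, reps=20000, seed=0):
    # Monte Carlo approximation of R(delta, theta) = E_theta[(delta(X) - theta)^2]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [1 if rng.random() < theta else 0 for _ in range(n)]
        total += (estimator(x) - theta) ** 2
    return total / reps

sample_mean = lambda x: sum(x) / len(x)
shrunk_mean = lambda x: (sum(x) + 1) / (len(x) + 2)  # pulls the mean toward 1/2

for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(theta, risk(sample_mean, theta), risk(shrunk_mean, theta))
# Near theta = 0.5 the shrunk estimator has the smaller risk; near 0 or 1 the
# sample mean wins. Neither risk curve lies below the other everywhere.
```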
Q: How do we summarize $R(\delta, \theta)$?
We compare the maxima of the risk functions, and take as $\delta^*$ the one whose maximum is the smallest,
i.e. $\sup_{\theta\in\Omega}R(\delta^*, \theta)=\inf_\delta[\sup_{\theta\in\Omega}R(\delta, \theta)]$
We call this estimation method the minimax est.
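Continuing the Bernoulli/squared-error sketch (the same illustrative assumptions as above, with the helper repeated so this block runs on its own), the minimax criterion picks the candidate with the smallest sup-risk over a grid of $\theta$ values:

```python
import math
import random

def risk(estimator, theta, n=20, reps=5000, seed=0):
    # Monte Carlo approximation of R(delta, theta) under squared-error loss;
    # estimators are written as functions of (t, n), t = number of successes.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        t = sum(1 for _ in range(n) if rng.random() < theta)
        total += (estimator(t, n) - theta) ** 2
    return total / reps

candidates = {
    "sample mean":   lambda t, n: t / n,
    "shrink to 1/2": lambda t, n: (t + 1) / (n + 2),
    "constant-risk": lambda t, n: (t + math.sqrt(n) / 2) / (n + math.sqrt(n)),
}

thetas = [i / 20 for i in range(1, 20)]
for name, est in candidates.items():
    print(name, max(risk(est, th) for th in thetas))  # sup-risk over the grid
# The "constant-risk" candidate has the smallest worst-case risk; it is the
# Bayes estimator under the Beta(sqrt(n)/2, sqrt(n)/2) prior that the closing
# example of this section builds toward.
```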
In some scenarios, $\theta$ is much more likely to fall in certain regions and rarely occurs elsewhere. There we want an estimation method that performs better on those regions, so we weight the risk over those regions. This is the core idea of Bayesian Estimation.
Bayes estimator
Prior Distribution
Bayes estimator:
$\pi(\theta)$: a prior distribution on $\Omega$, given as a pdf.
$\delta_\pi(\utilde{X})$ is called the Bayes estimator of $\eta(\theta)$ iff $r_\pi(\delta_\pi)\le r_\pi(\delta), \ \forall \delta$, where
$r_\pi(\delta)\triangleq \int_\Omega R(\delta, \theta)\pi(\theta)d\theta$ is called the Bayes risk of $\delta$.
$\delta_0(\utilde{X})$ is admissible
$\iff \nexists\, \delta$ s.t. $\delta$ dominates $\delta_0$, i.e.
$\delta$ is no worse than $\delta_0$ everywhere, and strictly better for some $\theta$:
$$\begin{align*}
&R(\delta, \theta) \le R(\delta_0, \theta) \quad \forall \theta \\
&R(\delta, \theta) < R(\delta_0, \theta) \quad \text{for some } \theta
\end{align*}$$
Any unique Bayes est is adm
Proof: let $\delta_\pi$ be the unique Bayes est with respect to the prior $\pi$.
Suppose that $\delta_\pi$ is not adm, i.e. $\exists\, \delta$ s.t.
$$\begin{align*}
&R(\delta, \theta) \le R(\delta_\pi, \theta) \quad \forall \theta \\
&R(\delta, \theta) < R(\delta_\pi, \theta) \quad \text{for some } \theta
\end{align*}$$
$\because \delta_\pi$ is the Bayes est $\implies \int_\Omega R(\delta_\pi, \theta)\pi(\theta)d\theta \le \int_\Omega R(\delta, \theta)\pi(\theta)d\theta \iff r_\pi(\delta_\pi)\le r_\pi(\delta)$
$\because \delta$ dominates $\delta_\pi \implies \int_\Omega R(\delta, \theta)\pi(\theta)d\theta \le \int_\Omega R(\delta_\pi, \theta)\pi(\theta)d\theta$, so $r_\pi(\delta)=r_\pi(\delta_\pi)$, i.e. $\delta$ is also a Bayes est and $\delta \neq \delta_\pi$
But $\delta_\pi$ is unique, a contradiction.
EX: Let $\pi(\theta)=P(\theta=\theta_c)=1$, with $\theta_c\in\Omega$ given, i.e. we believe $\theta$ is certain to be $\theta_c$.
$\implies$ Bayes risk $r_\pi(\delta)\triangleq\int_\Omega R(\delta, \theta)\pi(\theta)d\theta=R(\delta, \theta_c)$
$\implies$ the Bayes est w.r.t. $\pi$ is the one minimizing $R(\delta, \theta_c)\iff \min E_{\theta_c}[L(\delta(\utilde{X}), \theta_c)]$
$\implies \delta_\pi(\utilde{X})=\eta(\theta_c)$ is admissible. Since we believe $\theta$ is certain to be $\theta_c$, we simply estimate $\eta(\theta)$ by $\eta(\theta_c)$.
Now let $L(\delta, \theta)=w(\theta)(\delta(\utilde{X})-\eta(\theta))^2$ and suppose a prior dist (pdf) $\pi(\theta)$ is given.
Q: How do we compute the Bayes est of $\eta(\theta)$ with respect to $\pi(\theta)$?
i.e. how do we minimize $r_\pi(\delta)$?
$$\begin{align*}
\implies r_\pi(\delta) &= \int_\Omega R(\delta, \theta)\pi(\theta) d\theta\\
&=\int_\Omega E_\theta[L(\delta(\utilde{x}), \theta)]\pi(\theta)d\theta\\
&=\int_\Omega\int_{\R^n} L(\delta(\utilde{x}), \theta)f(\utilde{x};\theta)d\utilde{x}\cdot\pi(\theta)d\theta\\
&\xlongequal{\text{Fubini}}\int_{\R^n}\int_\Omega w(\theta)[\delta(\utilde{x})-\eta(\theta)]^2f(\utilde{x};\theta)\pi(\theta)d\theta\, d\utilde{x}
\end{align*}$$
To minimize $r_\pi(\delta)$ it suffices to minimize the inner integral for each fixed $\utilde{x}$; treating $\delta(\utilde{x})$ as a free variable, we set the derivative with respect to $\delta(\utilde{x})$ to zero.
$$\frac{d}{d\delta} \int_\Omega w(\theta)[\delta(\utilde{x})-\eta(\theta)]^2f(\utilde{x};\theta)\pi(\theta)d\theta = 0$$
$$2\int_\Omega w(\theta)[\delta(\utilde{x})-\eta(\theta)]f(\utilde{x};\theta)\pi(\theta)d\theta = 0$$
$$\int_\Omega w(\theta)\delta(\utilde{x})f(\utilde{x};\theta)\pi(\theta)d\theta = \int_\Omega w(\theta)\eta(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta$$
$$\delta(\utilde{x}) = \frac{\int_\Omega w(\theta)\eta(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta}{\int_\Omega w(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta}$$
Since the second derivative $2\int_\Omega w(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta$ is positive, this critical point is indeed a minimum.
Therefore the Bayes est of $\eta(\theta)$ is:
$$\delta_\pi(\utilde{X}) = \frac{\int_\Omega w(\theta)\eta(\theta)f(\utilde{X};\theta)\pi(\theta)d\theta}{\int_\Omega w(\theta)f(\utilde{X};\theta)\pi(\theta)d\theta}$$
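As a sanity check, this formula can be evaluated numerically. Below is a grid-integration sketch under assumed choices (Bernoulli data, $w(\theta)=1$, $\eta(\theta)=\theta$, uniform prior on $(0,1)$):

```python
# Grid approximation of delta_pi(x) = ∫ w·eta·f·pi dθ / ∫ w·f·pi dθ with
# w = 1, eta(theta) = theta, pi uniform on (0,1), and a Bernoulli likelihood.
def bayes_estimate(x, grid_size=2000):
    n, s = len(x), sum(x)
    num = den = 0.0
    for i in range(1, grid_size):
        theta = i / grid_size
        lik = theta ** s * (1 - theta) ** (n - s)  # f(x; theta)
        num += theta * lik                          # w(θ)η(θ)f(x;θ)π(θ) terms
        den += lik                                  # w(θ)f(x;θ)π(θ) terms
    return num / den  # the common grid spacing cancels in the ratio

print(bayes_estimate([1, 0, 1, 1, 0]))  # ≈ 4/7, the exact value (s+1)/(n+2)
```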
Posterior Distribution
Earlier we wrote $\utilde{X}\sim f(\utilde{x};\theta)$, which is really the pdf of $\utilde{X}$ given $\theta$. But we now regard $\theta$ itself as a random variable, so the precise notation is $\utilde{X}|\theta\sim f(\utilde{x};\theta)$.
If $\pi(\theta)$ is a prior distribution of $\theta$ on $\Omega$, i.e. our subjective guess about $\theta$ before seeing the data, then $f(\utilde{x};\theta)\pi(\theta)$ is the joint distribution of $\utilde{X}$ and $\theta$.
The marginal density of $\utilde{X}$ is then:
$$m_\pi(\utilde{x})\triangleq\int_\Omega f(\utilde{x};\theta)\pi(\theta)d\theta$$
From this we can compute the conditional distribution of $\theta$ given the data $\utilde{X}=\utilde{x}$:
$$\pi(\theta|\utilde{x}) = \frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})}$$
This is our improved belief about $\theta$ after obtaining the data, i.e. the posterior distribution.
$X_1,\cdots, X_n \stackrel{\text{iid}}{\sim} f(x;\theta)$, and $\pi(\theta)$ is the prior dist of $\theta$.
The posterior dist of $\theta$ given $\utilde{X}=\utilde{x}$ is
$\pi(\theta|\utilde{x}) = \frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})}$, where $m_\pi(\utilde{x})\triangleq\int_\Omega f(\utilde{x};\theta)\pi(\theta)d\theta$ is the marginal dist of $\utilde{x}$.
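Numerically, the posterior is just the normalized product $f(\utilde{x};\theta)\pi(\theta)$. A grid sketch under the same assumed Bernoulli/uniform-prior setup as before:

```python
GRID = 1000

def posterior_on_grid(x):
    # pi(theta | x) = f(x;theta)*pi(theta) / m_pi(x), with pi uniform on (0,1)
    n, s = len(x), sum(x)
    thetas = [i / GRID for i in range(1, GRID)]
    unnorm = [t ** s * (1 - t) ** (n - s) for t in thetas]  # f(x;θ)·π(θ)
    m = sum(unnorm) / GRID        # Riemann sum for the marginal m_pi(x)
    return thetas, [u / m for u in unnorm]

thetas, post = posterior_on_grid([1, 0, 1, 1, 0])
print(sum(t * p for t, p in zip(thetas, post)) / GRID)  # posterior mean ≈ 4/7
```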
Note: Suppose that $T=T(\utilde{X})$ is a sufficient statistic for $\theta$. By the Factorization Theorem, $f(\utilde{x};\theta)=g(t;\theta)h(\utilde{x})$, where by adjusting the constant factor we may take $g(t;\theta)$ to be a pdf. Then we have:
$$\pi(\theta|\utilde{x}) = \frac{f(\utilde{x};\theta)\pi(\theta)}{\int_\Omega f(\utilde{x};\theta)\pi(\theta)d\theta}=\frac{h(\utilde{x})g(t;\theta)\pi(\theta)}{h(\utilde{x})\int_\Omega g(t;\theta)\pi(\theta)d\theta}=\frac{g(t;\theta)\pi(\theta)}{\int_\Omega g(t;\theta)\pi(\theta)d\theta}=\pi(\theta|t)$$
Under the loss function $L(\delta, \theta)=w(\theta)(\delta(\utilde{X})-\eta(\theta))^2$, once we have observed the data $\utilde{X}=\utilde{x}$, we get:
$$\begin{align*}
\delta_\pi(\utilde{x})&=\frac{\int_\Omega w(\theta)\eta(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta}{\int_\Omega w(\theta)f(\utilde{x};\theta)\pi(\theta)d\theta}\\
&=\frac{\int_\Omega w(\theta)\eta(\theta)\frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})}d\theta}{\int_\Omega w(\theta)\frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})}d\theta}\\
&=\frac{\int_\Omega w(\theta)\eta(\theta)\pi(\theta|\utilde{x})d\theta}{\int_\Omega w(\theta)\pi(\theta|\utilde{x})d\theta}=\frac{E[w(\theta)\eta(\theta)|\utilde{x}]}{E[w(\theta)|\utilde{x}]}\\
&=\frac{E[w(\theta)\eta(\theta)|t]}{E[w(\theta)|t]} \qquad \text{where } t=T(\utilde{x}) \text{ is suff. for } \theta
\end{align*}$$
That is:
$$\delta_\pi(\utilde{X})=\frac{E[w(\theta)\eta(\theta)|\utilde{X}]}{E[w(\theta)|\utilde{X}]}=\frac{E[w(\theta)\eta(\theta)|T]}{E[w(\theta)|T]}\xlongequal{w(\theta)=1}E[\eta(\theta)|T]$$
Therefore, when the MSE is the risk function, the Bayes est cannot be further improved by the Rao-Blackwell Theorem, because the Bayes est is already a function of a sufficient statistic.
Under the loss function $L(\delta, \theta)=w(\theta)(\delta(\utilde{X})-\eta(\theta))^2$, the unique Bayes est with respect to the prior $\pi$ is
$$\delta_\pi(\utilde{X})=\frac{E[w(\theta)\eta(\theta)|\utilde{X}]}{E[w(\theta)|\utilde{X}]}=\frac{E[w(\theta)\eta(\theta)|T]}{E[w(\theta)|T]}$$
Under the loss function $L(\delta, \theta)=(\delta(\utilde{X})-\eta(\theta))^2$ with respect to the prior $\pi$, the unique Bayes est is
$$\delta_\pi(\utilde{X})=E[\eta(\theta)|\utilde{X}]=E[\eta(\theta)|T]$$
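In the conjugate Bernoulli/Beta case (a standard closed form; the Beta prior itself is only introduced in the example below), $E[\theta\mid T]$ has an explicit expression, and the sketch below shows that it depends on the data only through the sufficient statistic $T=\sum X_i$:

```python
# With theta ~ Beta(a, b) and X_1..X_n iid Bernoulli(theta), the posterior is
# Beta(a + t, b + n - t) for t = sum(x), so E[theta | T = t] = (a + t)/(a + b + n).
def posterior_mean(x, a=1.0, b=1.0):
    n, t = len(x), sum(x)
    return (a + t) / (a + b + n)

print(posterior_mean([1, 0, 1, 1, 0]))  # 4/7 under the uniform (a = b = 1) prior
print(posterior_mean([0, 1, 1, 0, 1]))  # same value: only t = 3, n = 5 matter
```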
Minimax Estimation
A Bayes est. can help us find a minimax est.
Given a loss function $L(\delta, \theta)$, if $\delta_\pi(\utilde{X})$ satisfies the following conditions:
$\delta_\pi(\utilde{X})$ is the Bayes est of $\eta(\theta)$ with respect to some prior distribution $\pi(\theta)$
$R(\delta_\pi, \theta)$ is constant (independent of $\theta$)
then $\delta_\pi(\utilde{X})$ is a minimax est of $\eta(\theta)$.
If, moreover, $\delta_\pi(\utilde{X})$ is the unique Bayes est, then $\delta_\pi(\utilde{X})$ is also the unique minimax est of $\eta(\theta)$.
Proof: Let $\delta_\pi$ be the Bayes est of $\eta(\theta)$ such that $R(\delta_\pi, \theta)$ is constant.
$$\begin{align*}
r_\pi(\delta_\pi) &\triangleq \int_\Omega R(\delta_\pi, \theta)\pi(\theta)d\theta\\
&= R(\delta_\pi, \theta)\int_\Omega\pi(\theta)d\theta \qquad \because R(\delta_\pi, \theta) \text{ is const.}\\
&= \sup_{\theta\in\Omega}R(\delta_\pi, \theta) \qquad \because \pi(\theta) \text{ is a pdf}
\end{align*}$$
For any $\delta \neq \delta_\pi$,
$$\begin{align*}
r_\pi(\delta) &\triangleq \int_\Omega R(\delta, \theta)\pi(\theta)d\theta\\
&\le \int_\Omega \sup_{\theta\in\Omega}R(\delta, \theta)\,\pi(\theta)d\theta\\
&= \sup_{\theta\in\Omega}R(\delta, \theta)\int_\Omega\pi(\theta)d\theta\\
&= \sup_{\theta\in\Omega}R(\delta, \theta) \qquad \because \pi(\theta) \text{ is a pdf}
\end{align*}$$
Since $\delta_\pi$ is the Bayes est:
$$\sup_{\theta\in\Omega}R(\delta_\pi, \theta) = r_\pi(\delta_\pi) \stackrel{(<) \text{ if unique}}{\le} r_\pi(\delta) \le \sup_{\theta\in\Omega}R(\delta, \theta)$$
因為 δ \delta δ 是任意的,i.e. sup θ ∈ Ω R ( δ π , θ ) ≤ inf δ sup θ ∈ Ω R ( δ π , θ ) ⟹ δ π ( X ~ ) \sup_{\theta\in\Omega}R(\delta_\pi, \theta) \le \inf_\delta\sup_{\theta\in\Omega}R(\delta_\pi, \theta) \implies \delta_\pi(\utilde{X}) sup θ ∈ Ω R ( δ π , θ ) ≤ inf δ sup θ ∈ Ω R ( δ π , θ ) ⟹ δ π ( X ) 是 η ( θ ) \eta(\theta) η ( θ ) 的 minmax est.
Since a minimax est does not involve a prior dist of $\theta$, to obtain a minimax est of $\eta(\theta)$ from a Bayes est we only need to find some prior distribution that makes the risk function constant.
If $f$ and $g$ are both pdfs and $f\propto g$, then $f=g$.
EX: $X_1, \cdots, X_n \stackrel{\text{iid}}{\sim} B(1,\theta)$ with $\theta\sim Beta(\alpha, \beta)$,
i.e. $\theta\in\Omega=(0,1)$ and
$$\begin{align*}
\pi(\theta)&=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}\\
&\propto \theta^{\alpha-1}(1-\theta)^{\beta-1}
\end{align*}$$
e.g. $\alpha=1=\beta \implies \pi(\theta)=1,\ \theta\in(0,1)$, i.e. $\theta\sim U(0,1)$; that is, the user has no prior opinion.
$\implies \pi(\theta|\utilde{x}) = \frac{f(\utilde{x};\theta)\pi(\theta)}{m_\pi(\utilde{x})} \propto \theta^{\sum x_i+\alpha-1}(1-\theta)^{n-\sum x_i+\beta-1}$, so by the proportionality fact above, $\theta|\utilde{x} \sim Beta\left(\alpha+\sum x_i,\ \beta+n-\sum x_i\right)$.
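To close the loop with the minimax theorem above: under squared-error loss the Bayes est here is the posterior mean $(T+\alpha)/(n+\alpha+\beta)$ with $T=\sum X_i$, and choosing $\alpha=\beta=\sqrt{n}/2$ makes its risk constant in $\theta$ (a standard exercise; the sketch below verifies this numerically):

```python
import math

def risk_of_bayes(theta, n, a, b):
    # Exact risk E_theta[(delta - theta)^2] for delta = (T + a)/(n + a + b),
    # T ~ Binomial(n, theta): risk = variance + bias^2.
    c = n + a + b
    var = n * theta * (1 - theta) / c ** 2
    bias = (n * theta + a) / c - theta
    return var + bias ** 2

n = 20
a = b = math.sqrt(n) / 2
for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(theta, risk_of_bayes(theta, n, a, b))
# Every theta gives the same value n / (4 * (n + sqrt(n))**2), so by the
# theorem above this Bayes estimator is minimax for theta.
```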