Confidence Set Estimation
We have $n$ observations $\utilde{X}=(X_1, \cdots, X_n)\sim f(\utilde{x};\theta)$ with $\theta\in\Omega\subset\R^r$, $r\ge 1$, and we are interested in $\eta(\theta):\Omega\to\R^m$, $m\le r$ (usually $m=1$).
e.g. $N(\mu, \sigma^2)$, $\theta=(\mu, \sigma^2)$
$$\begin{alignat*}{3}
\implies &\eta(\theta)=\mu&\qquad&\eta(\theta)=\sigma &\qquad& \eta(\theta)=\log|\mu|\\
&\eta(\theta)=\sigma^2&\qquad&\eta(\theta)=\frac{\mu}{\sigma} &\qquad& \cdots
\end{alignat*}$$
Given the data $\utilde{X}$ (a random vector), a set estimate of $\eta(\theta)$ is a subset $C(\utilde{X})$ of $\R^m\supseteq\eta(\Omega)$ such that
$$\forall \theta \quad P_\theta\left(\eta(\theta)\in C(\utilde{X})\right)=r\in[0,1]$$
(here $r$ denotes a probability level, not the dimension of $\Omega$). Once the actual data $\utilde{X}=\utilde{x}$ are observed, we say that we have confidence $r$ that the unknown quantity $\eta(\theta)\in C(\utilde{x})$. We speak of confidence rather than probability because, once the data are fixed, whether $\eta(\theta)$ lies in $C(\utilde{x})$ is already determined; we simply do not know which.
EX $X_1,\cdots, X_n\overset{\text{iid}}{\sim}N(\mu, \sigma^2_0)$
$$\implies\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma_0}\sim N(0,1)$$
$$\begin{align*}
\implies \forall \mu\in\R\quad 1-\alpha&=P_\mu\left(-z_{\alpha/2}<\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma_0}<z_{\alpha/2}\right)\\
&=P_\mu\left(\bar{X}-\frac{z_{\alpha/2}\sigma_0}{\sqrt{n}}<\mu<\bar{X}+\frac{z_{\alpha/2}\sigma_0}{\sqrt{n}}\right)\\
&=P_\mu\left(\mu\in\left(\bar{X}-\frac{z_{\alpha/2}\sigma_0}{\sqrt{n}}, \bar{X}+\frac{z_{\alpha/2}\sigma_0}{\sqrt{n}}\right)\right)
\end{align*}$$
$$\implies P_\mu(\mu\in C(\utilde{X}))=1-\alpha\quad \forall\mu\text{ where }C(\utilde{X})=\left[\bar{X}\pm\frac{z_{\alpha/2}\sigma_0}{\sqrt{n}}\right]$$
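As a rough numerical check of the derivation above, the following sketch repeatedly simulates the interval $[\bar{X}\pm z_{\alpha/2}\sigma_0/\sqrt{n}]$ and counts how often it contains $\mu$. The function names, the parameter values ($\mu=3$, $\sigma_0=2$, $n=10$, $\alpha=0.05$) and the seed are illustrative assumptions, not part of the notes.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma0, alpha):
    """Two-sided interval Xbar +/- z_{alpha/2} * sigma0 / sqrt(n), sigma0 known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    half_width = z * sigma0 / n ** 0.5
    xbar = fmean(sample)
    return xbar - half_width, xbar + half_width


def estimate_coverage(mu, sigma0, n, alpha, reps=20000, seed=0):
    """Monte Carlo estimate of P_mu(mu in C(X)) for the z-interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma0) for _ in range(n)]
        lo, hi = z_interval(sample, sigma0, alpha)
        hits += lo < mu < hi
    return hits / reps


coverage = estimate_coverage(mu=3.0, sigma0=2.0, n=10, alpha=0.05)
print(f"estimated coverage: {coverage:.4f} (target 0.95)")
```

The estimate should land close to $1-\alpha$ whatever value of $\mu$ is plugged in, which is the "for all $\mu$" part of the statement.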
$\utilde{X}\sim f(\utilde{x};\theta)$ where $\theta$ is the true parameter.
$P_\theta(\eta(\theta)\in C(\utilde{X}))$ is the coverage probability of $C(\utilde{X})$ for $\eta(\theta)$.
Naturally we would like the cov. prob. to be as large as possible, but by that criterion alone $C(\utilde{X})=\eta(\Omega)$ is always the best choice. Since $\eta(\Omega)$ is useless in practice, we need a second criterion: among sets with the same cov. prob., we want $C(\utilde{X})$ to be as small as possible.
We therefore look at the probability that $C(\utilde{X})$ covers a wrong value, i.e.
$$P_\theta(\eta(\theta^*)\in C(\utilde{X})) \quad \forall \theta^*\neq\theta$$
The smaller this probability the better; we call it the false cov. prob. When $C(\utilde{X})=\eta(\Omega)$, the false cov. prob. $=1$.
Remark:
We want the cov. prob. to be as large as possible, and the probability of covering the relevant $\theta^*\neq\theta$ to be as small as possible. Which $\theta^*$ are relevant depends on the shape of the set:
For a two-sided interval $C(\utilde{X})=[L(\utilde{X}), U(\utilde{X})]$, we care about all $\theta^*\neq\theta$.
For a one-sided interval $C(\utilde{X})=[L(\utilde{X}), \infty)$, we care about $\theta^*\neq\theta$ with $\eta(\theta^*)<\eta(\theta)$.
For a one-sided interval $C(\utilde{X})=(-\infty, U(\utilde{X})]$, we care about $\theta^*\neq\theta$ with $\eta(\theta^*)>\eta(\theta)$.
The cov. prob. and the false cov. prob. pull in opposite directions. To maximize the cov. prob. we would take $C(\utilde{X})=\eta(\Omega)$; to minimize the false cov. prob. we would take $C(\utilde{X})=\emptyset$. We therefore need a compromise.
We first require the cov. prob. to be at least some confidence coefficient, and then make the false cov. prob. as small as possible.
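To make the trade-off concrete, here is a small sketch for the known-variance normal model with sets of the form $[\bar{X}\pm c]$. Since $\bar{X}\sim N(\mu,\sigma_0^2/n)$, the probability of covering any point $\mu^*$ is $\Phi\left(\frac{\sqrt{n}(\mu^*-\mu+c)}{\sigma_0}\right)-\Phi\left(\frac{\sqrt{n}(\mu^*-\mu-c)}{\sigma_0}\right)$. The helper name `cover_prob`, the wrong point $\mu^*=\mu+1$ and the values of $c$ are assumptions chosen only for illustration.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def cover_prob(mu_star, mu, c, n, sigma0):
    """P_mu(mu_star in [Xbar - c, Xbar + c]) when Xbar ~ N(mu, sigma0^2 / n)."""
    scale = n ** 0.5 / sigma0
    return PHI(scale * (mu_star - mu + c)) - PHI(scale * (mu_star - mu - c))


mu, n, sigma0 = 0.0, 4, 1.0
rows = []
for c in (0.5, 1.0, 2.0):
    true_cov = cover_prob(mu, mu, c, n, sigma0)         # covers the true value
    false_cov = cover_prob(mu + 1.0, mu, c, n, sigma0)  # covers a wrong value
    rows.append((c, true_cov, false_cov))
    print(f"c={c:3.1f}  cov. prob.={true_cov:.4f}  false cov. prob.={false_cov:.4f}")
```

Widening the set raises both probabilities at once, which is why the cov. prob. is fixed first and the false cov. prob. is minimized second.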
$C(\utilde{X})$ is a conf. set for $\eta(\theta)$
$P_\theta(\eta(\theta)\in C(\utilde{X})), \forall\theta\in\Omega$ is the coverage probability of $C(\utilde{X})$
$C(\utilde{X})$ is called a $1-\alpha$ conf. set for $\eta(\theta)$ if
$$\begin{align*}
\inf_{\theta\in\Omega}P_\theta(\eta(\theta)\in C(\utilde{X})) &= \text{ conf. coef of }C(\utilde{X})\\
&= 1-\alpha, \quad \alpha\in [0,1]
\end{align*}$$
A $1-\alpha$ conf. set $C^*(\utilde{X})$ is called a uniformly most accurate (UMA) $1-\alpha$ conf. set for $\eta(\theta)\iff$
$\inf_{\theta\in\Omega}P_\theta(\eta(\theta)\in C^*(\utilde{X}))=1-\alpha$
$P_\theta(\eta(\theta^*)\in C^*(\utilde{X}))\le P_\theta(\eta(\theta^*)\in C(\utilde{X}))$, for all relevant $\theta^*\neq\theta$,
$\forall$ $1-\alpha$ conf. set $C(\utilde{X})$ for $\eta(\theta)$
A $1-\alpha$ conf. set $C(\utilde{X})$ for $\eta(\theta)$ is unbiased $\iff 1-\alpha\ge$ relevant false cov. prob.
i.e. $1-\alpha\ge P_\theta(\eta(\theta^*)\in C(\utilde{X}))$, for all relevant $\theta^*\neq\theta$
A $1-\alpha$ conf. set $C^*(\utilde{X})$ for $\eta(\theta)$ is a UMAU $1-\alpha$ conf. set if $C^*(\utilde{X})$ is UMA among unbiased $1-\alpha$ conf. sets.
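The unbiasedness condition can be checked numerically for the two-sided interval $[\bar{X}\pm z_{\alpha/2}\sigma_0/\sqrt{n}]$ from the known-variance normal example. The sketch below evaluates the false cov. prob. on a grid of wrong values $\mu^*=\mu+d$ and compares it with $1-\alpha$. This is a grid check under assumed parameter values ($n=9$, $\sigma_0=1.5$, $\alpha=0.05$), not a proof, and the function name is hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def z_interval_cover_prob(mu_star, mu, n, sigma0, alpha):
    """P_mu(mu_star in [Xbar +/- z_{alpha/2} sigma0/sqrt(n)]), Xbar ~ N(mu, sigma0^2/n)."""
    z = STD.inv_cdf(1 - alpha / 2)
    shift = n ** 0.5 * (mu_star - mu) / sigma0  # standardized distance of mu_star from mu
    return STD.cdf(shift + z) - STD.cdf(shift - z)


alpha, mu, n, sigma0 = 0.05, 0.0, 9, 1.5
offsets = [k / 4 for k in range(-12, 13) if k != 0]  # d = -3, ..., 3 without 0
false_cov = [z_interval_cover_prob(mu + d, mu, n, sigma0, alpha) for d in offsets]
true_cov = z_interval_cover_prob(mu, mu, n, sigma0, alpha)

print(f"cov. prob. at the true mu      : {true_cov:.4f}")
print(f"largest false cov. prob. (grid): {max(false_cov):.4f}")
```

On this grid every false cov. prob. stays below $1-\alpha$, consistent with the interval being unbiased.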
EX $X_1, \cdots, X_n\overset{\text{iid}}{\sim}N(\mu, \sigma^2_0)$, parameter of interest: $\mu$
recall: point est. for $\mu$: $\bar{X}$ (UMVUE, MLE, MOME, Minimax)
But $P_\mu(\bar{X}=\mu)=0, \forall\mu$ $\implies$ idea: $\mu\in[\bar{X}\pm c]=C(\utilde{X})$, with $c>0$ given, has positive prob. of being correct.
$$\begin{align*}
\text{cov. prob of }[\bar{X}\pm c]&=P_\mu(\mu\in[\bar{X}\pm c])\\
&=P_\mu(\mu-c\le\bar{X}\le\mu+c)\\
&=P\left(\frac{\sqrt{n}(\mu-c-\mu)}{\sigma_0}\le Z\le\frac{\sqrt{n}(\mu+c-\mu)}{\sigma_0}\right)\\
&=\Phi\left(\frac{c\sqrt{n}}{\sigma_0}\right)-\Phi\left(-\frac{c\sqrt{n}}{\sigma_0}\right)\\
&=2\Phi\left(\frac{c\sqrt{n}}{\sigma_0}\right)-1\quad\forall\mu
\end{align*}$$
$$\implies \inf_{\mu\in\R}P_\mu(\mu\in[\bar{X}\pm c])=2\Phi\left(\frac{c\sqrt{n}}{\sigma_0}\right)-1$$
is the conf. coef. of $[\bar{X}\pm c]$
e.g. $n=4, \sigma_0=1, c=1$
$$\implies 2\Phi\left(\frac{1\cdot 2}{1}\right)-1=2\Phi(2)-1=2\cdot 0.9772-1=0.9544$$
i.e. $[\bar{X}\pm 1]$ is a 95.44% conf. set for $\mu$
If we want the conf. coef. to equal $1-\alpha$:
$$\because 2\Phi\left(\frac{c\sqrt{n}}{\sigma_0}\right)-1=1-\alpha\implies c=\frac{\sigma_0}{\sqrt{n}}z_{\alpha/2}$$
$\implies \left[\bar{X}\pm\frac{\sigma_0}{\sqrt{n}}z_{\alpha/2}\right]$ is a $1-\alpha$ conf. set for $\mu$
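The two directions of this calculation (from $c$ to the conf. coef., and from $\alpha$ to $c$) can be sketched with the standard library as follows. The function names `conf_coef` and `half_width` are illustrative choices, not notation from the notes.

```python
from statistics import NormalDist

STD = NormalDist()


def conf_coef(c, n, sigma0):
    """Confidence coefficient 2*Phi(c*sqrt(n)/sigma0) - 1 of [Xbar +/- c]."""
    return 2 * STD.cdf(c * n ** 0.5 / sigma0) - 1


def half_width(alpha, n, sigma0):
    """Half-width c = sigma0/sqrt(n) * z_{alpha/2} giving coefficient 1 - alpha."""
    return sigma0 / n ** 0.5 * STD.inv_cdf(1 - alpha / 2)


# The worked example from the text: n = 4, sigma0 = 1, c = 1.
print(f"conf. coef. of [Xbar +/- 1]: {conf_coef(1.0, 4, 1.0):.4f}")

# Going the other way: the c that gives a 95% set for the same n and sigma0.
c95 = half_width(0.05, 4, 1.0)
print(f"c for 1 - alpha = 0.95: {c95:.4f}")
```

The first value agrees with the table-based 95.44% up to rounding of $\Phi(2)$, and plugging `c95` back into `conf_coef` returns $0.95$.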
If now $\sigma^2_0$ is unknown, $\theta=(\mu, \sigma^2_0)$, $\eta(\theta)=\mu$
$$\begin{align*}
\text{Conf. coef. of }[\bar{X}\pm c]&=\inf_{\theta\in\Omega}P_\theta(\mu\in[\bar{X}\pm c])\\
&=\inf_{\sigma_0>0}\left[2\Phi\left(\frac{c\sqrt{n}}{\sigma_0}\right)-1\right]=0
\end{align*}$$
i.e. a set of the form $[\text{good point estimate}\pm c]$ does not necessarily give a good result.
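The infimum above can be seen numerically: for a fixed half-width $c$, the cov. prob. $2\Phi(c\sqrt{n}/\sigma_0)-1$ shrinks toward $0$ as $\sigma_0$ grows. The sketch below uses the assumed values $c=1$, $n=4$ and a few arbitrary values of $\sigma_0$.

```python
from statistics import NormalDist


def cover_prob(c, n, sigma):
    """Coverage probability 2*Phi(c*sqrt(n)/sigma) - 1 of [Xbar +/- c]."""
    return 2 * NormalDist().cdf(c * n ** 0.5 / sigma) - 1


c, n = 1.0, 4
sigmas = [1, 10, 100, 1000]
probs = [cover_prob(c, n, s) for s in sigmas]
for s, p in zip(sigmas, probs):
    print(f"sigma0={s:5d}  cov. prob.={p:.4f}")
```

No fixed $c$ can guarantee a positive conf. coef. over all $\sigma_0>0$, so the width of the set has to depend on the data when the variance is unknown.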
EX $X_1, \cdots, X_n\overset{\text{iid}}{\sim}N(\mu, \sigma^2_0)$. Given $C(\utilde{X})=\left[\bar{X}-\frac{\sigma_0}{\sqrt{n}}z_\alpha, \infty\right)$
$$\begin{align*}
P_\mu(\mu\in C(\utilde{X}))&=P_\mu\left(\mu\ge\bar{X}-\frac{\sigma_0}{\sqrt{n}}z_\alpha\right)\\
&=P_\mu\left(\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma_0}\le \frac{\sqrt{n}\left(\mu+\frac{\sigma_0}{\sqrt{n}}z_\alpha-\mu\right)}{\sigma_0}\right)\\
&=P(Z\le z_\alpha)\\
&=1-\alpha
\end{align*}$$
$\implies C(\utilde{X})$ is a $1-\alpha$ conf. set for $\mu$, i.e. $\bar{X}-\frac{\sigma_0}{\sqrt{n}}z_\alpha$ is a $1-\alpha$ lower conf. limit for $\mu$
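A quick simulation can confirm the coverage of this one-sided set. The sketch below draws many samples, computes the lower limit $\bar{X}-\frac{\sigma_0}{\sqrt{n}}z_\alpha$ each time, and records how often it falls at or below $\mu$. The parameter values ($\mu=-1$, $\sigma_0=3$, $n=8$, $\alpha=0.10$), the seed and the function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean


def lower_limit(sample, sigma0, alpha):
    """Lower confidence limit Xbar - sigma0/sqrt(n) * z_alpha, sigma0 known."""
    z = NormalDist().inv_cdf(1 - alpha)  # upper alpha point of N(0, 1)
    return fmean(sample) - sigma0 / len(sample) ** 0.5 * z


def estimate_lower_coverage(mu, sigma0, n, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of P_mu(mu >= lower limit)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma0) for _ in range(n)]
        hits += lower_limit(sample, sigma0, alpha) <= mu
    return hits / reps


lower_coverage = estimate_lower_coverage(mu=-1.0, sigma0=3.0, n=8, alpha=0.10)
print(f"estimated coverage: {lower_coverage:.4f} (target 0.90)")
```

Note that the one-sided set uses $z_\alpha$ rather than $z_{\alpha/2}$, because all of the error probability $\alpha$ sits on one side.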
Is it unbiased?
P μ ( μ ∗ ∈ C ( X ~ ) ) ∀ μ ∗ < μ = P μ ( μ