
Regression Models in Matrix Form

When we have $n$ observations and $k$ explanatory variables, the regression model is:

\begin{align*} Y_1=&\beta_0+\beta_1X_{11}+\beta_2X_{12}+\cdots+\beta_kX_{1k}+\varepsilon_1\\ Y_2=&\beta_0+\beta_1X_{21}+\beta_2X_{22}+\cdots+\beta_kX_{2k}+\varepsilon_2\\ &\vdots\\ Y_n=&\beta_0+\beta_1X_{n1}+\beta_2X_{n2}+\cdots+\beta_kX_{nk}+\varepsilon_n \end{align*}

We can write this model in matrix form:

\utilde{Y}=\underbrace{\begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1k}\\ 1 & X_{21} & X_{22} & \cdots & X_{2k}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & X_{n1} & X_{n2} & \cdots & X_{nk} \end{bmatrix}}_{\text{Design Matrix}}\utilde{\beta}+\utilde{\varepsilon}

Since the values of $\utilde{X}$ are supplied by us, the matrix that multiplies $\utilde{\beta}$ is called the design matrix, denoted $D$.

\implies \utilde{Y}_{n\times 1}=D_{n\times p}\utilde{\beta}_{p\times 1}+\utilde{\varepsilon}_{n\times 1}
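As a quick illustration (not part of the original notes), here is a minimal R sketch with hypothetical data x1, x2, y; model.matrix adds the leading column of 1's that forms the first column of $D$.

```r
# Minimal sketch: build a design matrix D for n observations and k = 2 predictors.
set.seed(1)
n  <- 20
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

D <- model.matrix(~ x1 + x2)   # n x p matrix; first column is the intercept column of 1's
dim(D)                         # n x p with p = k + 1 = 3
head(D)
```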

Basic Definitions and Results in Matrix Form

Definition
W_{n\times 1} = \begin{bmatrix} W_1\\ W_2\\ \vdots\\ W_n \end{bmatrix}: \text{ random vector}\quad U_{I\times J} = \begin{bmatrix} U_{11} & U_{12} & \cdots & U_{1J}\\ U_{21} & U_{22} & \cdots & U_{2J}\\ \vdots & \vdots & \ddots & \vdots\\ U_{I1} & U_{I2} & \cdots & U_{IJ} \end{bmatrix}: \text{ random matrix}
\implies E[W] = \begin{bmatrix} E[W_1]\\ E[W_2]\\ \vdots\\ E[W_n] \end{bmatrix}\qquad E[U] = \begin{bmatrix} E[U_{11}] & E[U_{12}] & \cdots & E[U_{1J}]\\ E[U_{21}] & E[U_{22}] & \cdots & E[U_{2J}]\\ \vdots & \vdots & \ddots & \vdots\\ E[U_{I1}] & E[U_{I2}] & \cdots & E[U_{IJ}] \end{bmatrix}
\begin{align*} \implies \sigma^2\set{W} &=E\left[\left(W-E[W]\right)_{n\times 1}\left(W-E[W]\right)^t_{1\times n}\right]\\ &=\begin{bmatrix} Var[W_1] & Cov[W_1,W_2] & \cdots & Cov[W_1,W_n]\\ Cov[W_2,W_1] & Var[W_2] & \cdots & Cov[W_2,W_n]\\ \vdots & \vdots & \ddots & \vdots\\ Cov[W_n,W_1] & Cov[W_n,W_2] & \cdots & Var[W_n] \end{bmatrix}\quad\text{(symmetric)}\\ &=\text{ Variance-Covariance Matrix of } W \end{align*}

Properties: let $A, B, C$ be constant vectors/matrices, $W$ a random vector, and $U$ a random matrix. Then:

  1. $E[A] = A$
  2. $E[AUB+C] = AE[U]B+C$
  3. $\sigma^2\set{W}=E[WW^t]-E[W](E[W])^t$
  4. $\sigma^2\set{A_{n\times w}W_{w\times 1}}_{n\times n} = A_{n\times w}\sigma^2\set{W}_{w\times w}A^t_{w\times n}$
  5. $\sigma^2\set{AW+B}=\sigma^2\set{AW}$

Note: if $W$ is an $m\times 1$ random vector, then $\sigma^2\set{W}$ is an $m\times m$ symmetric matrix. Moreover, for any constant vector $\utilde{a}\in\R^m$,

\utilde{a}^t\sigma^2\set{W}\utilde{a}=\sigma^2\set{\utilde{a}^tW}\ge 0

Hence $\sigma^2\set{W}$ is positive semi-definite; it is positive definite exactly when equality holds only at $\utilde{a}=0$.
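A small R check of this fact, as a sketch with an arbitrary simulated random vector (the numbers are only illustrative): the sample variance-covariance matrix has nonnegative eigenvalues, and any quadratic form $\utilde{a}^tS\utilde{a}$ equals the sample variance of $\utilde{a}^tW$, hence is nonnegative.

```r
# Sketch: a variance-covariance matrix is positive semi-definite.
set.seed(2)
W <- matrix(rnorm(500 * 3), ncol = 3) %*% matrix(c(1, 0.5, 0,
                                                   0, 1, 0.3,
                                                   0, 0, 1), 3, 3, byrow = TRUE)
S <- cov(W)                  # sample variance-covariance matrix (3 x 3, symmetric)
eigen(S)$values              # all eigenvalues >= 0

a <- c(1, -2, 3)
drop(t(a) %*% S %*% a)       # a' S a ...
var(drop(W %*% a))           # ... equals the sample variance of a' W, hence >= 0
```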

Definition

Let $f:\R^k\to\R$, so that $f(\utilde{\theta})\in\R$ for every $\utilde{\theta}=(\theta_1, \cdots, \theta_k)^t\in\R^k$. Define

\frac{\partial}{\partial\utilde{\theta}} f(\utilde{\theta}) \triangleq \begin{bmatrix} \frac{\partial f}{\partial\theta_1}\\ \frac{\partial f}{\partial\theta_2}\\ \vdots\\ \frac{\partial f}{\partial\theta_k} \end{bmatrix}
Lemma 1

Given $\utilde{c}\in\R^k$, $f(\utilde{\theta})=\utilde{c}^t\utilde{\theta}=\utilde{\theta}^t\utilde{c}$, $\forall \utilde{\theta}\in\R^k$

\frac{\partial}{\partial\utilde{\theta}}f(\utilde{\theta})=\utilde{c}

i.e. $\frac{\partial}{\partial\utilde{\theta}}(\utilde{c}^t\utilde{\theta})=\frac{\partial}{\partial\utilde{\theta}}(\utilde{\theta}^t\utilde{c})=\utilde{c}$

Lemma 2

If $A$ is a $k\times k$ symmetric constant matrix, the following expression is called a quadratic form:

f(\utilde{\theta})=\utilde{\theta}^tA\utilde{\theta}=\sum_{i,j}\theta_iA_{ij}\theta_j

and

\frac{\partial}{\partial\utilde{\theta}}f(\utilde{\theta})=2A\utilde{\theta}

If $A$ is not necessarily symmetric, then $\frac{\partial}{\partial\utilde{\theta}}f(\utilde{\theta})=A\utilde{\theta}+A^t\utilde{\theta}$.
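A quick numerical check of Lemma 2, as a sketch with an arbitrary symmetric matrix (not from the notes): the analytic gradient $2A\utilde{\theta}$ matches a finite-difference gradient.

```r
# Sketch: check the gradient of the quadratic form f(theta) = theta' A theta.
set.seed(3)
k <- 4
A <- crossprod(matrix(rnorm(k * k), k))     # a symmetric k x k matrix
f <- function(theta) drop(t(theta) %*% A %*% theta)
theta0 <- rnorm(k)

num_grad <- sapply(seq_len(k), function(i) {
  h <- 1e-6
  e <- replace(numeric(k), i, h)
  (f(theta0 + e) - f(theta0 - e)) / (2 * h)   # central difference
})
cbind(numerical = num_grad, analytic = drop(2 * A %*% theta0))  # columns agree
```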

Hence, in matrix form, the general linear regression model satisfies:

\utilde{Y}_{n\times 1}=D_{n\times p}\utilde{\beta}_{p\times 1}+\utilde{\varepsilon}_{n\times 1}\text{ with } E[\utilde{\varepsilon}]=0,\ \sigma^2\set{\utilde{\varepsilon}}=\begin{bmatrix} \sigma^2& \cdots &0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \sigma^2 \end{bmatrix} =\sigma^2I_{n\times n}
\begin{align*} \implies &E[\utilde{Y}]=E[D\utilde{\beta}+\utilde{\varepsilon}]=D\utilde{\beta}=\text{ regression function}\\ &\sigma^2\set{\utilde{Y}}=\sigma^2\set{D\utilde{\beta}+\utilde{\varepsilon}}=\sigma^2\set{\utilde{\varepsilon}}=\sigma^2I_{n\times n} \end{align*}
Definition
\begin{align*} Q(\utilde{\beta})&\triangleq ||\utilde{Y}-E[\utilde{Y}]||^2\\ &=||\utilde{Y}-D\utilde{\beta}||^2\\ &=(\utilde{Y}-D\utilde{\beta})^t(\utilde{Y}-D\utilde{\beta}) \end{align*}

If $Q(\utilde{b})=\min_{\utilde{\beta}\in\R^p}Q(\utilde{\beta})$, then $\utilde{b}$ is the LSE of $\utilde{\beta}$.

Note that

\begin{align*} Q(\utilde{\beta})&=\utilde{Y}^t_{1\times n}\utilde{Y}_{n\times 1}-\utilde{Y}^t_{1\times n}D_{n\times p}\utilde{\beta}_{p\times 1}-\utilde{\beta}^t_{1\times p}D^t_{p\times n}\utilde{Y}_{n\times 1}+\utilde{\beta}^t_{1\times p}D^t_{p\times n}D_{n\times p}\utilde{\beta}_{p\times 1}\\ &=\utilde{Y}^t\utilde{Y}-2\utilde{\beta}^tD^t\utilde{Y}+\utilde{\beta}^tD^tD\utilde{\beta} \end{align*}

The gradient $\frac{\partial}{\partial\utilde{\beta}}Q(\utilde{\beta})$ is a $p\times 1$ vector, so by the two lemmas above we obtain

\frac{\partial}{\partial\utilde{\beta}}Q(\utilde{\beta})=-2D^t\utilde{Y}+2D^tD\utilde{\beta}

If $\utilde{b}$ is the LSE of $\utilde{\beta}$ $\implies \frac{\partial}{\partial\utilde{\beta}}Q(\utilde{\beta})\big|_{\utilde{b}}=0 \iff -2D^t\utilde{Y}+2D^tD\utilde{b}=0$

Conversely, if $\utilde{b}$ satisfies $D^tD\utilde{b}=D^t\utilde{Y}$, then for every other $\utilde{\beta}$:

\begin{align*} Q(\utilde{\beta})=||\utilde{Y}-D\utilde{\beta}||^2&=||\utilde{Y}-D\utilde{b}+D\utilde{b}-D\utilde{\beta}||^2\\ &=||\utilde{Y}-D\utilde{b}||^2+||D\utilde{b}-D\utilde{\beta}||^2+2\underbrace{(\utilde{Y}-D\utilde{b})^t(D\utilde{b}-D\utilde{\beta})}_{=(D^t\utilde{Y}-D^tD\utilde{b})^t(\utilde{b}-\utilde{\beta})=0}\\ &=||\utilde{Y}-D\utilde{b}||^2+\underbrace{||D\utilde{b}-D\utilde{\beta}||^2}_{\ge 0}\\ &\ge Q(\utilde{b}) \end{align*}

i.e. $\utilde{b}$ is the LSE of $\utilde{\beta}$, and this happens $\iff D^tD\utilde{b}=D^t\utilde{Y}$, i.e. $\utilde{b}$ satisfies the normal equation.

We therefore conclude: $\utilde{b}$ is the LSE of $\utilde{\beta}$ $\iff D^tD\utilde{b}=D^t\utilde{Y}$.

Definition

Normal Equation:

D^tD\utilde{b}=D^t\utilde{Y}
Theorem
\utilde{b}\text{ is LSE of }\utilde{\beta}\iff D^tD\utilde{b}=D^t\utilde{Y}
\implies \utilde{\hat{Y}}\triangleq D\utilde{b} \text{ called fitted value}\qquad \utilde{e}\triangleq \utilde{Y}-\utilde{\hat{Y}}\text{ called residual}
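As a sketch (hypothetical data x1, x2, Y; not from the notes), the normal equation can be solved directly in R and compared with lm():

```r
# Sketch: solve the normal equation D'D b = D'Y and compare with lm().
set.seed(4)
n  <- 30
x1 <- rnorm(n); x2 <- rnorm(n)
Y  <- 2 + x1 - 3 * x2 + rnorm(n)
D  <- cbind(1, x1, x2)                        # design matrix, p = 3

b    <- solve(crossprod(D), crossprod(D, Y))  # b = (D'D)^{-1} D'Y
Yhat <- drop(D %*% b)                         # fitted values
e    <- Y - Yhat                              # residuals

cbind(normal_eq = drop(b), lm = coef(lm(Y ~ x1 + x2)))  # same coefficients
```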

Sometimes we do not need $\utilde{b}$ itself; we only care about $\utilde{\hat{Y}}$. Here we consider the case where the number of observations $n$ is larger than the number of parameters $p$.

Let $\utilde{\theta}=D\utilde{\beta} \implies E[\utilde{Y}]=D\utilde{\beta}=\utilde{\theta}$ and $\utilde{Y}=D\utilde{\beta}+\utilde{\varepsilon}=\utilde{\theta}+\utilde{\varepsilon}$

\begin{align*} \text{with } \utilde{\theta}\in\Omega&\triangleq\set{D\utilde{\beta}:\utilde{\beta}\in\R^p}\\ &=span\set{1, X_1, X_2, \cdots, X_k} \end{align*}

Let $r=\dim(\Omega)=rank(D)\le p$, i.e. $\Omega$ is an $r$-dimensional subspace of $\R^n$. Write $\Omega\triangleq V_r$ and $\R^n\triangleq V_n$.

$\implies E[\utilde{Y}]=\utilde{\theta}\in\Omega$ and $\utilde{\hat{Y}}=D\utilde{b}\in\Omega$, and moreover

\begin{align*} & Q(\utilde{\beta})=||\utilde{Y}-D\utilde{\beta}||^2=||\utilde{Y}-\utilde{\theta}||^2\\ \implies & Q(\utilde{b})=\min_{\utilde{\beta}\in\R^p}Q(\utilde{\beta})\\ \iff & ||\utilde{Y}-D\utilde{b}||^2=||\utilde{Y}-\utilde{\hat{Y}}||^2=\min_{\utilde{\beta}\in\R^p}||\utilde{Y}-D\utilde{\beta}||^2=\min_{\utilde{\theta}\in\Omega}||\utilde{Y}-\utilde{\theta}||^2 \end{align*}

i.e. $\utilde{\hat{Y}}\in\Omega$ s.t. $||\utilde{Y}-\utilde{\hat{Y}}||^2=\min_{\utilde{\theta}\in\Omega}||\utilde{Y}-\utilde{\theta}||^2$

Earlier we discussed the relationship between $\utilde{Y}$ and $\utilde{\hat{Y}}$:

(Figure: $\utilde{\hat{Y}}$ is the orthogonal projection of $\utilde{Y}$ onto $\Omega$, with $\utilde{e}=\utilde{Y}-\utilde{\hat{Y}}$ perpendicular to $\Omega$.)

Here $\utilde{Y}\in V_n$ and $\utilde{\hat{Y}}\in\Omega$, so $\utilde{\hat{Y}}$ can be viewed as the projection of $\utilde{Y}$ onto $\Omega$, while $\utilde{e}$ lies in the space orthogonal to $\Omega$, written $V_r^\perp=\Omega^\perp$.

i.e. $\forall \utilde{Y}\in V_n=\R^n$, $\exist!\utilde{w}\in\Omega, \utilde{z}\in\Omega^\perp$ s.t. $\utilde{Y}=\utilde{w}+\utilde{z}$, and $\utilde{w}$ is the projection of $\utilde{Y}$ onto $\Omega$

$\implies$ the projection $\utilde{\hat{Y}}=\utilde{\hat{\theta}}$ of $\utilde{Y}$ onto $\Omega$ is the unique point that minimizes $||\utilde{Y}-\utilde{\theta}||^2$.

Lemma 3

Let $V_r\subset V_n$ be vector spaces.

For $\utilde{Y}\in V_n$, let $\utilde{w}$ be the projection of $\utilde{Y}$ onto $V_r$.

$\implies \utilde{w}_{n\times 1}=P_{n\times n}\utilde{Y}_{n\times 1}$, where $P$ satisfies:

  1. Symmetric: $P^t=P$
  2. Idempotent: $P^2=P$
  3. $rank(P)=r$

Proof: let $\utilde{\alpha}_1,\utilde{\alpha}_2,\cdots,\utilde{\alpha}_r$ be an orthonormal basis of $V_r$. Since the basis vectors are mutually orthogonal and of length 1,

\begin{align*} \implies \utilde{w}&=\sum_{i=1}^r(\utilde{Y}^t\utilde{\alpha}_i)\utilde{\alpha}_i\quad \text{e.g. } \begin{bmatrix*} 1\\2\\3 \end{bmatrix*}=1\cdot\begin{bmatrix*} 1\\0\\0 \end{bmatrix*}+2\cdot\begin{bmatrix*} 0\\1\\0 \end{bmatrix*}+3\cdot\begin{bmatrix*} 0\\0\\1 \end{bmatrix*}\\ &=\left(\sum_{i=1}^r\utilde{\alpha}_i\utilde{\alpha}_i^t\right)\utilde{Y}\quad \text{note }\utilde{Y}^t\utilde{\alpha}_i=\utilde{\alpha}_i^t\utilde{Y} \text{ is a scalar}\\ &= T\cdot T^t\utilde{Y}=P\utilde{Y} \quad \text{where } T_{n\times r}=\begin{bmatrix} \utilde{\alpha}_1 & \utilde{\alpha}_2 & \cdots & \utilde{\alpha}_r \end{bmatrix},\ P_{n\times n}=T\cdot T^t \end{align*}
\implies P^t=(TT^t)^t=TT^t=P \quad \text{Symmetric}

Since the columns of $T$ are orthonormal, $T^tT=I_r$, and therefore

PP=(TT^t)(TT^t)=TT^tTT^t=TT^t=P \quad \text{Idempotent}

Moreover,

rank(P)=rank(TT^t)=rank(T)=rank(T^t)=r

Indeed, since

rank(AB)\le \min(rank(A), rank(B))
\implies rank(P)\le \min(rank(T), rank(T^t))=r
r=rank(T)=rank(TI)=rank(TT^tT)=rank(PT)\le \min(rank(P), rank(T))\le rank(P)

i.e. $rank(P)=r$
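A small R sketch of Lemma 3 (the subspace below is arbitrary): build $P=TT^t$ from an orthonormal basis obtained via QR and check the three properties.

```r
# Sketch: projection matrix P = T T' from an orthonormal basis of V_r.
set.seed(5)
n <- 6; r <- 2
V    <- matrix(rnorm(n * r), n, r)   # r linearly independent vectors spanning V_r
Tmat <- qr.Q(qr(V))                  # orthonormal basis of V_r (via QR)
P    <- Tmat %*% t(Tmat)

all.equal(P, t(P))                   # symmetric
all.equal(P, P %*% P)                # idempotent
qr(P)$rank                           # rank r = 2
sum(diag(P))                         # trace also equals r (Lemma 4)
```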

Lemma 4

Let $P$ be an $n\times n$ symmetric idempotent matrix with $rank(P)=r$. Then:

  1. $P$ has $r$ eigenvalues equal to 1 and $n-r$ eigenvalues equal to 0
  2. $\text{tr}(P)\triangleq\sum_{i=1}^n P_{ii}=r$
  3. $I-P$ is also symmetric idempotent, and $rank(I-P)=n-r=\text{tr}(I-P)$
  4. $\forall \utilde{a}\in\R^n, \utilde{a}^tP\utilde{a}\ge 0$

Proof:

Since $P$ is symmetric, there exists an orthogonal matrix $A$ (i.e. $A^tA=I$) such that $A^tPA=diag(\lambda_1, \lambda_2,\cdots, \lambda_n)=B$, where the $\lambda_i$ are the eigenvalues of $P$.

$\implies B^2=BB=A^tPA\cdot A^tPA=A^tPA=B$, i.e. $\lambda^2_i=\lambda_i$, $i=1,2,\cdots, n$

$\implies \lambda_i=0$ or $1, \forall i$. Furthermore,

\begin{align*} r=rank(P)&=rank(A^tPA) \quad \because A \text{ is nonsingular}\\ &=rank(B) \end{align*}

$\implies B$ has $r$ ones and $n-r$ zeros on its diagonal. In addition,

\text{tr}(P)=\text{tr}(PAA^t)=\text{tr}(A^tPA)=\text{tr}(B)=r

Recall that $\utilde{b}$ is the LSE of $\utilde{\beta}$ $\iff D^t\utilde{Y}=D^tD\utilde{b}$.

From the results above we obtain

\begin{align*} \utilde{\hat{Y}}&=D\utilde{b}\xlongequal{\text{Lemma 3}}P\utilde{Y} \quad \text{with } P:\ n\times n \text{ symmetric idempotent matrix and } rank(P)=rank(D)\le p\\ &= \text{ projection of } \utilde{Y} \text{ onto } \Omega\quad \text{with } \Omega\triangleq span(D) \end{align*}

Here we assume the number of unknown regression coefficients $p=k+1$ is no larger than the sample size $n$.

\begin{align*} rank(D)=p&\iff D \text{ is full rank}\\ &\iff \dim(\Omega)=p\\ &\iff \text{ the columns of } D \text{ are linearly independent}\\ &\iff D^tD \text{ is nonsingular}\\ &\iff (D^tD)^{-1} \text{ exists}\\ &\implies \text{normal equation has unique solution} \end{align*}

i.e. $\utilde{b}_{p\times 1}=(D^tD)^{-1}D^t\utilde{Y}$

\begin{align*} \implies \utilde{\hat{Y}}&=D\utilde{b}\\ &=D(D^tD)^{-1}D^t\utilde{Y}\\ &=H\utilde{Y}\quad \text{where } H\triangleq D(D^tD)^{-1}D^t\quad\text{ hat matrix} \end{align*}

Moreover $\utilde{e}=\utilde{Y}-\utilde{\hat{Y}}=\utilde{Y}-H\utilde{Y}=(I-H)\utilde{Y}\triangleq M\utilde{Y}$, where $M\triangleq I-H$ is called the residual matrix.

It is easy to check the properties of $H$ and $M$:

\begin{align*} & H=D(D^tD)^{-1}D^t\quad n\times n\\ \implies &H^t=H\quad \text{Symmetric}\\ &HH=H\quad \text{Idempotent}\\ &\text{with }rank(H)=p=\dim(\Omega) \end{align*}

and $M=I-H$ is also symmetric idempotent, with $rank(M)=n-p$.
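A sketch checking these properties numerically on a small hypothetical design (values are illustrative only):

```r
# Sketch: hat matrix H and residual matrix M = I - H.
set.seed(6)
n  <- 30
x1 <- rnorm(n); x2 <- rnorm(n)
Y  <- 2 + x1 - 3 * x2 + rnorm(n)
D  <- cbind(1, x1, x2)                                     # p = 3

H <- D %*% solve(crossprod(D)) %*% t(D)                    # hat matrix, n x n
M <- diag(n) - H                                           # residual matrix

all.equal(H, t(H)); all.equal(H, H %*% H)                  # symmetric, idempotent
c(sum(diag(H)), sum(diag(M)))                              # traces: p and n - p
all.equal(drop(H %*% Y), unname(fitted(lm(Y ~ x1 + x2))))  # H Y = fitted values
max(abs(t(D) %*% M))                                       # D'M = 0 up to rounding
```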

Note

$H$ is the projection matrix onto $\Omega$, and $M$ is the projection matrix onto $\Omega^\perp$; moreover $\R^n=\Omega+\Omega^\perp$.

If a vector already in $\Omega$ is projected onto $\Omega$ again, it is unchanged; if a vector in $\Omega$ is projected onto the space orthogonal to $\Omega$, the result is 0. i.e.

H\utilde{\theta}=\utilde{\theta} \qquad M\utilde{\theta}=0 \quad \forall \utilde{\theta}\in\Omega

Since the columns of $D$ lie in $\Omega$ and $\utilde{e}\in\Omega^\perp$, we have $D^t\utilde{e}=0$. In particular $\utilde{1}^t\utilde{e}=0$ and $\utilde{X}_j^t\utilde{e}=0$ for $j=1,2,\cdots,k$, because $\utilde{1}$ and the $\utilde{X}_j$ all lie in $\Omega$.

Since $\utilde{\hat{Y}}$ is the projection onto $\Omega$, we also have $\utilde{\hat{Y}}^t\utilde{e}=0$. From this we obtain the following properties (a simulation check of the unbiasedness claims follows the list):

  1. Properties of SSE:

    \begin{align*} & \begin{align*} \bullet\quad\text{SSE} &\triangleq \utilde{e}^t\utilde{e}=(M\utilde{Y})^t(M\utilde{Y})=\utilde{Y}^tM^tM\utilde{Y}\\ &=\utilde{Y}^tM\utilde{Y} \quad \because M \text{ is symmetric idempotent}\\ &=(D\utilde{\beta}+\utilde{\varepsilon})^tM(D\utilde{\beta}+\utilde{\varepsilon})\\ &=\utilde{\beta}^tD^tMD\utilde{\beta}+2\utilde{\beta}^tD^tM\utilde{\varepsilon}+\utilde{\varepsilon}^tM\utilde{\varepsilon}\\ &=\utilde{\varepsilon}^tM\utilde{\varepsilon} \quad \because D^tM=0 \end{align*}\\ & \begin{align*} \bullet\quad E[\text{SSE}]&=E[\utilde{\varepsilon}^tM\utilde{\varepsilon}]=\sum_{i=1}^n\sum_{j=1}^nE[\varepsilon_iM_{ij}\varepsilon_j]\\ &=\sum_{i=1}^nM_{ii}E[\varepsilon_i^2] \quad \because E[\varepsilon_i\varepsilon_j]=0,\ i\ne j\\ &=\sigma^2\text{tr}(M)=\sigma^2(n-p) \end{align*}\\ & \bullet\quad E[\text{MSE}]=E\left[\frac{\text{SSE}}{n-p}\right]=\sigma^2 \end{align*}
  2. Properties of $\utilde{b}$:

    \begin{align*} &\begin{align*} \bullet\quad E[\utilde{b}]&=E[(D^tD)^{-1}D^t\utilde{Y}]\\ &=(D^tD)^{-1}D^t\cdot E[\utilde{Y}]\\ &=(D^tD)^{-1}D^tD\utilde{\beta}\\ &=\utilde{\beta}\quad \text{i.e. } \utilde{b} \text{ is unbiased for } \utilde{\beta} \end{align*}\\ &\begin{align*} \bullet\quad \sigma^2\set{\utilde{b}}&=\sigma^2\set{(D^tD)^{-1}D^t\utilde{Y}}\\ &=(D^tD)^{-1}D^t\sigma^2\set{\utilde{Y}}D(D^tD)^{-1}\\ &=\sigma^2 (D^tD)^{-1}\underbrace{D^tD(D^tD)^{-1}}_{=I} \quad \because\sigma^2\set{\utilde{Y}}=\sigma^2I\\ &=\sigma^2(D^tD)^{-1} \end{align*}\\ &\bullet\quad S^2\set{\utilde{b}}=\text{MSE}\cdot(D^tD)^{-1} \quad \text{to estimate } \sigma^2\set{\utilde{b}} \end{align*}
  3. Properties of $\utilde{\hat{Y}}$:

    \begin{align*} &\begin{align*} \bullet\quad E[\utilde{\hat{Y}}]&=E[H\utilde{Y}]=H\cdot E[\utilde{Y}]=HD\utilde{\beta}=D\utilde{\beta}=E[\utilde{Y}]\\ \implies& \utilde{\hat{Y}} \text{ is unbiased for } E[\utilde{Y}] \end{align*}\\ &\begin{align*} \bullet\quad \sigma^2\set{\utilde{\hat{Y}}}&=\sigma^2\set{H\utilde{Y}}=H\sigma^2\set{\utilde{Y}}H^t\\ &=\sigma^2HH^t=\sigma^2H \end{align*} \end{align*}
  4. Properties of $\utilde{e}$:

    \begin{align*} &\bullet\quad E[\utilde{e}]=E[M\utilde{Y}]=ME[\utilde{Y}]=\underbrace{MD}_{=0}\utilde{\beta}=0\\ &\bullet\quad \sigma^2\set{\utilde{e}}=\sigma^2\set{M\utilde{Y}}=\sigma^2M \end{align*}
  5. $S^2\set{\ast}\triangleq\sigma^2\set{\ast}|_{\sigma^2=\text{MSE}}$ is unbiased for $\sigma^2\set{\ast}$
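The unbiasedness claims above can be checked by simulation; the following sketch (with illustrative true values for $\utilde{\beta}$ and $\sigma$) averages the LSE and the MSE over many simulated data sets from a fixed design.

```r
# Sketch: E[b] = beta and E[MSE] = sigma^2, checked by Monte Carlo.
set.seed(7)
n <- 40; sigma <- 2
beta <- c(1, 2, -1)
D <- cbind(1, rnorm(n), rnorm(n))     # a fixed design matrix, p = 3
p <- ncol(D)

sims <- replicate(2000, {
  Y <- drop(D %*% beta) + rnorm(n, sd = sigma)
  b <- solve(crossprod(D), crossprod(D, Y))
  e <- Y - drop(D %*% b)
  c(b, MSE = sum(e^2) / (n - p))
})
rowMeans(sims)   # first p entries ~ beta = (1, 2, -1); last entry ~ sigma^2 = 4
```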

ANOVA

\begin{align*} &\begin{align*} \bullet\quad &\utilde{Y}=\utilde{\hat{Y}}+\utilde{e}\\ &\begin{align*} \implies ||\utilde{Y}||^2&=||\utilde{\hat{Y}}||^2+||\utilde{e}||^2+2\utilde{\hat{Y}}^t\utilde{e}=||\utilde{\hat{Y}}||^2+||\utilde{e}||^2\\ &=||\utilde{\hat{Y}}||^2 + \text{SSE} \end{align*} \end{align*}\\ &\begin{align*} \bullet\quad &\utilde{Y}=\utilde{Y}-\bar{Y}\utilde{1}+\bar{Y}\utilde{1}\quad \bar{Y}\utilde{1}\in\Omega\\ &\begin{align*} \implies ||\utilde{Y}||^2 &= ||\utilde{Y}-\bar{Y}\utilde{1}||^2 + n\bar{Y}^2\\ &=\sum_{i=1}^n(Y_i-\bar{Y})^2+n\bar{Y}^2\\ &=\text{SSTO}+n\bar{Y}^2 \end{align*} \end{align*}\\ &\begin{align*} \bullet\quad &\utilde{\hat{Y}}=\utilde{\hat{Y}}-\bar{Y}\utilde{1}+\bar{Y}\utilde{1}\\ &\begin{align*} \implies ||\utilde{\hat{Y}}||^2&=||\utilde{\hat{Y}}-\bar{Y}\utilde{1}||^2+n\bar{Y}^2\\ &=\sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2+n\bar{Y}^2\\ &=\text{SSR}+n\bar{Y}^2 \end{align*} \end{align*}\\ &\implies ||\utilde{Y}||^2=||\utilde{\hat{Y}}||^2+||\utilde{e}||^2\\ &\implies \text{SSTO}+\bcancel{n\bar{Y}^2}=\text{SSR}+\bcancel{n\bar{Y}^2}+\text{SSE} \end{align*}

Note:

\begin{align*} ||\bar{Y}\utilde{1}||^2&=n\bar{Y}^2\\ &=\bar{Y}\utilde{1}^t\utilde{1}\bar{Y}\\ &=\frac{1}{n}\utilde{Y}^t\utilde{1}\utilde{1}^t\utilde{1}\frac{1}{n}\utilde{1}^t\utilde{Y}\\ &=\utilde{Y}^t\frac{\utilde{1}\utilde{1}^t}{n}\utilde{Y} \quad \because\utilde{1}^t\utilde{1}=n\\ &=\utilde{Y}^t\frac{J}{n}\utilde{Y}\quad \text{where } J=[1]_{n\times n} \end{align*}
  • $(\frac{J}{n})^t=\frac{1}{n}J^t=\frac{1}{n}J$
  • $\frac{J}{n}\cdot\frac{J}{n}=\frac{J}{n}$

$\implies \frac{J}{n}$ is an $n\times n$ symmetric idempotent matrix with rank 1, i.e. it is the projection matrix onto $span\set{\utilde{1}}$.

\begin{align*} &\begin{align*} \bullet\quad &\utilde{Y}^tI\utilde{Y}=\utilde{Y}^t\left(\frac{J}{n}+I-\frac{J}{n}\right)\utilde{Y}\\ &\begin{align*} \iff ||\utilde{Y}||^2&=\utilde{Y}^t\frac{J}{n}\utilde{Y}+\utilde{Y}^t\left(I-\frac{J}{n}\right)\utilde{Y}\\ &=n\bar{Y}^2+\text{SSTO} \end{align*}\\ &\text{i.e. SSTO }\triangleq \sum_{i=1}^n(Y_i-\bar{Y})^2=\utilde{Y}^t\left(I-\frac{J}{n}\right)\utilde{Y} \end{align*}\\ &\begin{align*} \bullet\quad \text{SSE } &=||\utilde{e}||^2=\utilde{e}^t\utilde{e}=(M\utilde{Y})^tM\utilde{Y}=\utilde{Y}^tM^tM\utilde{Y}\\ &=\utilde{Y}^tM\utilde{Y}=\utilde{Y}^t(I-H)\utilde{Y} \end{align*}\\ &\begin{align*} \bullet\quad \text{SSR } &= \sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2=\text{SSTO}-\text{SSE}\\ &=\utilde{Y}^t\left(I-\frac{J}{n}\right)\utilde{Y}-\utilde{Y}^t(I-H)\utilde{Y}\\ &=\utilde{Y}^t\left(H-\frac{J}{n}\right)\utilde{Y} \end{align*} \end{align*}

Note:

\begin{align*} I&=\underbrace{\frac{J}{n}}_{\in\Omega}+\left(I-\frac{J}{n}\right)\\ &=\underbrace{H}_{\in\Omega}+\underbrace{(I-H)}_{\in\Omega^\perp}\\ &=\underbrace{\left(H-\frac{J}{n}\right)}_{\in\Omega}+\frac{J}{n}+(I-H) \end{align*}\qquad\text{and}\qquad \begin{align*} &\text{SSTO}=\utilde{Y}^t\left(I-\frac{J}{n}\right)\utilde{Y}\\ &\text{SSE}=\utilde{Y}^t(I-H)\utilde{Y}\\ &\text{SSR}=\utilde{Y}^t\left(H-\frac{J}{n}\right)\utilde{Y} \end{align*}
Corollary

If $I=P_1+P_2+\cdots+P_m$, where each $P_i$ is a projection matrix, then $||\utilde{Y}||^2=\sum_{i=1}^m\utilde{Y}^tP_i\utilde{Y}$.

From the reasoning above, SSTO, SSE, and SSR are all quadratic forms $\utilde{Y}^tP\utilde{Y}$ in which $P$ is a projection matrix (an $n\times n$ symmetric idempotent matrix). A numerical check of these identities is sketched below.
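The sketch again uses hypothetical data (all values illustrative) and checks that the quadratic forms reproduce SSTO, SSE, SSR and that SSTO = SSR + SSE:

```r
# Sketch: SSTO, SSE, SSR as quadratic forms Y' P Y.
set.seed(8)
n  <- 25
x1 <- rnorm(n); x2 <- rnorm(n)
Y  <- 1 + x1 + 2 * x2 + rnorm(n)
D  <- cbind(1, x1, x2)
H  <- D %*% solve(crossprod(D)) %*% t(D)
J  <- matrix(1, n, n)

SSTO <- drop(t(Y) %*% (diag(n) - J / n) %*% Y)    # = sum((Y - mean(Y))^2)
SSE  <- drop(t(Y) %*% (diag(n) - H)     %*% Y)    # = sum of squared residuals
SSR  <- drop(t(Y) %*% (H - J / n)       %*% Y)

c(SSTO = SSTO, SSR_plus_SSE = SSR + SSE)          # decomposition SSTO = SSR + SSE
all.equal(SSTO, sum((Y - mean(Y))^2))
```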

Gauss-Markov Theorem

Definition

Let $\xi=\utilde{c}^t\utilde{\beta}$ with $\utilde{c}\in\R^p$.

$\xi$ is estimable if $\exist \utilde{a}\in\R^n$ s.t. $E[\utilde{a}^t\utilde{Y}]=\xi=\utilde{c}^t\utilde{\beta}$, $\forall\utilde{\beta}$, i.e. $\utilde{a}^t\utilde{Y}$ is unbiased for $\xi$.

EX:

\begin{bmatrix} z_1\\\vdots\\z_m\\w_1\\\vdots\\w_n \end{bmatrix}= \begin{bmatrix} 1 & 0\\ \vdots & \vdots\\ 1 & 0\\ 0 & 1\\ \vdots & \vdots\\ 0 & 1 \end{bmatrix} \begin{bmatrix} \mu_1\\\mu_2 \end{bmatrix}+\varepsilon

If interested in $\xi=\mu_1-\mu_2=[1, -1]\beta = c^t\beta$:

  1. $a^t=[1,0,\cdots,0,-1,0,\cdots,0]\implies E(a^tY)=E[Z_1-W_1]=\mu_1-\mu_2, \forall\beta\in\R^2$
  2. $a^t=[1/m,\cdots,1/m,-1/n,\cdots,-1/n]\implies E(a^tY)=E[\bar{Z}-\bar{W}]=\mu_1-\mu_2, \forall\beta\in\R^2$
Theorem

Gauss-Markov Theorem:

Suppose the number of observations $n$ is at least $p$, the number of parameters.

\text{If }\utilde{Y}=D\utilde{\beta}+\utilde{\varepsilon}\text{ with } E[\utilde{\varepsilon}]=0,\ \sigma^2\set{\utilde{\varepsilon}}=\sigma^2I_{n\times n}

Let $\utilde{c}\in\R^p$ s.t. $\xi =\utilde{c}^t\utilde{\beta}$ is estimable, and let $\hat{\xi}=\utilde{c}^t\utilde{b}$, where $\utilde{b}$ is the LSE of $\utilde{\beta}$.

Then

  1. $\hat{\xi}$ is a linear unbiased estimator for $\xi$
  2. $\sigma^2\set{\hat{\xi}}\le\sigma^2\set{\tilde{\xi}}$ for every $\tilde{\xi}$ that is a linear unbiased estimator for $\xi$

i.e. $\hat{\xi}$ is the BLUE (Best Linear Unbiased Estimator) for $\xi$.

Proof:

We assume $\xi=\utilde{c}^t\utilde{\beta}$, with $\utilde{c}\in\R^p$, is estimable, i.e.

\begin{align*} &\exist \utilde{a}\in\R^n\ \forall\utilde{\beta}\in\R^p \text{ s.t. } E[\utilde{a}^t\utilde{Y}]=\xi\\ \iff& \utilde{a}^tD\utilde{\beta}=\utilde{c}^t\utilde{\beta},\forall\utilde{\beta}\in\R^p\\ \iff& \left(\utilde{a}^tD-\utilde{c}^t\right)\utilde{\beta}=0,\forall\utilde{\beta}\in\R^p\\ \iff& \utilde{a}^tD-\utilde{c}^t=0\quad\because \text{this holds for all }\utilde{\beta}\in\R^p\\ \iff& \utilde{a}^tD=\utilde{c}^t \end{align*}

Let $\Omega=$ column space of $D$, which is a subspace of $\R^n$.

$\implies \forall\utilde{a}\in\R^n,\exist!\utilde{d}\in\Omega,\utilde{\omega}\in\Omega^\perp$ s.t. $\utilde{a}=\utilde{d}+\utilde{\omega}$; every $\utilde{a}$ has such a unique decomposition.

\begin{align*} \implies E[\utilde{a}^t\utilde{Y}]&=E[(\utilde{d}+\utilde{\omega})^t\utilde{Y}]=E[\utilde{d}^t\utilde{Y}] + E[\utilde{\omega}^t\utilde{Y}]\\ &=\utilde{d}^tD\utilde{\beta}+\underbrace{\utilde{\omega}^tD\utilde{\beta}}_{=0}\quad \because \utilde{\omega}\in\Omega^\perp\text{ and the columns of } D \text{ lie in } \Omega\\ &=\utilde{d}^tD\utilde{\beta} \end{align*}

$E[\utilde{a}^t\utilde{Y}]=\xi=\utilde{c}^t\utilde{\beta},\forall\utilde{\beta}\in\R^p\implies\utilde{d}^t\utilde{Y}$ is also unbiased for $\xi$ and $\utilde{d}^tD=\utilde{c}^t$

Claim: $\utilde{d}$ is the only vector in $\Omega$ s.t. $E[\utilde{d}^t\utilde{Y}]=\xi,\forall\utilde{\beta}\in\R^p$.

Proof: Let $\utilde{\alpha}\in\Omega$ s.t. $E[\utilde{\alpha}^t\utilde{Y}]=\xi=\utilde{c}^t\utilde{\beta},\forall\utilde{\beta}\in\R^p$

\begin{align*} &\implies E[(\utilde{\alpha}-\utilde{d})^t\utilde{Y}]=0\\ &\implies (\utilde{\alpha}-\utilde{d})^tD\utilde{\beta}=0\quad \forall\utilde{\beta}\in\R^p\\ &\implies (\utilde{\alpha}-\utilde{d})^tD=0\\ &\implies \utilde{\alpha}-\utilde{d}\in\Omega^\perp\text{, but }\utilde{\alpha} \text{ and } \utilde{d} \text{ are both in } \Omega\\ &\implies \utilde{\alpha}-\utilde{d}\in \Omega\cap\Omega^\perp=\set{\utilde{0}}\\ &\implies \utilde{\alpha}=\utilde{d} \end{align*}

i.e. even if several linear unbiased estimators of $\xi$ exist, the projections of their coefficient vectors onto $\Omega$ are all the same.

Call this common projection $\utilde{d}$; then $\utilde{a}^tD=\utilde{c}^t$, $\utilde{d}^tD=\utilde{c}^t$, and $E[\utilde{d}^t\utilde{Y}]=\xi=\utilde{c}^t\utilde{\beta}, \forall\utilde{\beta}\in\R^p$.

Note $\utilde{d} \in \Omega$ and $\utilde{Y}-D\utilde{b}=\utilde{Y}-\utilde{\hat{Y}}=\utilde{e} \in\Omega^\perp\implies \utilde{d}^t\utilde{e}=0$

$\implies \utilde{d}^t(\utilde{Y}-D\utilde{b})=\utilde{d}^t\utilde{Y}-\utilde{d}^tD\utilde{b}=\utilde{d}^t\utilde{Y}-\utilde{c}^t\utilde{b}=0$

i.e. $\hat{\xi}\triangleq\utilde{c}^t\utilde{b}=\utilde{d}^t\utilde{Y}$

i.e. $\hat{\xi}$ is a linear estimator for $\xi$ (and unbiased, since $E[\utilde{d}^t\utilde{Y}]=\xi$)

Now let $\tilde{\xi}$ be any linear unbiased estimator for $\xi$,

i.e. $\tilde{\xi}=\utilde{a}^t\utilde{Y}$ for some $\utilde{a}\in\R^n$ s.t. $E[\utilde{a}^t\utilde{Y}]=\xi=\utilde{c}^t\utilde{\beta}, \forall\utilde{\beta}\in\R^p$.

Let $\utilde{d}$ be the projection of $\utilde{a}$ onto $\Omega$.

\begin{align*} \sigma^2\set{\tilde{\xi}}&=\sigma^2\set{\utilde{a}^t\utilde{Y}}=\utilde{a}^t\sigma^2\set{\utilde{Y}}\utilde{a}=\sigma^2||\utilde{a}||^2=\sigma^2||\utilde{d}+\utilde{\omega}||^2\\ &\ge \sigma^2||\utilde{d}||^2=\utilde{d}^t\sigma^2I\utilde{d}=\sigma^2\set{\utilde{d}^t\utilde{Y}}=\sigma^2\set{\hat{\xi}} \end{align*}

i.e. $\sigma^2\set{\tilde{\xi}}\ge\sigma^2\set{\hat{\xi}},\forall \tilde{\xi}$


Remark:

  1. The Gauss-Markov Theorem requires no distributional assumptions, and the proof above also holds when $D$ is not of full rank.

  2. On when $\xi=\utilde{c}^t\utilde{\beta}$ is estimable:

    i.e. $\exist\utilde{a}\in\R^n$ s.t. $E[\utilde{a}^t\utilde{Y}]=\xi=\utilde{c}^t\utilde{\beta},\forall\utilde{\beta}\in\R^p$

        $\implies\utilde{a}^tD=\utilde{c}^t$

    i.e. $\utilde{c}^t$ is a linear combination of the row vectors of $D$

    i.e. $\utilde{c}^t\in$ row space of $D$

    i.e. $\utilde{c}^t\utilde{\beta}$ is estimable $\iff\utilde{c}^t\in$ row space of $D$

    For some design matrices $D$, $\utilde{c}^t\utilde{\beta}$ may not be estimable,

     \text{e.g. }D=\begin{bmatrix*} 1&0\\ \vdots&\vdots\\ 1&0 \end{bmatrix*}\quad \text{row space}=\set{(c,0);c\in\R}

        $\iff$ the estimable parameters are $c\beta_0,\forall c\in\R$, and $\beta_1$ is not estimable

    If the row space of $D$ is $\R^p$, i.e. $D$ is full rank, or equivalently $(D^tD)^{-1}$ exists, then $\utilde{c}^t\utilde{\beta}$ is estimable for every $\utilde{c}\in\R^p$.

  3. If $\text{rank}(D)=p$ (i.e. $(D^tD)^{-1}$ exists), the Gauss-Markov Theorem has a simpler proof:

    \begin{align*} &\sigma^2\set{\tilde{\xi}}=\sigma^2\utilde{a}^tI\utilde{a}\\ &\begin{align*} \sigma^2\set{\hat{\xi}}&=\sigma^2\set{\utilde{c}^t\utilde{b}}=\sigma^2\set{\utilde{c}^t(D^tD)^{-1}D^t\utilde{Y}}\\ &=\utilde{c}^t(D^tD)^{-1}D^t\sigma^2\set{\utilde{Y}}D(D^tD)^{-1}\utilde{c}\\ &=\sigma^2\utilde{c}^t(D^tD)^{-1}D^tD(D^tD)^{-1}\utilde{c}\\ &=\sigma^2\utilde{c}^t(D^tD)^{-1}\utilde{c}\\ &=\sigma^2\utilde{a}^tD(D^tD)^{-1}D^t\utilde{a}\quad \because \utilde{c}^t=\utilde{a}^tD\\ &=\sigma^2\utilde{a}^tH\utilde{a} \end{align*}\\ &\sigma^2\set{\tilde{\xi}}-\sigma^2\set{\hat{\xi}}=\sigma^2\utilde{a}^t(I-H)\utilde{a}=\sigma^2\utilde{a}^tM\utilde{a}\ge 0\quad\because M \text{ is positive semi-definite} \end{align*}

Multivariate Normal

$Z_1,Z_2,\cdots, Z_n\overset{\text{iid}}{\sim}N(0,1)$

  • Joint pdf $f(\utilde{z})=\left(\frac{1}{\sqrt{2\pi}}\right)^ne^{-\sum_{i=1}^n z^2_i /2}=\left(\frac{1}{\sqrt{2\pi}}\right)^ne^{-||\utilde{z}||^2/2},\ \utilde{z}\in\R^n$, where $||\utilde{Z}||^2=\sum_{i=1}^nZ^2_i\sim\chi^2_n$
  • $E[\utilde{Z}]=0, \sigma^2\set{\utilde{Z}}=I$
  • $\utilde{Z}\sim N_n(0, I)$

Let $A_{m\times n}, \utilde{\eta}_{m\times 1}$ be non-random and $\utilde{W}_{m\times 1}=A\utilde{Z}_{n\times 1}+\utilde{\eta}$

$\implies E[\utilde{W}]=AE[\utilde{Z}]+\utilde{\eta}=\utilde{\eta}$ and $\sigma^2\set{\utilde{W}}=AA^t$, and these uniquely determine the distribution of $\utilde{W}$:

\utilde{W}\sim N_m(E[\utilde{W}], \sigma^2\set{\utilde{W}})=N_m(\utilde{\eta}, \bcancel\Sigma_{\utilde{W}}) \quad \text{where } \bcancel\Sigma_{\utilde{W}}=AA^t

If $\bcancel{\Sigma}_{\utilde{W}}^{-1}$ exists, i.e. $\bcancel{\Sigma}_{\utilde{W}}$ is nonsingular, then the pdf of $\utilde{W}$ is

f_{\utilde{W}}(\utilde{w})=\left(\frac{1}{\sqrt{2\pi}}\right)^m|\bcancel{\Sigma}_{\utilde{W}}|^{\frac{-1}{2}}e^{-\frac{Q}{2}}\quad \text{where } |\bcancel{\Sigma}_{\utilde{W}}|=\det(\bcancel{\Sigma}_{\utilde{W}}),\ Q=(\utilde{w}-\utilde{\eta})^t\bcancel{\Sigma}_{\utilde{W}}^{-1}(\utilde{w}-\utilde{\eta})
f_{\utilde{W}}(\utilde{w})=\prod_{i=1}^m\frac{1}{\sqrt{2\pi}}\frac{1}{\sigma_i}\exp\set{\frac{-(w_i-\eta_i)^2}{2\sigma_i^2}}
\begin{align*} &\iff \bcancel{\Sigma}_{\utilde{W}}=\text{diag}(\sigma_1^2,\sigma_2^2,\cdots,\sigma_m^2)\\ &\iff \sigma\set{w_i,w_j}=\text{cov}(w_i,w_j)=0\quad \forall i\ne j\\ &\iff w_1,w_2,\cdots,w_m \text{ are independent} \end{align*}
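A simulation sketch of this construction (the matrix $A$ and vector $\utilde{\eta}$ below are arbitrary choices): the sample mean and covariance of $\utilde{W}=A\utilde{Z}+\utilde{\eta}$ match $\utilde{\eta}$ and $AA^t$.

```r
# Sketch: W = A Z + eta with Z ~ N_n(0, I); check E[W] = eta and Var{W} = A A'.
set.seed(9)
m <- 2; n <- 3
A   <- matrix(c(1, 0.5, 0, 1, -1, 2), nrow = m, ncol = n)   # an arbitrary 2 x 3 matrix
eta <- c(1, -2)

W <- t(replicate(20000, drop(A %*% rnorm(n) + eta)))        # 20000 draws, one per row
colMeans(W)        # ~ eta
cov(W)             # ~ A %*% t(A)
A %*% t(A)
```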
Lemma 6

$\utilde{W}\sim N_m(\utilde{\eta}, \bcancel{\Sigma}_{\utilde{W}})$ where $\bcancel{\Sigma}_{\utilde{W}}$ is nonsingular

\implies Q=(\utilde{W}-\utilde{\eta})^t\bcancel{\Sigma}_{\utilde{W}}^{-1}(\utilde{W}-\utilde{\eta})\sim \chi^2_m

Proof:

  • Way 1: $\bcancel{\Sigma}_{\utilde{W}}$ is a symmetric nonsingular matrix, so $\exist B$ nonsingular s.t. $B\bcancel{\Sigma}_{\utilde{W}}B^t=I$

    $\implies \bcancel{\Sigma}_{\utilde{W}}=(B^tB)^{-1}$, i.e. $\bcancel{\Sigma}_{\utilde{W}}^{-1}=B^tB$

    $B(\utilde{W}-\utilde{\eta})\sim N_m(\utilde{0}, B\sigma^2\set{\utilde{W}}B^t)=N_m(\utilde{0}, I)$

    $\implies (B(\utilde{W}-\utilde{\eta}))^t(B(\utilde{W}-\utilde{\eta}))\sim\chi^2_m$

    $\implies(\utilde{W}-\utilde{\eta})^t\bcancel{\Sigma}_{\utilde{W}}^{-1}(\utilde{W}-\utilde{\eta})\sim\chi^2_m$

  • Way 2:

    \begin{align*} M_Q(t)=E[e^{tQ}]&=\int_{\R^m}f_{\utilde{W}}(\utilde{w})e^{tQ}d\utilde{w}\\ &=\int_{\R^m}e^{tQ}\left(\frac{1}{\sqrt{2\pi}}\right)^m|\bcancel{\Sigma}_{\utilde{W}}|^{\frac{-1}{2}}e^{-\frac{Q}{2}}d\utilde{w}\\ &=\int_{\R^m}\left(\frac{1}{\sqrt{2\pi}}\right)^m|\bcancel{\Sigma}_{\utilde{W}}|^{\frac{-1}{2}}\exp\set{\frac{-1}{2}(1-2t)(\utilde{w}-\utilde{\eta})^t\bcancel{\Sigma}_{\utilde{W}}^{-1}(\utilde{w}-\utilde{\eta})}d\utilde{w}\\ &=\int_{\R^m}\left(\frac{1}{\sqrt{2\pi}}\right)^m|\bcancel{\Sigma}_{\utilde{W}}|^{\frac{-1}{2}}\exp\set{\frac{-1}{2}(\utilde{w}-\utilde{\eta})^t\left(\frac{\bcancel{\Sigma}_{\utilde{W}}}{1-2t}\right)^{-1}(\utilde{w}-\utilde{\eta})}d\utilde{w}\\ \bcancel{\Sigma}^*\triangleq\frac{\bcancel{\Sigma}_{\utilde{W}}}{1-2t}\quad &=(1-2t)^{-\frac{m}{2}}\int_{\R^m}\left(\frac{1}{\sqrt{2\pi}}\right)^m|\bcancel{\Sigma}^*|^{\frac{-1}{2}}\exp\set{\frac{-1}{2}(\utilde{w}-\utilde{\eta})^t(\bcancel{\Sigma}^*)^{-1}(\utilde{w}-\utilde{\eta})}d\utilde{w}\\ &=(1-2t)^{-\frac{m}{2}}\\ &=\text{mgf of }\chi^2_m\\ \implies&\ Q\sim\chi^2_m \end{align*}
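A Monte Carlo check of Lemma 6, as a sketch with an arbitrary $m$, $A$, and $\utilde{\eta}$: the quadratic form $Q$ behaves like a $\chi^2_m$ variable.

```r
# Sketch: Q = (W - eta)' Sigma^{-1} (W - eta) ~ chi^2_m for W ~ N_m(eta, Sigma).
set.seed(10)
m   <- 3
A   <- matrix(rnorm(m * m), m, m)
eta <- c(1, 0, -2)
Sigma     <- A %*% t(A)                  # a nonsingular covariance matrix
Sigma_inv <- solve(Sigma)

W <- t(replicate(20000, drop(A %*% rnorm(m) + eta)))
Q <- apply(W, 1, function(w) drop(t(w - eta) %*% Sigma_inv %*% (w - eta)))

c(mean(Q), m)                 # E[chi^2_m] = m
ks.test(Q, "pchisq", df = m)  # empirical distribution vs chi^2_m
```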

Distribution of Quadratic Form

Definition

non-central chi-square:

$W_i\sim N(\theta_i, 1),\ i=1,\cdots, n$, independent, i.e. $\utilde{W}\sim N_n(\utilde{\theta}, I)$

||\utilde{W}||^2=\sum_{i=1}^nW_i^2\sim\chi^2_{n,\delta}\text{ where }\delta=||\utilde{\theta}||^2=\sum_{i=1}^n\theta_i^2
  • $n$ is called the degrees of freedom
  • $\delta$ is called the non-centrality parameter

Note: $\chi^2_{n,0}=\chi^2_n$

Note:

\begin{align*} u&=||\utilde{W}||^2\sim\chi^2_{n,\delta}\text{ where }\delta=||\utilde{\theta}||^2\\ &\overset{\text{d}}{=}\sum_{i=1}^nN(\theta_i, 1)^2\quad\delta=\sum_{i=1}^n\theta_i^2=\left(\sqrt{\sum_{i=1}^n\theta_i^2}\right)^2\\ &\overset{\text{d}}{=}N(||\utilde{\theta}||, 1)^2 + \sum_{i=1}^{n-1}N(0,1)^2\\ &\overset{\text{d}}{=}\chi^2_{1, \delta}+\chi^2_{n-1} \end{align*}
\begin{align*} E[u]&=E[(Z+||\theta||)^2]+n-1\\ &=E[Z^2]+2||\theta||E[Z]+||\theta||^2+n-1\\ &=n+||\theta||^2\\ \text{i.e. }& E[\chi^2_{n,\delta}]=n+\delta \end{align*}
\begin{align*} Var[u]&=Var[(Z+||\theta||)^2]+2(n-1)\\ &=E[(Z+||\theta||)^4]-E[(Z+||\theta||)^2]^2+2(n-1)\\ &=2n+4\delta \end{align*}
U_1\sim\chi^2_{n_1,\delta_1}, U_2\sim\chi^2_{n_2,\delta_2}\text{ independent}\implies U_1+U_2\sim\chi^2_{n_1+n_2, \delta_1+\delta_2}
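A brief simulation sketch of these facts (theta below is an arbitrary example), comparing the definition with R's built-in non-central chi-square:

```r
# Sketch: non-central chi-square as a sum of squared N(theta_i, 1)'s.
set.seed(11)
theta <- c(1, -2, 0.5)
n     <- length(theta)
delta <- sum(theta^2)

u1 <- replicate(50000, sum(rnorm(n, mean = theta)^2))  # from the definition
u2 <- rchisq(50000, df = n, ncp = delta)               # built-in non-central chi-square

c(mean(u1), mean(u2), n + delta)          # E[u] = n + delta
c(var(u1),  var(u2),  2 * n + 4 * delta)  # Var[u] = 2n + 4 delta
```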
Definition

non-central F, t:

F_{m,n,\delta}\overset{\text{d}}{=}\frac{\chi^2_{m,\delta}/m}{\chi^2_n/n} \qquad t_{m,\delta}\overset{\text{d}}{=}\frac{N(\sqrt{\delta},1)}{\sqrt{\chi^2_m/m}}\quad\text{(numerator and denominator independent)}

\implies (t_{m,\delta})^2\overset{\text{d}}{=}F_{1,m,\delta}

\utilde{Y}=D\utilde{\beta}+\utilde{\varepsilon}=\utilde{\theta}+\utilde{\varepsilon}\quad\text{where }\utilde{\varepsilon}\sim N_n(\utilde{0}, \sigma^2I)

\implies \utilde{Y}\sim N_n(\utilde{\theta}, \sigma^2I)

Lemma 7

Let $\utilde{Y}\sim N_n(\utilde{\theta}, \sigma^2I)$ and let $P$ be an $n\times n$ symmetric idempotent matrix of rank $r$

\implies \frac{\utilde{Y}^tP\utilde{Y}}{\sigma^2}\sim\chi^2_{r, \delta}\text{ where }\delta=\frac{\utilde{\theta}^tP\utilde{\theta}}{\sigma^2}

Proof: $\exist A_{n\times n}=(\utilde{a}_1, \cdots, \utilde{a}_n)$ orthogonal s.t.

\begin{align*} A^tPA&=\text{diag}(\underbrace{1,1,\cdots,1}_{r},\underbrace{0,0,\cdots,0}_{n-r})\\ &=\begin{bmatrix*} I_r&0\\ 0&0 \end{bmatrix*} \end{align*}
\implies P=A\begin{bmatrix*} I_r&0\\ 0&0 \end{bmatrix*}A^t=(\utilde{a}_1, \cdots, \utilde{a}_r)\begin{pmatrix} \utilde{a}_1^t\\ \vdots\\ \utilde{a}_r^t \end{pmatrix}=A_rA_r^t\tag{$\vartriangle_1$}

Let $\utilde{W}=A^t\utilde{Y}\iff A\utilde{W}=\utilde{Y}$

\implies \utilde{Y}^tP\utilde{Y}=(A\utilde{W})^tP(A\utilde{W})=\utilde{W}^tA^tPA\utilde{W}=\utilde{W}^t\begin{bmatrix*} I_r&0\\ 0&0 \end{bmatrix*}\utilde{W}=\sum_{i=1}^rW_i^2\tag{$\vartriangle_2$}

$\because\utilde{Y}\sim N_n(\utilde{\theta}, \sigma^2I)$

$\therefore\utilde{W}=A^t\utilde{Y}\sim N_n(A^t\utilde{\theta}, \sigma^2I)$; note $\sigma^2\set{A^t\utilde{Y}}=\sigma^2A^tIA=\sigma^2I$

$W_i,\ i=1,\cdots, n$ are independent

\implies \sum_{i=1}^r\frac{W_i^2}{\sigma^2}\sim\chi^2_{r, \delta} \text{ with }\delta =\frac{1}{\sigma^2}\sum_{i=1}^r(\utilde{a}_i^t\utilde{\theta})^2=\frac{\utilde{\theta}^tA_rA_r^t\utilde{\theta}}{\sigma^2}\xlongequal[\vartriangle_1]{\text{by}}\frac{\utilde{\theta}^tP\utilde{\theta}}{\sigma^2}
\xRightarrow{\text{by }\vartriangle_2}\frac{1}{\sigma^2}\sum_{i=1}^rW_i^2=\frac{\utilde{Y}^tP\utilde{Y}}{\sigma^2}\sim\chi^2_{r, \delta}\text{ where }\delta=\frac{\utilde{\theta}^tP\utilde{\theta}}{\sigma^2}=\frac{E[\utilde{Y}]^tPE[\utilde{Y}]}{\sigma^2}
\begin{align*} \implies& \frac{\text{SSTO}}{\sigma^2}=\frac{\utilde{Y}^t(I-\frac{J}{n})\utilde{Y}}{\sigma^2}\sim\chi^2_{n-1, \delta}\text{ with } \delta=\frac{\sum(\theta_i-\bar{\theta})^2}{\sigma^2}\\ &\frac{\text{SSR}}{\sigma^2}=\frac{\utilde{Y}^t(H-\frac{J}{n})\utilde{Y}}{\sigma^2}\sim\chi^2_{p-1, \delta}\text{ with }\delta=\frac{\utilde{\theta}^t(H-\frac{J}{n})\utilde{\theta}}{\sigma^2}\\ &\frac{\text{SSE}}{\sigma^2}=\frac{\utilde{Y}^t(I-H)\utilde{Y}}{\sigma^2}\sim\chi^2_{n-r, \delta} \text{ with }\delta = \frac{\utilde{\theta}^t(I-H)\utilde{\theta}}{\sigma^2} \end{align*}

We know that the sum of independent $\chi^2$ random variables is again $\chi^2$. The same holds for non-central $\chi^2$ distributions: both parameters of the sum are the sums of the corresponding parameters.

Here we see that SSTO, SSR, and SSE (scaled by $\sigma^2$) are all $\chi^2$, and the two parameters of SSTO are exactly the sums of those of SSR and SSE. Are SSR and SSE really independent?

Theorem

Cochran's Theorem:

\utilde{Y}\sim N_n(\utilde{\theta},\sigma^2 I)

Let $P_j,\ j=1,2,\cdots, m$, be $n\times n$ symmetric matrices of rank $r_j$ s.t. $I=\sum_{j=1}^m P_j$

Hence $\utilde{Y}^tI\utilde{Y}=\sum_{j=1}^m\utilde{Y}^tP_j\utilde{Y}$

\implies \frac{\utilde{Y}^tP_j\utilde{Y}}{\sigma^2}\sim\chi^2_{r_j,\delta_j}\text{ with }\delta_j=\frac{\utilde{\theta}^tP_j\utilde{\theta}}{\sigma^2},\ j=1,2,\cdots,m,\text{ and they are independent}\iff \sum_{j=1}^m r_j=n

Note:

If $(D^tD)^{-1}$ exists, then $\text{rank}(D)=p=\dim(\Omega)=\text{tr}(H)$

  1. rank:

    \underbrace{I}_{\text{rank }n}=\underbrace{H}_{\text{rank }p}+\underbrace{(I-H)}_{\text{rank }n-p}
    \implies \begin{align*} &\frac{\utilde{Y}^tH\utilde{Y}}{\sigma^2}=\frac{||\utilde{\hat{Y}}||^2}{\sigma^2}\sim\chi^2_{p,\ \frac{\utilde{\theta}^tH\utilde{\theta}}{\sigma^2}}=\chi^2_{p,\ \frac{||\utilde{\theta}||^2}{\sigma^2}}\\ &\frac{\utilde{Y}^tM\utilde{Y}}{\sigma^2}=\frac{\text{SSE}}{\sigma^2}\sim\chi^2_{n-p,\ \frac{\utilde{\theta}^tM\utilde{\theta}}{\sigma^2}}=\chi^2_{n-p} \end{align*}
  2. rank:

    \underbrace{I}_{\text{rank }n}=\underbrace{(I-H)}_{\text{rank }n-p}+\underbrace{\left(H-\tfrac{J}{n}\right)}_{\text{rank }p-1}+\underbrace{\tfrac{J}{n}}_{\text{rank }1}
    \implies\begin{align*} &\frac{\utilde{Y}^t(I-H)\utilde{Y}}{\sigma^2}=\frac{\text{SSE}}{\sigma^2}\sim\chi^2_{n-p}\quad\because\utilde{\theta}^t(I-H)\utilde{\theta}=0 \text{ since }\utilde{\theta}\in\Omega\\ &\frac{\utilde{Y}^t(H-\frac{J}{n})\utilde{Y}}{\sigma^2}=\frac{\text{SSR}}{\sigma^2}\sim\chi^2_{p-1,\ \frac{\utilde{\theta}^t(H-\frac{J}{n})\utilde{\theta}}{\sigma^2}=\frac{\sum(\theta_i-\bar{\theta})^2}{\sigma^2}=\frac{||\utilde{\theta}||^2-n\bar{\theta}^2}{\sigma^2}}\\ &\frac{\utilde{Y}^t\frac{J}{n}\utilde{Y}}{\sigma^2}=\frac{n\bar{Y}^2}{\sigma^2}\sim\chi^2_{1,\ \frac{\utilde{\theta}^t\frac{J}{n}\utilde{\theta}}{\sigma^2}=\frac{n\bar{\theta}^2}{\sigma^2}} \end{align*}

    Since $(n-p)+(p-1)+1=n$, Cochran's Theorem says these three quadratic forms are mutually independent.

Remark:

\utilde{Y}_{n\times 1}=D_{n\times p}\utilde{\beta}_{p\times 1}+\utilde{\varepsilon}_{n\times 1}\quad\text{where }\utilde{\beta}=(\beta_0, \beta_1, \cdots, \beta_k)^t,\ p=k+1<n,\ \utilde{\varepsilon}\sim N_n(\utilde{0}, \sigma^2I)
  • $E[\utilde{Y}]=D\utilde{\beta}\triangleq\utilde{\theta}\in\Omega$, where $\Omega=$ column space of $D$.

If $D$ is full rank $\iff \text{rank}(D)=p=\dim(\Omega)\iff (D^tD)^{-1}_{p\times p}$ exists

  • $\utilde{b}=(D^tD)^{-1}D^t\utilde{Y}$
  • $\utilde{\hat{Y}}=D\utilde{b}=D(D^tD)^{-1}D^t\utilde{Y}=H\utilde{Y}$
\begin{align*} &\frac{\text{SSR}}{\sigma^2}=\frac{\utilde{Y}^t(H-\frac{J}{n})\utilde{Y}}{\sigma^2}\sim\chi^2_{p-1,\ \frac{\utilde{\theta}^t(H-\frac{J}{n})\utilde{\theta}}{\sigma^2}=\frac{\sum(\theta_i-\bar{\theta})^2}{\sigma^2}}\\ &\frac{\text{SSE}}{\sigma^2}=\frac{\utilde{Y}^t(I-H)\utilde{Y}}{\sigma^2}\sim\chi^2_{n-p,\ \frac{\utilde{\theta}^t(I-H)\utilde{\theta}}{\sigma^2}=0} \end{align*}
\implies \frac{\text{SSTO}}{\sigma^2}=\frac{\text{SSR}}{\sigma^2}+\frac{\text{SSE}}{\sigma^2}\sim\chi^2_{n-1,\ \frac{\sum(\theta_i-\bar{\theta})^2}{\sigma^2}=\frac{||\utilde{\theta}||^2-n\bar{\theta}^2}{\sigma^2}}

Let $\text{MS}\triangleq\frac{\text{SS}}{\text{df}}$, e.g. $\text{MSE}=\frac{\text{SSE}}{n-p}$, $\text{MSR}=\frac{\text{SSR}}{p-1}$

  • $E[\text{MSE}]=E[\frac{\text{SSE}}{n-p}]=\frac{\sigma^2}{n-p}E[\underbrace{\frac{\text{SSE}}{\sigma^2}}_{\sim\chi^2_{n-p}}]=\frac{\sigma^2}{n-p}(n-p)=\sigma^2$
  • \begin{align*} E[\text{MSR}]&=\frac{\sigma^2}{p-1}E\left[\frac{\text{SSR}}{\sigma^2}\right]\quad \frac{\text{SSR}}{\sigma^2}\sim\chi^2_{p-1,\ \frac{\sum(\theta_i-\bar{\theta})^2}{\sigma^2}}\\ &=\frac{\sigma^2}{p-1}\left(p-1+\frac{\sum(\theta_i-\bar{\theta})^2}{\sigma^2}\right)\\ &=\sigma^2+\frac{1}{p-1}\sum(\theta_i-\bar{\theta})^2\ge \sigma^2 \end{align*}

Note:

E[\text{MSR}]=\sigma^2+\frac{1}{p-1}\sum(\theta_i-\bar{\theta})^2\ge \sigma^2,\quad \text{"=" holds}\iff\theta_i=\bar{\theta}\ \forall i
\text{Hence } E[\text{MSE}]=\sigma^2\le\sigma^2+\frac{1}{p-1}\sum(\theta_i-\bar{\theta})^2=E[\text{MSR}]

with "=" holding $\iff \beta_1=\beta_2=\cdots=\beta_k=0$, i.e. there is no linear relationship between $\utilde{Y}$ and $\utilde{X}$ $\iff Y_i=\beta_0+\varepsilon_i$.

Since we usually care about whether the explanatory variables really have a linear relationship with the response, we carry out the hypothesis test:

H_0:\beta_1=\beta_2=\cdots=\beta_k=0\quad\text{ v.s. }\quad H_1:\beta_j\ne 0\text{ for some }j=1,2,\cdots,k

Note: $\beta_0$ is usually not tested, since it is the intercept; we mainly care about the relationship between the explanatory variables and the response.

  • Under $H_0:\beta_1=\beta_2=\cdots=\beta_k=0$

    $E[\text{MSE}]=\sigma^2=E[\text{MSR}]\implies \frac{\text{MSR}}{\text{MSE}}\text{ tends to be close to }1$
  • Under $H_1:\beta_j\ne 0\text{ for some }j=1,2,\cdots,k$

    $E[\text{MSE}]=\sigma^2<E[\text{MSR}]\implies \frac{\text{MSR}}{\text{MSE}}\text{ tends to be larger than }1$
Definition

Test Statistic:

F^*\triangleq\frac{\text{MSR}}{\text{MSE}}

Reject $H_0\iff F^*>c$, where $c$ is a critical value s.t. $P(F^*>c\mid H_0)=\alpha$.

Note: Under $H_0:\beta_1=\beta_2=\cdots=\beta_k=0$,

\left.\begin{aligned} &\frac{\text{SSR}}{\sigma^2}\sim\chi^2_{p-1,\ \frac{\sum(\theta_i-\bar{\theta})^2}{\sigma^2}\xlongequal{H_0}0}\\ &\frac{\text{SSE}}{\sigma^2}\sim\chi^2_{n-p} \end{aligned}\right\}\quad\text{independent}
\implies F^*=\frac{\text{MSR}}{\text{MSE}}=\frac{\text{SSR}/(p-1)}{\text{SSE}/(n-p)}=\frac{\frac{1}{p-1}\overbrace{\frac{\text{SSR}}{\sigma^2}}^{\chi^2_{p-1}}}{\frac{1}{n-p}\underbrace{\frac{\text{SSE}}{\sigma^2}}_{\chi^2_{n-p}}}\overset{\text{d}}{=}\frac{\chi^2_{p-1}/(p-1)}{\chi^2_{n-p}/(n-p)}\overset{\text{d}}{=}F_{p-1,n-p}

If $D$ is full rank, to test whether there is a linear relationship between $\utilde{X}$ and $\utilde{Y}$, i.e.

H_0:\beta_1=\beta_2=\cdots=\beta_k=0\quad\text{ v.s. }\quad H_1:\beta_j\ne 0\text{ for some }j=1,2,\cdots,k

we reject $H_0$ at level $\alpha$ if $F^*>F_{p-1,n-p,\alpha}$.

Once we have the data, we compute $f^*=\text{MSR}/\text{MSE}$ and the P-value $=P_{H_0}(F_{p-1,n-p}>f^*)$. If the P-value is smaller than the significance level $\alpha$, we reject the null hypothesis. A small R sketch of this F test follows.
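The sketch below uses illustrative data-generating values: compute $F^*$ and its P-value by hand and compare with the overall F statistic reported by summary(lm()).

```r
# Sketch: the overall F test for H0: beta_1 = ... = beta_k = 0.
set.seed(12)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
Y  <- 3 + 0.8 * x1 + rnorm(n)            # x2 has no effect in this example

fm    <- lm(Y ~ x1 + x2)
p     <- length(coef(fm))                 # p = k + 1 = 3
SSE   <- sum(residuals(fm)^2)
SSR   <- sum((fitted(fm) - mean(Y))^2)
Fstar <- (SSR / (p - 1)) / (SSE / (n - p))

c(Fstar = Fstar, p_value = pf(Fstar, p - 1, n - p, lower.tail = FALSE))
summary(fm)$fstatistic                    # same F statistic and degrees of freedom
```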

ANOVA Table

| Source     | SS   | df    | MS  | F*              | P-value                |
|------------|------|-------|-----|-----------------|------------------------|
| Regression | SSR  | $p-1$ | MSR | MSR/MSE $=f^*$  | $P(F_{p-1,n-p}>f^*)$   |
| Error      | SSE  | $n-p$ | MSE |                 |                        |
| Total      | SSTO | $n-1$ |     |                 |                        |

The hypothesis being tested is whether there is a linear relationship between $\utilde{Y}$ and $\utilde{X}$:

H_0:\beta_1=\beta_2=\cdots=\beta_k=0\quad\text{ v.s. }\quad H_1:\beta_j\ne 0\text{ for some }j=1,2,\cdots,k

Remark: in R, if fm is our fitted regression model, we can use anova(fm) to obtain an ANOVA table of the following form.

| Source          | SS       | df    |
|-----------------|----------|-------|
| predictor 1     | SSR1     | 1     |
| predictor 2     | SSR2     | 1     |
| $\vdots$        | $\vdots$ | $\vdots$ |
| predictor (p-1) | SSR(p-1) | 1     |
| Error           | SSE      | $n-p$ |

That is, R projects onto the first explanatory variable, removes its contribution, then projects onto the next variable, and so on, producing a sequential SSR for each predictor. Because the explanatory variables are not necessarily orthogonal, the order of projection can change the size of each SSR.

Therefore, when building the model we put the variables we consider more important first, which makes the model easier to interpret. For example, in lm(Y~X1+X2+X3) we believe X1 affects Y the most, then X2, then X3, so X1 comes first. The sketch below illustrates how the order changes the sequential sums of squares.
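A sketch of this order effect with hypothetical correlated predictors X1, X2 (all values illustrative): the sequential sums of squares reported by anova() change when the order of the terms changes.

```r
# Sketch: sequential SS depend on the order of non-orthogonal predictors.
set.seed(13)
n  <- 60
X1 <- rnorm(n)
X2 <- 0.7 * X1 + rnorm(n)        # X2 is correlated with X1
Y  <- 1 + X1 + X2 + rnorm(n)

anova(lm(Y ~ X1 + X2))           # SSR for X1 computed first
anova(lm(Y ~ X2 + X1))           # SSR for X1 computed after X2: a different value
```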