Hölder's Inequality
Lemma (Young's inequality): Let $a>0$, $b>0$ and $p>1$, $q>1$, where $\frac{1}{p}+\frac{1}{q}=1$. Then

$$\frac{1}{p}a^p+\frac{1}{q}b^q \ge ab,$$

where equality holds $\iff a^p=b^q$.
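One way to see the lemma: $\log$ is concave, so

$$\log\left(\frac{1}{p}a^p+\frac{1}{q}b^q\right) \ge \frac{1}{p}\log a^p+\frac{1}{q}\log b^q = \log a+\log b = \log(ab),$$

and exponentiating gives the inequality; strict concavity of $\log$ forces equality exactly when $a^p=b^q$.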
Proof (of Hölder's inequality, stated below): For r.v.'s $X, Y$ with $E|X|^p<\infty$ and $E|Y|^q<\infty$, take

$$a=\frac{|X|}{[E(|X|^p)]^{1/p}},\qquad b=\frac{|Y|}{[E(|Y|^q)]^{1/q}}$$

in the lemma:

$$\begin{align*}
\implies & \frac{1}{p}\left(\frac{|X|}{[E(|X|^p)]^{1/p}}\right)^p+\frac{1}{q}\left(\frac{|Y|}{[E(|Y|^q)]^{1/q}}\right)^q \ge \frac{|X|}{[E(|X|^p)]^{1/p}}\cdot\frac{|Y|}{[E(|Y|^q)]^{1/q}} \\
\xRightarrow[\text{both sides}]{\text{expectation}}& 1 =\frac{1}{p}+\frac{1}{q}\ge \frac{E|XY|}{[E|X|^p]^{1/p}[E|Y|^q]^{1/q}}
\end{align*}$$
Hölder's Inequality

Let $p>1$, $q>1$, where $\frac{1}{p}+\frac{1}{q}=1$. Then

$$E[|XY|] \le [E|X|^p]^{\frac{1}{p}}[E|Y|^q]^{\frac{1}{q}},$$

where equality holds $\iff P\left(\frac{|X|^p}{E[|X|^p]}=\frac{|Y|^q}{E[|Y|^q]}\right)=1$ (almost surely).
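As an illustration (not part of the proof), a quick simulation can sanity-check the inequality; sample means stand in for the expectations, and the distributions below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)
Y = rng.exponential(scale=2.0, size=100_000)

p, q = 3.0, 1.5  # conjugate exponents: 1/p + 1/q = 1
lhs = np.mean(np.abs(X * Y))                                       # E|XY|
rhs = np.mean(np.abs(X)**p)**(1/p) * np.mean(np.abs(Y)**q)**(1/q)  # product of L^p, L^q norms
print(lhs <= rhs, round(lhs, 4), round(rhs, 4))                    # True
```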
Taking $Y=1$ in Hölder's Inequality, we get

$$\begin{align*}
& E[|X|] \le [E|X|^p]^{\frac{1}{p}} \\
\xRightarrow{|X|\to |X|^r} & E[|X|^r] \le [E|X|^{pr}]^{\frac{1}{p}} = [E(|X|^s)]^{\frac{r}{s}},\quad s\triangleq pr> r\\
\implies & [E(|X|^r)]^{\frac{1}{r}}\le [E(|X|^s)]^{\frac{1}{s}},\quad s\ge r\\
\implies & g(r) \triangleq [E(|X|^r)]^{\frac{1}{r}} \text{ is monotonically nondecreasing in } r \text{ (Lyapunov's inequality)}
\end{align*}$$
Hence, the existence of a higher-order moment guarantees the existence of all lower-order moments.
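A small simulation illustrates the monotonicity of $g(r)$; sample moments stand in for the expectations, and the standard normal is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(200_000)

# g(r) = (E|X|^r)^(1/r) should be nondecreasing in r
for r in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(f"r={r}: g(r) = {np.mean(np.abs(X)**r)**(1/r):.4f}")
```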
Cauchy-Schwarz Inequality
Let $p=q=2$ in Hölder's Inequality; we get

$$[E(|XY|)]^2\le E[X^2]E[Y^2]$$
Let $X\to X-E[X]$, $Y\to Y-E[Y]$:

$$\begin{align*}
|E[(X-E[X])(Y-E[Y])]| &\le E[|(X-E[X])(Y-E[Y])|]\\
&\le [E(X-E[X])^2]^{\frac{1}{2}}[E(Y-E[Y])^2]^{\frac{1}{2}}\\
&= \sqrt{\sigma^2(X)\sigma^2(Y)}\\
&= \sigma(X)\sigma(Y)
\end{align*}$$
i.e. $|\mathrm{Cov}(X,Y)|\le\sigma(X)\sigma(Y)\iff|\rho_{X,Y}|\le 1$
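For example (an arbitrary linear model, purely for illustration), sample covariances respect this bound:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(50_000)
Y = 0.6 * X + rng.standard_normal(50_000)       # arbitrary dependent pair

cov = np.mean((X - X.mean()) * (Y - Y.mean()))  # sample Cov(X, Y)
print(abs(cov) <= X.std() * Y.std())            # |Cov| <= sigma(X)sigma(Y): True
print(cov / (X.std() * Y.std()))                # sample rho, always in [-1, 1]
```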
Minkowski's Inequality
$$[E|X+Y|^p]^{\frac{1}{p}}\le [E|X|^p]^{\frac{1}{p}}+[E|Y|^p]^{\frac{1}{p}},\quad p\ge 1$$
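Viewing $[E|X|^p]^{1/p}$ as the $L^p$ norm of $X$, this is the triangle inequality. A numerical sketch with arbitrary distributions (Minkowski holds exactly for the empirical measure, so the check always passes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)
Y = rng.uniform(-1.0, 1.0, size=100_000)

p = 3.0
lp_norm = lambda W: np.mean(np.abs(W)**p)**(1/p)  # sample (E|W|^p)^(1/p)
print(lp_norm(X + Y) <= lp_norm(X) + lp_norm(Y))  # True
```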
Jensen's Inequality
For any r.v. $X$, if $g$ is a convex function, then $E[g(X)]\ge g(E[X])$.

Equality holds $\iff P(g(X)=a+bX)=1$ for some line $a+bx$, i.e. $g$ agrees with a line a.s. on the support of $X$.
Proof: Let $l(x)$ be a supporting line to the graph of $g(x)$ at the point $(E[X], g(E[X]))$ (the tangent line when $g$ is differentiable). Note that $E[X]$ is a constant.

i.e. $l(x)=a+bx$ s.t. $l(E[X])=a+bE[X]=g(E[X])$ and $l(x)\le g(x),\ \forall x$, since $g(x)$ is convex.

$$\begin{align*}
\because\ & l(x)\le g(x),\ \forall x\\
\therefore\ & g(E[X])=l(E[X])=E[l(X)]\le E[g(X)]
\end{align*}$$

where $E[l(X)]=a+bE[X]=l(E[X])$ by linearity of expectation.
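e.g. the convex $g(x)=x^2$ gives $E[X^2]\ge(E[X])^2$, i.e. $\mathrm{Var}(X)\ge 0$. A numerical illustration with the convex $g(x)=e^x$ (standard normal chosen arbitrarily; Jensen holds exactly for the empirical measure):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)

# Jensen: E[e^X] >= e^{E[X]}; for X ~ N(0,1), E[e^X] = e^{1/2} ≈ 1.65 > e^0 = 1
print(np.mean(np.exp(X)) >= np.exp(np.mean(X)))  # True
```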
Tchebycheff's Inequality
$$\begin{align*}
\sigma^2 =& E[(X-\mu)^2]\\
=& E[(X-\mu)^2 I(|X-\mu|\ge\varepsilon)]+E[(X-\mu)^2 I(|X-\mu|<\varepsilon)]\\
\ge& E[(X-\mu)^2 I(|X-\mu|\ge\varepsilon)]\\
\ge& \varepsilon^2 E[I(|X-\mu|\ge\varepsilon)] \quad \because (X-\mu)^2\ge\varepsilon^2 \text{ when } |X-\mu|\ge\varepsilon\\
=& \varepsilon^2 P(|X-\mu|\ge\varepsilon)\\
\implies& P(|X-\mu|\ge\varepsilon)\le\frac{\sigma^2}{\varepsilon^2}
\end{align*}$$
Tchebycheff's Inequality
Let $X$ be a r.v. with $E[X]=\mu$ and $0\le\sigma^2=\mathrm{Var}(X)<\infty$. Then $\forall\varepsilon>0$,

$$P(|X-\mu|\ge\varepsilon)\le\frac{\sigma^2}{\varepsilon^2}$$

Take $\varepsilon=k\sigma>0$:

$$P(|X-\mu|\ge k\sigma)\le\frac{\sigma^2}{k^2\sigma^2}=\frac{1}{k^2} \iff P(|X-\mu|< k\sigma)\ge 1- \frac{1}{k^2}$$
For $X\sim N(\mu, \sigma^2)$,

$$P(|X-\mu|\le 2\sigma)=P\left(\frac{|X-\mu|}{\sigma}\le 2\right)=P(|Z|\le 2)=0.9545\ge 0.75,$$

consistent with the Chebyshev lower bound $1-\frac{1}{2^2}=0.75$.
If $\sigma^2\triangleq \mathrm{Var}(X)=0$, then $P(X=\mu)=1$.
Proof:

$$\begin{align*}
& \forall \varepsilon>0,\ P(|X-\mu|\ge\varepsilon)\le\frac{\sigma^2}{\varepsilon^2}=0 \qquad \because \sigma^2=0\\
\iff &\forall \varepsilon>0,\ P(|X-\mu|<\varepsilon)=1\\
\implies &\forall n=1,2,\cdots,\ P\left(|X-\mu|<\frac{1}{n}\right)=1
\end{align*}$$
Let $A_n\triangleq\left\{|X-\mu|<\frac{1}{n}\right\}$; then $A_1\supseteq A_2\supseteq\cdots$, and $\lim_{n\to\infty}A_n=\bigcap_{n=1}^\infty A_n=\{|X-\mu|=0\}$.

By continuity of probability (from above),

$$P(X=\mu)=P\left(\bigcap_{n=1}^\infty A_n\right)=\lim_{n\to\infty}P(A_n)=1$$
Although the bound given by Tchebycheff's inequality is loose, it cannot be improved in general, since there are distributions that attain it exactly.
Given $k\ge 1$, let $X$ be the r.v. with

$$\begin{align*}
& P(X=0)=1-\frac{1}{k^2}\\
& P(X=1)=\frac{1}{2k^2}\\
& P(X=-1)=\frac{1}{2k^2}
\end{align*}$$
$$\begin{align*}
\implies \mu=E[X]&=0\cdot P(X=0)+1\cdot P(X=1)+(-1)\cdot P(X=-1)=0\\
\sigma^2&=E[X^2]-E[X]^2=E[X^2]\\
&=0^2\cdot P(X=0)+1^2\cdot P(X=1)+(-1)^2\cdot P(X=-1)=\frac{1}{k^2}\\
\implies P(|X-\mu|\ge k\sigma) &= P(|X|\ge 1) \qquad \because \mu=0,\ \sigma=\frac{1}{k}\\
&=P(X=1)+P(X=-1)=\frac{1}{k^2}
\end{align*}$$

so the Chebyshev bound $\frac{1}{k^2}$ is attained with equality.
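Simulating this three-point distribution (with the illustrative choice $k=2$) shows the empirical tail frequency matching the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2.0
# P(X=0) = 1 - 1/k^2, P(X=1) = P(X=-1) = 1/(2k^2)
X = rng.choice([0.0, 1.0, -1.0],
               p=[1 - 1/k**2, 1/(2*k**2), 1/(2*k**2)],
               size=1_000_000)

mu, sigma = 0.0, 1/k  # exact moments computed above
freq = np.mean(np.abs(X - mu) >= k * sigma)
print(f"P(|X-mu| >= k*sigma) ≈ {freq:.4f}, bound 1/k^2 = {1/k**2:.4f}")
```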
For specific distributions, however, sharper bounds can be found.
e.g. $Z\sim N(0,1)$; for $k>0$,

$$\begin{align*}
P(|Z|\ge k) &= 2P(Z\ge k)\\
&=2 \int_k^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}\,dz\\
&=\sqrt{\frac{2}{\pi}}\int_k^\infty e^{-\frac{1}{2}z^2}\,dz\\
&\le \sqrt{\frac{2}{\pi}}\int_k^\infty \frac{z}{k}e^{-\frac{1}{2}z^2}\,dz \qquad \because \frac{z}{k}\ge 1 \text{ on } [k,\infty)\\
&=\sqrt{\frac{2}{\pi}}\,\frac{1}{k}e^{-\frac{k^2}{2}}
\end{align*}$$
For $P(|Z|\ge 2)$, this inequality gives $P(|Z|\ge 2)\le 0.054$, while Tchebycheff's inequality only gives $P(|Z|\ge 2)\le 0.25$; the true value is $P(|Z|\ge 2)=0.0455$.
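These numbers can be reproduced directly; assuming SciPy is available, `scipy.stats.norm` supplies the exact tail:

```python
import numpy as np
from scipy.stats import norm

k = 2.0
exact = 2 * norm.sf(k)                               # P(|Z| >= 2) = 0.0455
tail_bound = np.sqrt(2/np.pi) * np.exp(-k**2/2) / k  # bound derived above: 0.054
chebyshev = 1 / k**2                                 # Chebyshev bound: 0.25
print(f"{exact:.4f} <= {tail_bound:.4f} <= {chebyshev:.4f}")
```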
This argument generalizes to the following Markov-type inequality:
Let $g$ be a non-negative function. Then $\forall \varepsilon>0$,

$$P(g(X)\ge\varepsilon)\le\frac{E[g(X)]}{\varepsilon}$$
Proof:

$$\begin{align*}
E[g(X)] &= E[g(X)I(g(X)\ge\varepsilon)]+E[g(X)I(g(X)<\varepsilon)]\\
&\ge E[g(X)I(g(X)\ge\varepsilon)]\\
&\ge \varepsilon E[I(g(X)\ge\varepsilon)] \qquad \because g(X)\ge\varepsilon \text{ on } \{g(X)\ge\varepsilon\}\\
&= \varepsilon P(g(X)\ge\varepsilon)
\end{align*}$$

$$\implies P(g(X)\ge\varepsilon)\le\frac{E[g(X)]}{\varepsilon}$$
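A final sanity check with the arbitrary choices $g(x)=x^2$ and $\varepsilon=1.5$; since the inequality holds exactly for the empirical measure, the sample version always passes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)
g = lambda x: x**2                  # any non-negative g works

eps = 1.5
lhs = np.mean(g(X) >= eps)          # P(g(X) >= eps)
rhs = np.mean(g(X)) / eps           # E[g(X)] / eps
print(lhs <= rhs, round(lhs, 4), round(rhs, 4))  # True
```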