probability-and-statistics

Probability and Mathematical Statistics

Calculus prerequisites

$$\int_{-\infty}^{+\infty}e^{-x^2}dx=\sqrt\pi$$

Gamma function ($\alpha>0$; the recursion holds for $\alpha>1$)

$$\Gamma(\alpha)=\int_0^{+\infty}x^{\alpha-1}e^{-x}dx=(\alpha-1)\Gamma(\alpha-1)$$

$$\Gamma(\frac 12)=\sqrt\pi$$

Beta function ($p>0,q>0$)

$$B(p,q)=\frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}=\int _0^1x^{p-1}(1-x)^{q-1}dx$$
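As a quick numerical sanity check of these identities, the Python sketch below compares $\Gamma(\frac 12)$ with $\sqrt\pi$ and evaluates $B(2,3)$ twice: once from the Gamma-function formula and once from a midpoint-rule approximation of the integral. The helper names, the choice $p=2,q=3$, and the step count are illustrative assumptions, not part of the original notes.

```python
import math

def beta_via_gamma(p, q):
    """B(p, q) computed from the Gamma-function identity."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

def beta_via_integral(p, q, steps=10_000):
    """B(p, q) approximated with the midpoint rule on (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x ** (p - 1) * (1 - x) ** (q - 1)
    return total * h

gamma_half = math.gamma(0.5)
b_gamma = beta_via_gamma(2.0, 3.0)
b_integral = beta_via_integral(2.0, 3.0)

print(gamma_half, math.sqrt(math.pi))  # the two values should agree
print(b_gamma, b_integral)             # both should be close to 1/12
```

For $p=2,q=3$ the closed form gives $\frac{\Gamma(2)\Gamma(3)}{\Gamma(5)}=\frac{1\cdot 2}{24}=\frac 1{12}$, which is what both functions should reproduce.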

Common distributions

0-1 distribution

$$X\sim b(1,p)$$

$$P\{X=k\}=p^k(1-p)^{1-k},\quad k=0,1$$

$$E(X)=p$$

$$D(X)=p(1-p)$$

Binomial distribution

$$X\sim b(n,p)$$

$$P\{X=k\}=C_n^kp^k(1-p)^{n-k},\quad k=0,1,...,n$$

$$E(X)=np$$

$$D(X)=np(1-p)$$
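The binomial mean $np$ and variance $np(1-p)$ can be checked by summing directly over the probability mass function. The sketch below does this for the assumed values $n=10$, $p=0.3$; the function name `binomial_pmf` is just an illustrative choice.

```python
import math

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ b(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                    # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))      # should be n * p
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # n * p * (1 - p)

print(total, mean, variance)
```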

Poisson distribution

$$X\sim \pi(\lambda)$$

$$P\{X=k\}=\frac{\lambda^ke^{-\lambda}}{k!},\quad k=0,1,2,...$$

$$E(X)=\lambda$$

$$D(X)=\lambda$$
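A similar check works for the Poisson distribution if the infinite sum is truncated. The sketch below builds the probabilities recursively (to avoid large factorials) for an assumed $\lambda=4$ and an assumed cut-off at $k=60$, beyond which the tail is negligible for this $\lambda$.

```python
import math

def poisson_pmf_table(lam, k_max):
    """Return [P{X=0}, ..., P{X=k_max}] for X ~ pi(lam), built recursively."""
    probs = [math.exp(-lam)]               # P{X=0} = e^{-lam}
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)  # P{X=k} = P{X=k-1} * lam / k
    return probs

lam = 4.0
pmf = poisson_pmf_table(lam, k_max=60)

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(total, mean, variance)  # approximately 1, lam, lam
```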

Uniform distribution

$$X\sim U(a,b)$$

$$f(x)=\begin{cases} \frac 1{b-a}, & a<x<b\\ 0, & \text{otherwise} \end{cases}$$

$$E(X)=\frac{a+b}2$$

$$D(X)=\frac{(b-a)^2}{12}$$

Exponential distribution ($\theta>0$)

$$f(x)=\begin{cases} \frac 1\theta e^{-\frac x\theta}, & x>0\\ 0, & \text{otherwise} \end{cases}$$

$$E(X)=\theta$$

$$D(X)=\theta^2$$

Normal distribution

$$X\sim N(\mu,\sigma^2)$$

$$f(x)=\frac 1{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

$$E(X)=\mu$$

$$D(X)=\sigma^2$$
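To confirm that the density is transcribed correctly, the sketch below codes the formula by hand and compares it with `statistics.NormalDist` from the Python standard library. The parameter values $\mu=1$, $\sigma=2$ and the test points are arbitrary assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written out from the formula."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

mu, sigma = 1.0, 2.0
dist = NormalDist(mu, sigma)  # NormalDist takes the standard deviation, not the variance

# Largest disagreement between the hand-written formula and the library.
max_gap = max(abs(normal_pdf(x, mu, sigma) - dist.pdf(x)) for x in [-3.0, 0.0, 1.0, 2.5])

print(max_gap)                    # should be at floating-point noise level
print(dist.mean, dist.variance)   # E(X) = mu, D(X) = sigma^2
```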

Sampling distributions

Chi-square distribution

If $X_1,X_2,...,X_n$ is a sample from the normal population $N(0,1)$, then

$$\chi^2=X_1^2+X_2^2+...+X_n^2 \sim\chi^2(n)$$

If $\chi_1^2\sim\chi^2(n_1)$ and $\chi_2^2\sim\chi^2(n_2)$ are independent, then

$$\chi_1^2+\chi_2^2\sim\chi^2(n_1+n_2)$$

If $\chi^2\sim\chi^2(n)$, then

$$E(\chi^2)=n$$

$$D(\chi^2)=2n$$

Upper quantile

$$P\{\chi^2>\chi^2_\alpha(n)\}=\alpha$$
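The definition as a sum of squared standard normals lends itself to a small Monte Carlo check of the mean $n$ and variance $2n$. The sketch below assumes $n=5$, 20,000 replications, and a fixed seed for reproducibility; the results are only approximate because they come from simulation.

```python
import random
import statistics

def chi_square_draw(n, rng):
    """One draw of X_1^2 + ... + X_n^2 with X_i i.i.d. N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5
draws = [chi_square_draw(n, rng) for _ in range(20_000)]

sample_mean = statistics.fmean(draws)    # should be close to n
sample_var = statistics.variance(draws)  # should be close to 2 * n

print(sample_mean, sample_var)
```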

t distribution

If $X\sim N(0,1)$, $Y\sim\chi^2(n)$, and $X$ and $Y$ are independent, then:

$$\frac{X}{\sqrt {\frac {Y}{n}}}\sim t(n)$$

Upper quantile

$$P\{t>t_\alpha(n)\}=\alpha$$

$$t_{1-\alpha}(n)=-t_\alpha(n)$$

F distribution

If $U\sim \chi^2(n_1)$, $V\sim \chi^2(n_2)$, and $U$ and $V$ are independent, then:

$$\frac{\frac {U}{n_1}}{\frac {V}{n_2}}\sim F(n_1,n_2)$$

If $F\sim F(n_1,n_2)$, then

$$\frac 1{F}\sim F(n_2,n_1)$$

Upper quantile

$$P\{F>F_\alpha(n_1,n_2)\}=\alpha$$

$$F_{1-\alpha}(n_1,n_2)=\frac{1}{F_\alpha(n_2,n_1)}$$
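The quantile identity follows from the reciprocal property in a few steps. A short derivation, assuming $F\sim F(n_1,n_2)$ (so $F>0$, $F$ is continuous, and $\frac 1F\sim F(n_2,n_1)$):

$$
\begin{aligned}
1-\alpha &= P\{F>F_{1-\alpha}(n_1,n_2)\} = P\left\{\frac 1F<\frac 1{F_{1-\alpha}(n_1,n_2)}\right\} \\
\Rightarrow\quad \alpha &= P\left\{\frac 1F>\frac 1{F_{1-\alpha}(n_1,n_2)}\right\} \\
\Rightarrow\quad F_\alpha(n_2,n_1) &= \frac 1{F_{1-\alpha}(n_1,n_2)}
\end{aligned}
$$

The last step uses the definition of the upper $\alpha$ quantile of $F(n_2,n_1)$. This is why F tables usually list only small values of $\alpha$.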

Distributions of the sample mean and sample variance for normal populations

  • If $X_1,X_2,...,X_n$ is a sample from a population (not necessarily normal) with mean $\mu$ and variance $\sigma^2$, then

$$E(\overline X)=\mu$$

$$D(\overline X)=\frac{\sigma^2}n$$

$$E(S^2)=\sigma^2$$

  • If $X_1,X_2,...,X_n$ is a sample from the normal population $N(\mu,\sigma ^2)$, then

$$\overline X \sim N(\mu,\frac{\sigma^2}n)$$

$$\frac{(n-1)S^2}{\sigma^2}\sim \chi^2(n-1)$$

$\overline X$ and $S^2$ are independent

$$\frac{\overline X-\mu}{\frac{S}{\sqrt n}}\sim t(n-1)$$

  • If $X_1,X_2,...,X_{n_1}$ is a sample from the normal population $N(\mu_1,\sigma_1 ^2)$, $Y_1,Y_2,...,Y_{n_2}$ is a sample from the normal population $N(\mu_2,\sigma_2 ^2)$, and the two samples are independent, then

The two sample means are:

$$\overline X=\frac 1 {n_1}\sum_{i=1}^{n_1}X_i$$

$$\overline Y=\frac 1{n_2}\sum_{i=1}^{n_2}Y_i$$

The two sample variances are:

$$S_1^2=\frac 1{n_1-1}\sum_{i=1}^{n_1}(X_i-\overline X)^2$$

$$S_2^2=\frac 1{n_2-1}\sum_{i=1}^{n_2}(Y_i-\overline Y)^2$$

They satisfy the following properties:

$$\frac{\frac{S_1^2}{S_2^2}}{\frac{\sigma_1^2}{\sigma_2^2}}\sim F(n_1-1,n_2-1)$$

When $\sigma_1^2=\sigma_2^2=\sigma^2$,

$$\frac{(\overline X-\overline Y)-(\mu_1-\mu_2)}{S_W\sqrt{\frac 1{n_1}+\frac 1{n_2}}}\sim t(n_1+n_2-2)$$

where

$$S_W^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$$
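The pooled variance $S_W^2$ and the two-sample statistic are easy to get wrong by hand, so here is a minimal sketch that computes both from raw data. The function names, the parameter `mean_diff` (standing in for $\mu_1-\mu_2$), and the toy samples are assumptions made for illustration.

```python
import math
import statistics

def pooled_variance(xs, ys):
    """S_W^2 = ((n1-1) S1^2 + (n2-1) S2^2) / (n1 + n2 - 2)."""
    n1, n2 = len(xs), len(ys)
    s1_sq = statistics.variance(xs)  # sample variance with divisor n1 - 1
    s2_sq = statistics.variance(ys)  # sample variance with divisor n2 - 1
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t(xs, ys, mean_diff=0.0):
    """((x_bar - y_bar) - mean_diff) / (S_W * sqrt(1/n1 + 1/n2))."""
    n1, n2 = len(xs), len(ys)
    s_w = math.sqrt(pooled_variance(xs, ys))
    numerator = statistics.fmean(xs) - statistics.fmean(ys) - mean_diff
    return numerator / (s_w * math.sqrt(1 / n1 + 1 / n2))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy sample 1: mean 3, S1^2 = 2.5
ys = [2.0, 4.0, 6.0]            # toy sample 2: mean 4, S2^2 = 4

s_w_sq = pooled_variance(xs, ys)  # (4 * 2.5 + 2 * 4) / 6 = 3
t_value = pooled_t(xs, ys)        # -1 / sqrt(3 * (1/5 + 1/3)) = -1 / sqrt(1.6)

print(s_w_sq, t_value)
```

Under the equal-variance assumption, `t_value` would be compared against quantiles of $t(n_1+n_2-2)$, here $t(6)$.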

Confidence intervals and one-sided confidence limits for the mean and variance of normal populations

(confidence level $1-\alpha$)

Single normal population

Estimating $\mu$, with $\sigma^2$ known

Based on

$$\frac{\overline X-\mu}{\sigma}\sqrt n \sim N(0,1)$$

Confidence interval

$$(\overline X -\frac{\sigma}{\sqrt n}z_{\frac \alpha 2},\overline X +\frac{\sigma}{\sqrt n}z_{\frac \alpha 2})$$

Upper confidence limit

$$\overline \mu =\overline X+\frac{\sigma}{\sqrt n}z_{\alpha}$$

Lower confidence limit

$$\underline \mu = \overline X-\frac{\sigma}{\sqrt n}z_{\alpha}$$
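A minimal sketch of the known-variance case, using `statistics.NormalDist().inv_cdf` to obtain the upper quantile $z_\alpha$ (so that $P\{Z>z_\alpha\}=\alpha$). The function names and the numbers $\overline x=5$, $\sigma=2$, $n=16$, $\alpha=0.05$ are assumptions for illustration.

```python
import math
from statistics import NormalDist

def upper_z(alpha):
    """Upper alpha quantile z_alpha of N(0, 1): P{Z > z_alpha} = alpha."""
    return NormalDist().inv_cdf(1 - alpha)

def z_interval(x_bar, sigma, n, alpha):
    """Two-sided confidence interval for mu at level 1 - alpha."""
    half_width = sigma / math.sqrt(n) * upper_z(alpha / 2)
    return x_bar - half_width, x_bar + half_width

def z_upper_limit(x_bar, sigma, n, alpha):
    """One-sided upper confidence limit for mu at level 1 - alpha."""
    return x_bar + sigma / math.sqrt(n) * upper_z(alpha)

low, high = z_interval(5.0, 2.0, 16, 0.05)
upper = z_upper_limit(5.0, 2.0, 16, 0.05)

print(low, high)  # roughly 4.02 and 5.98
print(upper)      # roughly 5.82
```

Note that the two-sided interval uses $z_{\frac \alpha 2}$ while the one-sided limit uses $z_\alpha$, so the one-sided limit is closer to $\overline x$ than the matching end of the interval.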

Estimating $\mu$, with $\sigma^2$ unknown

Based on

$$\frac{\overline X-\mu}{S}\sqrt n \sim t(n-1)$$

Confidence interval

$$(\overline X -\frac{S}{\sqrt n}t_{\frac \alpha 2}(n-1),\overline X +\frac{S}{\sqrt n}t_{\frac \alpha 2}(n-1))$$

Upper confidence limit

$$\overline \mu =\overline X +\frac{S}{\sqrt n}t_{\alpha}(n-1)$$

Lower confidence limit

$$\underline \mu =\overline X -\frac{S}{\sqrt n}t_{\alpha}(n-1)$$

Estimating $\sigma^2$, with $\mu$ unknown

Based on

$$\frac {(n-1)S^2}{\sigma^2}\sim \chi^2(n-1)$$

Confidence interval

$$(\frac{(n-1)S^2}{\chi^2_{\frac \alpha 2}(n-1)},\frac{(n-1)S^2}{\chi^2_{1-\frac \alpha 2}(n-1)})$$

Upper confidence limit

$$\overline {\sigma^2}=\frac{(n-1)S^2}{\chi^2_{1-\alpha}(n-1)}$$

Lower confidence limit

$$\underline {\sigma^2}=\frac{(n-1)S^2}{\chi^2_{\alpha}(n-1)}$$
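Because the quantile sits in the denominator, the larger quantile $\chi^2_{\frac \alpha 2}(n-1)$ gives the lower end of the interval and the smaller quantile $\chi^2_{1-\frac \alpha 2}(n-1)$ gives the upper end. The sketch below makes that explicit. The Python standard library has no chi-square quantile function, so the two quantiles are passed in as table values; the sample figures $S^2=4$, $n=10$, $\alpha=0.05$ are assumed for illustration.

```python
def variance_interval(s_sq, n, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2.

    chi2_upper is chi^2_{alpha/2}(n-1) (the larger quantile) and
    chi2_lower is chi^2_{1-alpha/2}(n-1) (the smaller quantile).
    """
    scaled = (n - 1) * s_sq
    return scaled / chi2_upper, scaled / chi2_lower

# Table values for 9 degrees of freedom, entered by hand.
CHI2_0025_DF9 = 19.023  # chi^2_{0.025}(9)
CHI2_0975_DF9 = 2.700   # chi^2_{0.975}(9)

low, high = variance_interval(4.0, 10, CHI2_0025_DF9, CHI2_0975_DF9)

print(low, high)  # roughly 1.89 and 13.33
```

The interval is not symmetric around $S^2$, which reflects the skewness of the chi-square distribution.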

Two normal populations

Estimating $\mu_1-\mu_2$, with $\sigma_1^2$ and $\sigma_2^2$ known

Based on

$$\frac{(\overline X-\overline Y)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\sim N(0,1)$$

Confidence interval

$$(\overline X-\overline Y-z_{\frac \alpha 2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}},\overline X-\overline Y+z_{\frac \alpha 2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}})$$

Upper confidence limit

$$\overline {\mu_1-\mu_2}=\overline X-\overline Y+z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$

Lower confidence limit

$$\underline {\mu_1-\mu_2}=\overline X-\overline Y-z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$

Estimating $\mu_1-\mu_2$, with $\sigma_1^2$ and $\sigma_2^2$ unknown but $\sigma_1^2=\sigma_2^2=\sigma^2$

Based on

$$\frac{(\overline X-\overline Y)-(\mu_1-\mu_2)}{S_W\sqrt{\frac 1{n_1}+\frac 1{n_2}}}\sim t(n_1+n_2-2)$$

where:

$$S_W^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$$

Confidence interval

$$(\overline X-\overline Y-S_W\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}t_{\frac \alpha 2}(n_1+n_2-2),\overline X-\overline Y+S_W\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}t_{\frac \alpha 2}(n_1+n_2-2))$$

Upper confidence limit

$$\overline {\mu_1-\mu_2}=\overline X-\overline Y+S_W\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}t_{\alpha}(n_1+n_2-2)$$

Lower confidence limit

$$\underline {\mu_1-\mu_2}=\overline X-\overline Y-S_W\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}t_{\alpha}(n_1+n_2-2)$$

Estimating $\frac{\sigma_1^2}{\sigma_2^2}$, with $\mu_1$ and $\mu_2$ unknown

Based on

$$\frac{\frac{S_1^2}{S_2^2}}{\frac{\sigma_1^2}{\sigma_2^2}}\sim F(n_1-1,n_2-1)$$

Confidence interval

$$(\frac{\frac{S_1^2}{S_2^2}}{F_{\frac \alpha 2}(n_1-1,n_2-1)},\frac{\frac{S_1^2}{S_2^2}}{F_{1-\frac \alpha 2}(n_1-1,n_2-1)})$$

Upper confidence limit

$$\overline{(\frac{\sigma_1^2}{\sigma_2^2})}=\frac{\frac{S_1^2}{S_2^2}}{F_{1-\alpha}(n_1-1,n_2-1)}$$

Lower confidence limit

$$\underline{(\frac{\sigma_1^2}{\sigma_2^2})}=\frac{\frac{S_1^2}{S_2^2}}{F_{\alpha}(n_1-1,n_2-1)}$$

Tests for the mean and variance of normal populations

(significance level $\alpha$)

$H_0$ is the null hypothesis and $H_1$ is the alternative hypothesis

Test for the mean of a single normal population

Population $X\sim N(\mu,\sigma^2)$

If the variance is known

$\sigma^2$ known, testing $\mu$

Using the test statistic (which follows $N(0,1)$ when $\mu=\mu_0$)

$$Z=\frac{\overline X-\mu_0}{\sigma}\sqrt n \sim N(0,1)$$

For $H_0: \mu\le \mu_0$, $H_1: \mu> \mu_0$, the rejection region is

$$z=\frac{\overline x-\mu_0}{\sigma}\sqrt n \ge z_\alpha$$

For $H_0: \mu\ge \mu_0$, $H_1: \mu< \mu_0$, the rejection region is

$$z=\frac{\overline x-\mu_0}{\sigma}\sqrt n \le -z_\alpha$$

For $H_0: \mu= \mu_0$, $H_1: \mu\ne \mu_0$, the rejection region is

$$|z|=|\frac{\overline x-\mu_0}{\sigma}\sqrt n| \ge z_{\frac \alpha 2}$$
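The three rejection regions differ only in which tail is used and in whether the quantile is $z_\alpha$ or $z_{\frac \alpha 2}$. The sketch below encodes them in one function. The function name, the string labels for the alternative hypothesis, and the sample figures are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha, alternative):
    """Return (z, reject) for a one-sample z test with known sigma.

    alternative is "greater" (H1: mu > mu0), "less" (H1: mu < mu0),
    or "two-sided" (H1: mu != mu0).
    """
    z = (x_bar - mu0) / sigma * math.sqrt(n)
    std_normal = NormalDist()
    if alternative == "greater":
        reject = z >= std_normal.inv_cdf(1 - alpha)            # z >= z_alpha
    elif alternative == "less":
        reject = z <= -std_normal.inv_cdf(1 - alpha)           # z <= -z_alpha
    elif alternative == "two-sided":
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)   # |z| >= z_{alpha/2}
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, reject

# Assumed data: x_bar = 10.5, mu0 = 10, sigma = 1, n = 16, alpha = 0.05.
z, reject_greater = z_test(10.5, 10.0, 1.0, 16, 0.05, "greater")
_, reject_less = z_test(10.5, 10.0, 1.0, 16, 0.05, "less")
_, reject_two_sided = z_test(10.5, 10.0, 1.0, 16, 0.05, "two-sided")

print(z, reject_greater, reject_less, reject_two_sided)
```

Here $z=0.5\cdot\sqrt{16}=2$, which exceeds both $z_{0.05}\approx 1.645$ and $z_{0.025}\approx 1.960$, so the right-sided and two-sided tests reject while the left-sided test does not.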

If the variance is unknown

$\sigma^2$ unknown, testing $\mu$

Using the test statistic

$$t=\frac{\overline X-\mu_0}{S}\sqrt n \sim t(n-1)$$

For $H_0: \mu\le \mu_0$, $H_1: \mu> \mu_0$, the rejection region is

$$t=\frac{\overline x-\mu_0}{s}\sqrt n \ge t_\alpha (n-1)$$

For $H_0: \mu\ge \mu_0$, $H_1: \mu< \mu_0$, the rejection region is

$$t=\frac{\overline x-\mu_0}{s}\sqrt n \le - t_\alpha (n-1)$$

For $H_0: \mu= \mu_0$, $H_1: \mu\ne \mu_0$, the rejection region is

$$|t|=|\frac{\overline x-\mu_0}{s}\sqrt n| \ge t_{\frac \alpha 2} (n-1)$$

Test for the difference of the means of two normal populations

If the two population variances are known

$\sigma_1^2$ and $\sigma_2^2$ known, testing $\mu_1-\mu_2$

Using the test statistic

$$Z=\frac{\overline X-\overline Y-\delta}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

For $H_0: \mu_1-\mu_2\le \delta$, $H_1: \mu_1-\mu_2> \delta$, the rejection region is

$$z=\frac{\overline x-\overline y-\delta}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\ge z_\alpha$$

For $H_0: \mu_1-\mu_2\ge \delta$, $H_1: \mu_1-\mu_2< \delta$, the rejection region is

$$z=\frac{\overline x-\overline y-\delta}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\le-z_\alpha$$

For $H_0: \mu_1-\mu_2= \delta$, $H_1: \mu_1-\mu_2\ne \delta$, the rejection region is

$$|z|=|\frac{\overline x-\overline y-\delta}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}|\ge z_{\frac \alpha 2}$$
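A short sketch of the two-sample statistic, assuming the standard denominator $\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$ in which each population contributes its own variance and sample size. The function name and the numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def two_sample_z(x_bar, y_bar, delta, var1, n1, var2, n2):
    """z = (x_bar - y_bar - delta) / sqrt(var1 / n1 + var2 / n2)."""
    return (x_bar - y_bar - delta) / math.sqrt(var1 / n1 + var2 / n2)

# Assumed summary data: sigma_1^2 = 1 with n1 = 25, sigma_2^2 = 4 with n2 = 100.
z = two_sample_z(5.2, 4.8, 0.0, 1.0, 25, 4.0, 100)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
reject_two_sided = abs(z) >= z_crit

print(z, z_crit, reject_two_sided)
```

With these numbers the denominator is $\sqrt{0.04+0.04}=\sqrt{0.08}$, so $z=0.4/\sqrt{0.08}=\sqrt 2\approx 1.414$, which is below $z_{0.025}\approx 1.960$ and does not reject $H_0: \mu_1-\mu_2=0$.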

If the two population variances are unknown

$\sigma_1^2$ and $\sigma_2^2$ unknown but $\sigma_1^2=\sigma_2^2=\sigma^2$, testing $\mu_1-\mu_2$

Using the test statistic

$$t=\frac{\overline X-\overline Y -\delta}{S_W\sqrt{\frac 1{n_1}+\frac 1{n_2}}}\sim t(n_1+n_2-2)$$

where

$$S_W^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$$

For $H_0: \mu_1-\mu_2\le \delta$, $H_1: \mu_1-\mu_2> \delta$, the rejection region is

$$t=\frac{\overline x-\overline y -\delta}{s_W\sqrt{\frac 1{n_1}+\frac 1{n_2}}}\ge t_\alpha(n_1+n_2-2)$$

For $H_0: \mu_1-\mu_2\ge \delta$, $H_1: \mu_1-\mu_2< \delta$, the rejection region is

$$t=\frac{\overline x-\overline y -\delta}{s_W\sqrt{\frac 1{n_1}+\frac 1{n_2}}}\le -t_\alpha(n_1+n_2-2)$$

For $H_0: \mu_1-\mu_2= \delta$, $H_1: \mu_1-\mu_2\ne \delta$, the rejection region is

$$|t|=|\frac{\overline x-\overline y -\delta}{s_W\sqrt{\frac 1{n_1}+\frac 1{n_2}}}|\ge t_{\frac \alpha 2}(n_1+n_2-2)$$

Tests for the variance of normal populations

Single population

Testing $\sigma^2$

Using the test statistic

$$\chi^2=\frac {(n-1)S^2}{\sigma_0^2}\sim \chi^2(n-1)$$

For $H_0: \sigma^2\le \sigma_0^2$, $H_1: \sigma^2 >\sigma_0^2$, the rejection region is

$$\chi^2=\frac {(n-1)s^2}{\sigma_0^2} \ge \chi^2_{\alpha}(n-1)$$

For $H_0: \sigma^2\ge \sigma_0^2$, $H_1: \sigma^2 <\sigma_0^2$, the rejection region is

$$\chi^2=\frac {(n-1)s^2}{\sigma_0^2} \le \chi^2_{1-\alpha}(n-1)$$

For $H_0: \sigma^2= \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$, the rejection region is

$$\chi^2=\frac {(n-1)s^2}{\sigma_0^2} \ge \chi^2_{\frac \alpha 2}(n-1)$$

or

$$\chi^2=\frac {(n-1)s^2}{\sigma_0^2} \le \chi^2_{1-\frac \alpha 2}(n-1)$$
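For the two-sided alternative, $H_0$ is rejected when the statistic lands in either tail, so the two conditions are joined by "or". A minimal sketch, with the chi-square quantiles passed in as table values (the Python standard library has no chi-square quantile function) and with $n=10$, $\sigma_0^2=1$, $\alpha=0.05$ assumed for illustration:

```python
def chi2_statistic(s_sq, n, sigma0_sq):
    """chi^2 = (n - 1) * s^2 / sigma_0^2."""
    return (n - 1) * s_sq / sigma0_sq

def reject_two_sided(stat, chi2_upper, chi2_lower):
    """Reject H0 if stat >= chi^2_{alpha/2}(n-1) or stat <= chi^2_{1-alpha/2}(n-1)."""
    return stat >= chi2_upper or stat <= chi2_lower

# Table values for 9 degrees of freedom, entered by hand.
CHI2_0025_DF9 = 19.023  # chi^2_{0.025}(9)
CHI2_0975_DF9 = 2.700   # chi^2_{0.975}(9)

stat_small = chi2_statistic(0.25, 10, 1.0)    # 9 * 0.25 = 2.25, in the lower tail
stat_typical = chi2_statistic(1.5, 10, 1.0)   # 9 * 1.5 = 13.5, between the quantiles

reject_small = reject_two_sided(stat_small, CHI2_0025_DF9, CHI2_0975_DF9)
reject_typical = reject_two_sided(stat_typical, CHI2_0025_DF9, CHI2_0975_DF9)

print(stat_small, reject_small)      # rejected: sample variance is too small
print(stat_typical, reject_typical)  # not rejected
```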

Two populations

Testing $\frac{\sigma_1^2}{\sigma_2^2}$

Since

$$\frac{\frac{S_1^2}{S_2^2}}{\frac{\sigma_1^2}{\sigma_2^2}}\sim F(n_1-1,n_2-1)$$

the test statistic is $F=\frac{S_1^2}{S_2^2}$, which follows $F(n_1-1,n_2-1)$ when $\sigma_1^2=\sigma_2^2$

For $H_0: \sigma_1^2\le \sigma_2^2$, $H_1: \sigma_1^2> \sigma_2^2$, the rejection region is

$$F=\frac{s_1^2}{s_2^2}\ge F_{\alpha}(n_1-1,n_2-1)$$

For $H_0: \sigma_1^2\ge \sigma_2^2$, $H_1: \sigma_1^2 < \sigma_2^2$, the rejection region is

$$F=\frac{s_1^2}{s_2^2}\le F_{1-\alpha}(n_1-1,n_2-1)$$

For $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$, the rejection region is

$$F=\frac{s_1^2}{s_2^2}\le F_{1-\frac \alpha 2}(n_1-1,n_2-1)$$

or

$$F=\frac{s_1^2}{s_2^2}\ge F_{\frac \alpha 2}(n_1-1,n_2-1)$$

Tests based on paired data

$D_i=X_i-Y_i$, $i=1,2,...,n$, assumed to be independent with

$$D_i\sim N(\mu_D,\sigma^2_D)$$

Using the test statistic

$$t=\frac{\overline D-0}{S_D}\sqrt n \sim t(n-1)$$

For $H_0: \mu_D\le 0$, $H_1: \mu_D>0$, the rejection region is

$$t=\frac{\overline d-0}{s_D}\sqrt n \ge t_\alpha(n-1)$$

For $H_0: \mu_D\ge 0$, $H_1: \mu_D<0$, the rejection region is

$$t=\frac{\overline d-0}{s_D}\sqrt n \le -t_\alpha(n-1)$$

For $H_0: \mu_D= 0$, $H_1: \mu_D\ne 0$, the rejection region is

$$|t|=|\frac{\overline d-0}{s_D}\sqrt n |\ge t_{\frac \alpha 2}(n-1)$$
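The paired test reduces to a one-sample t test on the differences. The sketch below computes the statistic from two equally long lists and compares it with a critical value taken from a t table, since the Python standard library has no t quantile function. The function name, the toy data, and the choice $\alpha=0.05$ are assumptions for illustration.

```python
import math
import statistics

def paired_t(xs, ys):
    """Return (t, degrees_of_freedom) for the paired differences x_i - y_i."""
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation, divisor n - 1
    return d_bar / s_d * math.sqrt(n), n - 1

xs = [12.0, 15.0, 11.0, 14.0]
ys = [10.0, 14.0, 10.0, 10.0]   # differences: 2, 1, 1, 4

t_value, df = paired_t(xs, ys)

T_005_DF3 = 2.3534              # table value t_{0.05}(3), entered by hand
reject_right_sided = t_value >= T_005_DF3  # H0: mu_D <= 0 vs H1: mu_D > 0

print(t_value, df, reject_right_sided)
```

Here $\overline d=2$ and $s_D^2=\frac{0+1+1+4}{3}=2$, so $t=\frac{2}{\sqrt 2}\cdot\sqrt 4=2\sqrt 2\approx 2.83$, which exceeds $t_{0.05}(3)$ and rejects $H_0$ in the right-sided test.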


probability-and-statistics
https://blog.algorithmpark.xyz/2023/11/25/probability-and-statistics/
Author
CJL
Published
November 25, 2023
Updated
January 13, 2024