\[\hat{\boldsymbol{\beta}}_{p\times1}\sim \mathcal{N}_p(\boldsymbol{\beta},(\mathbf{X}^T\mathbf{X})^{-1}\sigma^2)\]
\[\frac{1}{\sigma}(\mathbf{X}^T\mathbf{X})^{1/2}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})\sim \mathcal{N}_p(\mathbf{0},\mathbf{I}) \]
\[\frac{1}{\sigma^2}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})^T(\mathbf{X}^T\mathbf{X})(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})\sim \chi^2(p)\]
\[(n-p)\frac{s^2}{\sigma^2}\sim\chi^2(n-p)\]
Therefore:
\[\frac{1}{p s^2}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})^T\mathbf{X}^T\mathbf{X}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})\sim F(p,n-p)\]
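A brief sketch of why this holds, assuming the standard normal-model result (not stated above) that \(\hat{\boldsymbol{\beta}}\) and \(s^2\) are independent. The symbols \(U\) and \(V\) are introduced here only for illustration. The ratio of two independent chi-square variables, each divided by its degrees of freedom, follows an F distribution, and \(\sigma^2\) cancels:
\[U=\frac{1}{\sigma^2}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})^T(\mathbf{X}^T\mathbf{X})(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})\sim \chi^2(p),\qquad V=(n-p)\frac{s^2}{\sigma^2}\sim\chi^2(n-p)\]
\[\frac{U/p}{V/(n-p)}=\frac{1}{p s^2}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})^T\mathbf{X}^T\mathbf{X}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})\sim F(p,n-p)\]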
A \(100(1-\alpha)\%\) confidence interval for \(\beta_k\) is given by:
\[\begin{eqnarray} IC(\beta_k, 1-\alpha) &=& \left[\hat{\beta}_k -t_{n-p,\alpha/2}\sqrt{\widehat{Var(\hat{\beta}_k)}};\right.\\ & &\left. \hat{\beta}_k +t_{n-p,\alpha/2}\sqrt{\widehat{Var(\hat{\beta}_k)}}\right] \end{eqnarray}\]
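The interval can be traced back to a t pivot. The sketch below assumes that \(\widehat{Var(\hat{\beta}_k)}\) is \(s^2\) times the \(k\)-th diagonal element of \((\mathbf{X}^T\mathbf{X})^{-1}\), and reads \(t_{n-p,\alpha/2}\) as the upper \(\alpha/2\) critical value of the \(t\) distribution with \(n-p\) degrees of freedom:
\[\widehat{Var(\hat{\beta}_k)}=s^2\left[(\mathbf{X}^T\mathbf{X})^{-1}\right]_{kk},\qquad \frac{\hat{\beta}_k-\beta_k}{\sqrt{\widehat{Var(\hat{\beta}_k)}}}\sim t_{n-p}\]
\[P\left(-t_{n-p,\alpha/2}\leq \frac{\hat{\beta}_k-\beta_k}{\sqrt{\widehat{Var(\hat{\beta}_k)}}}\leq t_{n-p,\alpha/2}\right)=1-\alpha\]
Solving the two inequalities for \(\beta_k\) gives the endpoints of the interval.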
Study of species diversity in the Galápagos.
Dataset: 30 islands, 7 variables.
Species
: the number of plant species found on the island
Endemics
: the number of endemic species
Area
: the area of the island (km\(^2\))
Elevation
: the highest elevation of the island (m)
Nearest
: the distance from the nearest island (km)
Scruz
: the distance from Santa Cruz island (km)
Adjacent
: the area of the adjacent island (km\(^2\))
library(faraway)  # provides the gala data and sumary(), a compact summary()
lmod <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, gala)
sumary(lmod)
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  7.068221  19.154198  0.3690 0.7153508
## Area        -0.023938   0.022422 -1.0676 0.2963180
## Elevation    0.319465   0.053663  5.9532 3.823e-06
## Nearest      0.009144   1.054136  0.0087 0.9931506
## Scruz       -0.240524   0.215402 -1.1166 0.2752082
## Adjacent    -0.074805   0.017700 -4.2262 0.0002971
##
## n = 30, p = 6, Residual SE = 60.97519, R-Squared = 0.77
95% CI for \(\beta_{Adjacent}\):
confint(lmod)[6,]
##       2.5 %      97.5 %
## -0.11133622 -0.03827344
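The same interval can be rebuilt from the formula \(\hat{\beta}_k \pm t_{n-p,\alpha/2}\sqrt{\widehat{Var(\hat{\beta}_k)}}\). The following is a minimal sketch, not part of the original slides; the names `alpha`, `k`, `se_hat`, `t_crit` and `ci_manual` are illustrative choices.

```r
library(faraway)  # provides the gala data set

lmod <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, data = gala)

alpha <- 0.05
k <- "Adjacent"                                # coefficient of interest
beta_hat <- coef(lmod)[k]
se_hat <- sqrt(diag(vcov(lmod)))[k]            # sqrt of the estimated Var(beta_hat_k)
t_crit <- qt(1 - alpha / 2, df = df.residual(lmod))  # t_{n-p, alpha/2}

# Lower and upper endpoints of the interval
ci_manual <- unname(c(beta_hat - t_crit * se_hat, beta_hat + t_crit * se_hat))
ci_manual
```

The result should agree with `confint(lmod)[6,]` up to floating-point error, since `vcov(lmod)` is \(s^2(\mathbf{X}^T\mathbf{X})^{-1}\).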
\(100\times(1-\alpha)\%\) confidence region: \[\frac{1}{p s^2}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})^T\mathbf{X}^T\mathbf{X}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})\leq F(p,n-p;1-\alpha)\]
95% confidence region for \(\beta_{Adjacent}\) and \(\beta_{Area}\)
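library(ellipse)
# Joint 95% confidence ellipse for coefficients 2 (Area) and 6 (Adjacent)
aa <- ellipse(lmod, which = c(2, 6), level = 0.95)
plot(aa, type = "l", ylim = c(-0.13, 0), col = "blue")
points(coef(lmod)[2], coef(lmod)[6], pch = 19)  # point estimate
# Individual 95% confidence intervals (black dashed lines)
abline(v = confint(lmod)[2, ], lty = 2)
abline(h = confint(lmod)[6, ], lty = 2)
# Bounding box of the ellipse (blue dashed lines)
abline(h = c(max(aa[, 2]), min(aa[, 2])), lty = 2, col = "blue")
abline(v = c(max(aa[, 1]), min(aa[, 1])), lty = 2, col = "blue")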
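Membership in the joint region can also be checked numerically instead of read off the plot. The sketch below is an illustration, not part of the original slides. It assumes the two-parameter version of the region: \(p\) is replaced by 2, and \(s^2(\mathbf{X}^T\mathbf{X})^{-1}\) by its \(2\times 2\) block for the chosen coefficients. The point `b0` (the origin) and the names `stat`, `crit` and `inside` are hypothetical choices.

```r
library(faraway)  # provides the gala data set

lmod <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, data = gala)

idx <- c("Area", "Adjacent")
b <- coef(lmod)[idx]
V <- vcov(lmod)[idx, idx]   # s^2 times the matching 2x2 block of (X'X)^{-1}
b0 <- c(0, 0)               # hypothetical point to check: the origin

# Quadratic form divided by the number of coefficients in the region
stat <- drop(t(b - b0) %*% solve(V) %*% (b - b0)) / length(idx)
crit <- qf(0.95, df1 = length(idx), df2 = df.residual(lmod))

inside <- stat <= crit      # TRUE if b0 lies in the 95% joint region
inside
```

A point lies inside the ellipse exactly when `stat <= crit`, so the check is equivalent to an F test of \(H_0\): \((\beta_{Area},\beta_{Adjacent})=\) `b0` at the 5% level.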
Linear hypothesis test:
\(H_0\): \(\mathbf{R}_{r\times p}\boldsymbol{\beta}_{p\times 1}=\mathbf{q}_{r\times1}\)
\(H_1\): \(\mathbf{R}_{r\times p}\boldsymbol{\beta}_{p\times 1}\neq\mathbf{q}_{r\times1}\)
To test it, we start from the discrepancy vector with respect to \(H_0\):
\[\mathbf{R}_{r\times p}\hat{\boldsymbol{\beta}}_{p\times 1}-\mathbf{q}_{r\times1}=\mathbf{m}_{r\times 1}\]
We want to measure how far \(\mathbf{m}\) is from \(\mathbf{0}\).
We therefore need the distribution of \(\mathbf{m}\) under \(H_0\):
\[E(\mathbf{m})=E(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q}) = \mathbf{R}E(\hat{\boldsymbol{\beta}})-\mathbf{q}=\mathbf{R}\boldsymbol{\beta}-\mathbf{q}\overset{H_0}{=}\mathbf{0}\]
\[Var(\mathbf{m})=Var(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q}) = \mathbf{R}Var(\hat{\boldsymbol{\beta}})\mathbf{R}^T=\sigma^2\mathbf{R}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{R}^T\]
Test statistic:
\[\begin{eqnarray} W&=&\mathbf{m}^T[Var(\mathbf{m})]^{-1}\mathbf{m}\\ &=& (\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q})^T [\sigma^2\mathbf{R}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{R}^T]^{-1} (\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q})\\ &\overset{H_0}{\sim} & \chi^2(r) \end{eqnarray}\]
Problem: \(\sigma^2\) is unknown, so we must replace it with an estimator, \(s^2\).
We know that \((n-p)\frac{s^2}{\sigma^2}\sim\chi^2(n-p)\), independently of \(\hat{\boldsymbol{\beta}}\) and hence of \(W\).
The test statistic is:
\[\begin{eqnarray} F&=&\frac{W}{r}\frac{\sigma^2}{s^2}\\ &=&\frac{(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q})^T [\mathbf{R}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{R}^T]^{-1} (\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q})}{r\sigma^2}\frac{\sigma^2}{s^2}\\ &=&\frac{(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q})^T [s^2\mathbf{R}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{R}^T]^{-1} (\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q})}{r}\\ &\overset{H_0}{\sim} & F_{r,n-p} \end{eqnarray}\]
\(\mathbf{R}=[0\quad 0\quad\ldots\quad 1\quad 0\quad\ldots\quad 0]\) e \(\mathbf{q}=0\).
\(\mathbf{R}=[0\quad 0\quad 1\quad \ldots\quad -1\quad 0\quad\ldots\quad 0]\) e \(\mathbf{q}=0\).
\(\mathbf{R}=[0\quad 1\quad 1\quad 1 \quad 0\quad\ldots\quad 0]\) e \(\mathbf{q}=1\).
\[\mathbf{R}=\left( \begin{array}{cccccc} 1 & 0 & 0 & 0 & \ldots & 0\\ 0 & 1 & 0 & 0 & \ldots & 0\\ 0 & 0 & 1 & 0 & \ldots & 0\\ \end{array} \right) \quad\quad\mathbf{q}=\mathbf{0} \]
\[\mathbf{R}=\left( \begin{array}{cccccc} 0 & 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 0 & 1 & 1\\ \end{array} \right) \quad\quad\mathbf{q}=\left( \begin{array}{c} 1\\ 0\\ 0 \end{array} \right) \]
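Each choice of \(\mathbf{R}\) and \(\mathbf{q}\) above encodes a set of linear restrictions, one per row of \(\mathbf{R}\). The reading below is a sketch that assumes the columns of \(\mathbf{R}\) are ordered as \((\beta_0,\beta_1,\ldots,\beta_{p-1})\), with the intercept first; the indices \(j\) and \(k\) stand for the unspecified positions of the nonzero entries.
A single 1 in the column of \(\beta_k\), with \(\mathbf{q}=0\):
\[H_0:\ \beta_k=0\]
A 1 in the column of \(\beta_j\) and a \(-1\) in the column of \(\beta_k\), with \(\mathbf{q}=0\):
\[H_0:\ \beta_j-\beta_k=0\quad\Longleftrightarrow\quad \beta_j=\beta_k\]
\(\mathbf{R}=[0\quad 1\quad 1\quad 1\quad 0\quad\ldots\quad 0]\) and \(\mathbf{q}=1\):
\[H_0:\ \beta_1+\beta_2+\beta_3=1\]
The \(3\times p\) matrix with an identity block in the first three columns and \(\mathbf{q}=\mathbf{0}\):
\[H_0:\ \beta_0=0,\quad \beta_1=0,\quad \beta_2=0\]
The \(3\times 6\) matrix with \(\mathbf{q}=(1,0,0)^T\):
\[H_0:\ \beta_1+\beta_2=1,\quad \beta_3+\beta_5=0,\quad \beta_4+\beta_5=0\]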
library(systemfit)  # provides the Kmenta data set
library(car)        # provides linearHypothesis(), used below
data("Kmenta")
modelo <- lm(consump ~ price + income, data = Kmenta)
summary(modelo)$coef
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 99.8954229 7.51936214 13.285093 2.090605e-10
## price       -0.3162988 0.09067741 -3.488177 2.815290e-03
## income       0.3346356 0.04542183  7.367285 1.099860e-06
\[Y=\beta_0+\beta_1X_1+\beta_2X_2+\varepsilon\]
\(X_1\): price
\(X_2\): income
\(H_0\): \(\beta_1=0\)
# One restriction selecting beta_1 (price); the right-hand side defaults to 0
R <- matrix(c(0, 1, 0), ncol = length(coef(modelo)), byrow = TRUE)
linearHypothesis(modelo, R)
## Linear hypothesis test
##
## Hypothesis:
## price = 0
##
## Model 1: restricted model
## Model 2: consump ~ price + income
##
##   Res.Df     RSS Df Sum of Sq      F   Pr(>F)
## 1     18 108.660
## 2     17  63.332  1    45.328 12.167 0.002815 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
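The F statistic reported above can be reproduced from the discrepancy vector \(\mathbf{m}=\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q}\) and its estimated variance \(s^2\mathbf{R}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{R}^T\). The following is a minimal sketch for \(H_0\): \(\beta_1=0\), not part of the original slides; the names `m`, `V_m`, `F_stat` and `p_value` are illustrative.

```r
library(systemfit)  # provides the Kmenta data set

data("Kmenta")
modelo <- lm(consump ~ price + income, data = Kmenta)

R <- matrix(c(0, 1, 0), nrow = 1)   # selects beta_1 (price)
q <- 0                              # H0: beta_1 = 0
r <- nrow(R)                        # number of restrictions

m <- R %*% coef(modelo) - q                  # discrepancy vector R beta_hat - q
V_m <- R %*% vcov(modelo) %*% t(R)           # s^2 R (X'X)^{-1} R'
F_stat <- drop(t(m) %*% solve(V_m) %*% m) / r
p_value <- pf(F_stat, df1 = r, df2 = df.residual(modelo), lower.tail = FALSE)

c(F = F_stat, p = p_value)
```

With a single restriction the statistic is the square of the t value for `price`, so it should match the `F` and `Pr(>F)` columns of the `linearHypothesis` output. Changing `R` and `q` gives the other hypotheses tested in this section.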
\(H_0\): \(\beta_1=2\)
R <- matrix(c(0, 1, 0), ncol = length(coef(modelo)), byrow = TRUE)
q <- 2
linearHypothesis(modelo, R, q)
## Linear hypothesis test
##
## Hypothesis:
## price = 2
##
## Model 1: restricted model
## Model 2: consump ~ price + income
##
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)
## 1     18 2494.21
## 2     17   63.33  1    2430.9 652.52 5.311e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(H_0\): \(\beta_1=2\) and \(\beta_2=1\).
R <- matrix(c(0, 1, 0,
              0, 0, 1), ncol = length(coef(modelo)), byrow = TRUE)
q <- matrix(c(2, 1), ncol = 1)
linearHypothesis(modelo, R, q)
## Linear hypothesis test
##
## Hypothesis:
## price = 2
## income = 1
##
## Model 1: restricted model
## Model 2: consump ~ price + income
##
##   Res.Df    RSS Df Sum of Sq     F    Pr(>F)
## 1     19 7146.8
## 2     17   63.3  2    7083.4 950.7 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
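The columns of this table suggest an equivalent way to compute \(F\), by comparing residual sums of squares. The sketch below relies on the standard least-squares result (not derived in these slides) that the quadratic form in \(\mathbf{m}\) equals the increase in RSS caused by imposing the restrictions. \(RSS_0\) and \(RSS_1\) are labels introduced here for the restricted and unrestricted models:
\[F=\frac{(RSS_0-RSS_1)/r}{RSS_1/(n-p)}\]
For the joint hypothesis \(\beta_1=2\) and \(\beta_2=1\), with \(r=2\), \(n-p=17\), a sum of squares of 7083.4 and the unrestricted RSS of 63.332 reported in the first test:
\[F\approx\frac{7083.4/2}{63.332/17}\approx\frac{3541.7}{3.7254}\approx 950.7\]
The inputs are rounded values copied from the printed output, so the last digit may differ slightly from an exact computation.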
Faraway - Linear Models with R: Section 3.5.
Draper & Smith - Applied Regression Analysis: Section 9.1.