Statistics Key Points Summary (English Version) - Statistic Review


Chapter 2 Statistic Review

A. Random variables

1. Expected value:

Definition: If X is a discrete random variable, the mean (or expected value) of X is the weighted average of the possible outcomes, where the probabilities of the outcomes serve as the weights:

    E(X) = μ_X = p_1 X_1 + p_2 X_2 + ... + p_n X_n = Σ_{i=1}^{n} p_i X_i,   where p_i is the probability of the i-th outcome, i = 1, 2, ..., n.

Interpretation: A random variable is a variable with a probability associated with each outcome; the outcome is not controlled.

Discrete random variable: has a finite number of outcomes, or countably infinitely many outcomes.

Continuous random variable: has uncountably infinitely many outcomes; the probability of any single outcome is negligibly small because there are too many possible values.

For a normal (continuous) random variable, the probability density function is used to calculate the probability over an interval.

E(·) is the expectations operator; the probabilities satisfy Σ_i p_i = 1.

For N observations X_1, X_2, ..., X_N of X:

    X̄ = (1/N) Σ_{i=1}^{N} X_i = (X_1 + X_2 + X_3 + ... + X_N) / N   ... "sample mean", used to estimate μ_X.

X̄ changes from sample to sample; it is not fixed, since the outcomes selected will not be the same each time. There is a probability associated with each value of X̄. X̄ is therefore also a random variable, and we can calculate E(X̄).
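As a quick numerical sketch (the outcomes, probabilities, and sample below are made up for illustration), the population mean and a sample mean can be computed directly from the definitions above:

```python
# Hypothetical discrete random variable: outcomes and their probabilities.
outcomes = [1, 2, 3]
probs = [0.2, 0.5, 0.3]          # the p_i must sum to 1

# E(X) = sum_i p_i * X_i : probability-weighted average of the outcomes.
mu_X = sum(p * x for p, x in zip(probs, outcomes))   # 2.1 up to rounding

# A sample mean, by contrast, weights each observed value equally by 1/N.
sample = [2, 3, 1, 2, 2, 3]
x_bar = sum(sample) / len(sample)
```

A different draw of `sample` would give a different `x_bar`, which is exactly why X̄ is itself a random variable.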

2. Variance: measures the dispersion, i.e. the spread of the values.

    Var(X) = σ²_X = Σ_i p_i [X_i - E(X)]² = E{[X - E(X)]²} = E(X²) - [E(X)]²   ... "population variance", a constant.

    σ_X = sqrt(Var(X))   ... "population standard deviation".

    S²_X = σ̂²_X = Σ (X_i - X̄)² / (N - 1)   ... "sample variance", used to estimate σ²_X.

    S_X = σ̂_X = sqrt(S²_X)   ... "sample standard deviation".
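A numerical sketch (the probabilities and sample below are made-up illustrative numbers): both population-variance formulas above agree, and the sample variance uses the N-1 divisor:

```python
outcomes = [1, 2, 3]
probs = [0.2, 0.5, 0.3]
mu = sum(p * x for p, x in zip(probs, outcomes))   # E(X)

# Two equivalent population-variance formulas.
var_def = sum(p * (x - mu) ** 2 for p, x in zip(probs, outcomes))
var_alt = sum(p * x * x for p, x in zip(probs, outcomes)) - mu ** 2

# Sample variance uses the N-1 divisor (so that E(S^2) = sigma^2; see section E).
sample = [2, 3, 1, 2, 2, 3]
x_bar = sum(sample) / len(sample)
s2 = sum((x - x_bar) ** 2 for x in sample) / (len(sample) - 1)
```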

3. Joint distribution (linear relation of two random variables X and Y; the bivariate case):

    Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]

Covariance measures the linear relationship between X and Y. Cov(X, Y) depends on the units of X and Y; different units give a different Cov(X, Y).

Cov(X, Y) > 0: the best-fitting line has a positive slope; positive relationship between X and Y.
Cov(X, Y) < 0: the best-fitting line has a negative slope; negative relationship between X and Y.
Cov(X, Y) = 0: there is no linear relationship between X and Y, but there may be a nonlinear relationship.

    ρ_XY = Cov(X, Y) / (σ_X σ_Y)   ... "population correlation coefficient", which is scale-free.

ρ_XY > 0: a positive correlation; ρ_XY < 0: a negative correlation; ρ_XY = 0: no linear relationship between X and Y.

-1 ≤ ρ_XY ≤ 1.  ρ_XY = 1: the regression line is a straight line with positive slope; ρ_XY = -1: the regression line is a straight line with negative slope.

    Côv(X, Y) = Σ_{i=1}^{N} (X_i - X̄)(Y_i - Ȳ) / (N - 1)   ... "sample covariance",

    r_XY = Côv(X, Y) / (S_X S_Y) = Σ (X_i - X̄)(Y_i - Ȳ) / sqrt[ Σ (X_i - X̄)² Σ (Y_i - Ȳ)² ]   ... "sample correlation coefficient".
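A small sketch with made-up paired data, computing the sample covariance and the scale-free sample correlation from the formulas above:

```python
import math

# Hypothetical paired observations (illustrative only).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
N = len(X)
x_bar, y_bar = sum(X) / N, sum(Y) / N

# Sample covariance with the N-1 divisor.
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / (N - 1)

# Sample correlation: covariance rescaled by both standard deviations,
# so the result is unit-free and bounded by [-1, 1].
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / math.sqrt(sxx * syy)
```

Rescaling X or Y (changing units) changes `cov` but leaves `r` unchanged, which is the point of the correlation coefficient.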

The probabilities satisfy Σ_i p(X_i) = 1, Σ_j p(Y_j) = 1, and Σ_i Σ_j p_ij = 1 (joint probability).

Worked example:
E(X) = 0.263×1 + 0.403×2 + 0.334×3 = 2.071        Var(X) = 4.881 - (2.071)² = 0.591959
E(Y) = 0.298×6 + 0.385×5 + 0.317×4 = 4.981        Var(Y) = 25.425 - (4.981)² = 0.614639
Cov(X, Y) = Σ_i Σ_j p_ij X_i Y_j - E(X)E(Y) = 10.001 - 2.071×4.981 ≈ -0.315

4. Formulas

E(b) = b, Var(b) = 0;  E(aX) = aE(X), Var(aX) = a² Var(X);  E(aX + b) = aE(X) + b, Var(aX + b) = a² Var(X).

Proof:  Var(aX + b) = E{[(aX + b) - E(aX + b)]²} = E{[aX + b - aE(X) - b]²} = E{a²[X - E(X)]²} = a² E{[X - E(X)]²} = a² Var(X).

E(X + Y) = E(X) + E(Y);  Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y).

Proof:  Var(X + Y) = E{[(X + Y) - E(X + Y)]²} = E{[(X - E(X)) + (Y - E(Y))]²}
      = E{[X - E(X)]²} + E{[Y - E(Y)]²} + 2E{[X - E(X)][Y - E(Y)]}
      = Var(X) + Var(Y) + 2Cov(X, Y).

If X and Y are independent (linearly uncorrelated), then Cov(X, Y) = 0 and ρ_XY = 0, so E(X + Y) = E(X) + E(Y) and Var(X + Y) = Var(X) + Var(Y).

Proof:  Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)] = E(XY - μ_Y X - μ_X Y + μ_X μ_Y)
      = E(XY) - μ_Y E(X) - μ_X E(Y) + μ_X μ_Y = E(XY) - μ_X μ_Y,
and under independence E(XY) = E(X)E(Y) = μ_X μ_Y, so Cov(X, Y) = 0.

However, Cov(X, Y) = 0 does not establish that X and Y are independent. Note also that E(X·X) ≠ [E(X)]² in general: X is not independent of itself.
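The two variance identities above hold exactly for any finite data set (with the population-style 1/N divisor), so they can be sanity-checked on arbitrary made-up numbers:

```python
def pvar(v):
    """Population variance of a data list (divisor N)."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def pcov(u, v):
    """Population covariance of two equally long data lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

X = [1.0, 2.0, 4.0, 7.0]     # arbitrary illustrative data
Y = [3.0, 1.0, 5.0, 2.0]
a, b = 3.0, 5.0

# Var(aX + b) = a^2 Var(X): the shift b changes nothing, the scale a enters squared.
lhs1 = pvar([a * x + b for x in X])
assert abs(lhs1 - a * a * pvar(X)) < 1e-9

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
lhs2 = pvar([x + y for x, y in zip(X, Y)])
assert abs(lhs2 - (pvar(X) + pvar(Y) + 2 * pcov(X, Y))) < 1e-9
```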

B. (Probability) distributions:

1. The normal distribution:  X ~ N(μ_X, σ²_X).

2. The standard normal distribution:  Z = (X - μ_X) / σ_X,  Z ~ N(0, 1).

3. The Chi-square distribution:

    χ²_N = Σ_{i=1}^{N} Z_i² = Z_1² + Z_2² + ... + Z_N²,

where the Z_i are N independent, normally distributed random variables with mean 0 and variance 1.
→ As N gets larger, the χ² distribution becomes an approximation of the normal distribution. The range of χ² is 0 → ∞.

4. The t distribution:

If (1) Z has a normal distribution with mean 0 and variance 1, (2) χ²_N is distributed as Chi-square with N degrees of freedom, and (3) Z and χ²_N are independent, then

    Z / sqrt(χ²_N / N) ~ t_N.

→ As N gets large, the t distribution tends toward the normal distribution.

5. The F distribution:

If X and Y are independent and distributed as Chi-square with N_1 and N_2 degrees of freedom, respectively, then

    (X / N_1) / (Y / N_2) ~ F_{N_1, N_2}.
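A brief simulation sketch of the χ² construction above (the seed and sample sizes are arbitrary): summing N squared standard normals gives a draw whose average across repetitions should be close to N, since E(χ²_N) = N, and which is never negative:

```python
import random

random.seed(42)
N, reps = 5, 20000

# One chi^2_N draw = sum of N squared independent standard normal draws.
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(N)) for _ in range(reps)]

mean_draw = sum(draws) / reps   # should be close to N = 5
min_draw = min(draws)           # the chi-square range is [0, infinity)
```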

Assume the X_i are independent of each other. Then E(X̄) = μ_X and Var(X̄) = σ²_X / N, regardless of the distribution of X:

    E(X̄) = E[(1/N) Σ X_i] = (1/N) E(X_1 + X_2 + ... + X_N) = (1/N)(μ_X + μ_X + ... + μ_X) = (1/N) N μ_X = μ_X.

    Var(X̄) = Var[(1/N) Σ X_i] = (1/N²) Var(Σ X_i) = (1/N²) N σ²_X = σ²_X / N,

since

    Var(Σ X_i) = E{[(X_1 + ... + X_N) - E(X_1 + ... + X_N)]²}
               = E{[X_1 - E(X_1)]²} + ... + E{[X_N - E(X_N)]²} + 2 Σ_{i<j} E{[X_i - E(X_i)][X_j - E(X_j)]}
               = Var(X_1) + Var(X_2) + ... + Var(X_N)      (the cross-terms vanish by independence)
               = σ²_X + σ²_X + ... + σ²_X = N σ²_X.

If X ~ N(μ_X, σ²_X) and the observations are i.i.d. (independent of each other, identically distributed over time), then X̄ ~ N(μ_X, σ²_X / N).

* Central limit theorem:

If X ~ (μ_X, σ²_X) (with any distribution), then X̄ is approximately distributed as N(μ_X, σ²_X / N) as N (the sample size) increases.

With S²_X = σ̂²_X = Σ (X_i - X̄)² / (N - 1), what is the distribution of (X̄ - μ_0) / (S / sqrt(N)) ?
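An illustrative simulation of the theorem (arbitrary seed and sizes): means of uniform(0,1) samples, which are far from normal individually, should center on μ = 0.5 with variance close to σ²/N = (1/12)/N:

```python
import random

random.seed(1)
N, reps = 30, 5000   # sample size per mean, number of repeated samples

# Uniform(0,1) has mu = 0.5 and sigma^2 = 1/12.
means = [sum(random.random() for _ in range(N)) / N for _ in range(reps)]

grand_mean = sum(means) / reps
var_of_means = sum((m - grand_mean) ** 2 for m in means) / reps

# grand_mean should be near 0.5 and var_of_means near (1/12)/30 ≈ 0.00278.
```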

C. Hypothesis testing:

Ex. H_0: μ = 100 ... null hypothesis;  H_1: μ ≠ 100 ... alternative hypothesis;  α = 0.05 ... level of significance.

Given the value of X̄, the test statistic is

    Z = (X̄ - 100) / sqrt(σ² / N),   if σ² is known.

Compare it with the critical value, Z_c, to decide whether or not to accept H_0.

If we accept H_0: "we cannot reject H_0 at the 95% confidence level based on the data we have."
If we reject H_0: "we can reject H_0 at the 5% level of significance."

Type I error: rejecting H_0 when H_0 is true; the probability of making a Type I error is α.
Type II error: accepting H_0 when H_0 is false; the probability of making a Type II error is difficult to determine.

When the confidence interval widens, the Type I error probability falls and the Type II error probability rises.
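A minimal sketch of the two-sided Z test above. The null μ = 100 is from the text; the sample mean, variance, and N are made-up numbers, and 1.96 is the usual two-sided critical value at α = 0.05:

```python
import math

x_bar, mu_0 = 103.2, 100.0   # sample mean and hypothesized mean (illustrative)
sigma2, N = 64.0, 25         # assumed known population variance and sample size

# Test statistic: how many standard errors x_bar lies from mu_0.
z = (x_bar - mu_0) / math.sqrt(sigma2 / N)

z_crit = 1.96                # two-sided critical value at alpha = 0.05
reject_H0 = abs(z) > z_crit  # here z = 2.0 > 1.96, so H0 is rejected
```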

※  (X̄ - μ_0) / (S / sqrt(N)) = [(X̄ - μ_0) / (σ / sqrt(N))] / sqrt{ [(N - 1)S² / σ²] / (N - 1) }

(1) If X ~ N(μ_X, σ²_X), then X̄ ~ N(μ_X, σ²_X / N), so the numerator (X̄ - μ_0) / (σ / sqrt(N)) is the standard normal variable Z, and (N - 1)S² / σ² is χ²_{N-1}:

    (N - 1)S² / σ² = (N - 1) · [Σ (X_i - X̄)² / (N - 1)] / σ² = Σ (X_i - X̄)² / σ² = χ²_{N-1}.

    ∴ (X̄ - μ_0) / (S / sqrt(N)) = Z / sqrt(χ²_{N-1} / (N - 1)) ~ t_{N-1},

if the numerator and denominator are independent.

(2) If X is not ~ N(μ_X, σ²_X), then we appeal to the central limit theorem.

D. Point and interval estimates:

1. Point estimate: e.g. X̄ = 40, but we don't know whether the true value is close to it or not.

2. Interval estimate:

    X̄ ± Z_{α/2} · σ_X / sqrt(N),   α = 0.05, if σ_X is known.

    X̄ ± t_{α/2, N-1} · S_X / sqrt(N),   α = 0.05, if σ_X is unknown and N is small (< 31).

95% of such intervals will include the true value.
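A short sketch of the known-σ interval, reusing the X̄ = 40 point estimate from the text with an assumed σ_X = 10 and N = 100 (both made up) and the α = 0.05 critical value 1.96:

```python
import math

x_bar = 40.0            # point estimate (from the text)
sigma_X, N = 10.0, 100  # assumed known population sd and sample size
z_half_alpha = 1.96     # two-sided critical value for alpha = 0.05

half_width = z_half_alpha * sigma_X / math.sqrt(N)
ci = (x_bar - half_width, x_bar + half_width)   # (38.04, 41.96)
```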

E. Properties of estimators: unbiasedness, efficiency, consistency

1. Unbiasedness:

β̂ is an unbiased estimator of β if E(β̂) = β (the true value).
→ Ex. X̄ is an unbiased estimator of μ_X;  S²_X is an unbiased estimator of σ²_X:

    E(S²_X) = E[ Σ (X_i - X̄)² / (N - 1) ] = (1 / (N - 1)) E[ Σ (X_i - μ_X)² - N(X̄ - μ_X)² ]
            = (1 / (N - 1)) [N σ²_X - N · σ²_X / N] = σ²_X,

since

    Σ (X_i - X̄)² = Σ [(X_i - μ_X) - (X̄ - μ_X)]²
                  = Σ (X_i - μ_X)² - 2(X̄ - μ_X) Σ (X_i - μ_X) + N(X̄ - μ_X)²
                  = Σ (X_i - μ_X)² - 2N(X̄ - μ_X)² + N(X̄ - μ_X)²
                  = Σ (X_i - μ_X)² - N(X̄ - μ_X)².

→ bias = E(β̂) - β.
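A simulation sketch of this result (arbitrary seed; normal data with σ² = 4): the N-1 divisor averages out close to σ², while the divisor-N version averages out close to the biased value (N-1)σ²/N:

```python
import random

random.seed(7)
sigma2, N, reps = 4.0, 5, 20000

sum_s2, sum_s2_biased = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 2.0) for _ in range(N)]   # sd = 2, variance = 4
    x_bar = sum(xs) / N
    ss = sum((x - x_bar) ** 2 for x in xs)
    sum_s2 += ss / (N - 1)        # unbiased sample variance
    sum_s2_biased += ss / N       # biased (divisor N) version

avg_s2 = sum_s2 / reps                 # ≈ sigma2 = 4.0
avg_s2_biased = sum_s2_biased / reps   # ≈ (N-1)/N * sigma2 = 3.2
```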

2. Efficiency:

Define: (a) minimum mean square error,

    MSE(β̂) = E(β̂ - β)² = [bias(β̂)]² + Var(β̂).

Proof:  E(β̂ - β)² = E{[β̂ - E(β̂)] + [E(β̂) - β]}²
      = E[β̂ - E(β̂)]² + [E(β̂) - β]² + 2E{[β̂ - E(β̂)][E(β̂) - β]}
      = Var(β̂) + [bias(β̂)]² + 2[E(β̂) - β] E[β̂ - E(β̂)]
      = Var(β̂) + [bias(β̂)]² + 0.

→ We check X̄: it is mostly around μ.
→ When bias(β̂) = 0, MSE(β̂) = Var(β̂).
→ If we have an unbiased estimator with a large dispersion around the true value (i.e. high variance) and a biased estimator with low variance, we might prefer the biased estimator to the unbiased one, to maximize the precision of prediction.

Def. (b) If, for a given sample size, (1) β̂ is unbiased, and (2) Var(β̂) ≤ Var(β̃), where β̃ is any other unbiased estimator of β, then β̂ is an efficient estimator of β.

Cramér-Rao lower bound: gives a lower limit for the variance of any unbiased estimator.
Ex. The lower bound for Var(X̄) is σ²/N; the lower bound for the variance of a variance estimator is 2σ⁴/N (among all unbiased estimators, the minimum attained is Var(σ̂²) = 2σ⁴/(N - 1)).

3. Consistency:

The probability limit of β̂: plim(β̂) = β as N → ∞, i.e.

    lim_{N→∞} Prob(|β̂ - β| < δ) = 1  for any δ > 0,   equivalently  lim_{N→∞} Prob(|β̂ - β| > δ) = 0.

Criteria for consistency:

(a) Sufficient condition for consistency (not necessary): β̂ is a consistent estimator of β if plim(β̂) = β as N → ∞.

(b) Mean square error consistency: if β̂ is an estimator of β and lim_{N→∞} MSE(β̂) = 0, then β̂ is a consistent estimator of β.

→ E.g. X̄ is a consistent estimator of μ_X:

    MSE(X̄) = [bias(X̄)]² + Var(X̄) = 0 + σ²_X / N,  so  lim_{N→∞} MSE(X̄) = 0.

Slutsky's theorem: if plim(β̂) = β and g(β̂) is a continuous function of β̂, then plim g(β̂) = g[plim(β̂)]. (This is "consistency carries over"; however, consistency does not always carry over. Biasedness doesn't carry over.)

4. Asymptotic unbiasedness: as N becomes larger and larger.

β̂ is an asymptotically unbiased estimator of β if lim_{N→∞} E(β̂) = β.
→ If an estimator is unbiased, it will be asymptotically unbiased (the converse need not hold).
→ Ex. σ̃² = (1/N) Σ (X_i - X̄)² is asymptotically unbiased:

    E(σ̃²) = ((N - 1)/N) σ²  (biased for any finite N),  but  lim_{N→∞} E(σ̃²) = σ².

5. Asymptotic efficiency: N → ∞.

β̂ is an asymptotically efficient estimator of β if all of the following conditions are satisfied:
(a) β̂ has an asymptotic distribution with finite mean and finite variance;
(b) β̂ is consistent;
(c) no other consistent estimator of β has a smaller asymptotic variance than β̂.
→ If an estimator is efficient, it will be asymptotically efficient.

Possible combinations of properties:

unbiasedness    efficiency      consistency      asym. unbiasedness    asym. efficiency
biasedness      inefficiency    consistency      asym. unbiasedness    asym. efficiency
biasedness      inefficiency    inconsistency    asym. unbiasedness    asym. inefficiency

Review of linear algebra

1. A matrix A is idempotent iff AA = A.

2. If the inner product of two vectors vanishes (i.e., the scalar is 0), then the vectors are orthogonal. [An inner product (or scalar, or dot product) of two vectors is a row vector times a column vector, yielding a scalar. An outer product of two vectors is a column vector times a row vector, yielding a matrix.]

3. The rank of any matrix A, ρ(A), is the size of the largest non-vanishing determinant contained in A; or, the rank is the maximum number of linearly independent rows or columns of A, where a_1, a_2, ..., a_N is a set of linearly independent vectors iff

    Σ_{j=1}^{N} k_j a_j = 0  necessarily implies  k_1 = k_2 = ... = k_N = 0.

→ Ex.  A = [2 3; 8 6],  |A| = 12 - 24 = -12 ≠ 0,  ∴ ρ(A) = 2.
       B = [3 6; 2 4],  |B| = 12 - 12 = 0,  ∴ ρ(B) = 1.

→ Properties:
a. 0 ≤ ρ(A) = integer ≤ min(M, N), where A is an M×N matrix.
b. ρ(I_N) = N,  ρ(0) = 0.
c. ρ(A) = ρ(A') = ρ(A'A) = ρ(AA').
d. If A and B are of the same order, ρ(A + B) ≤ ρ(A) + ρ(B).
e. If AB is defined, ρ(AB) ≤ min[ρ(A), ρ(B)].
f. If A is diagonal, ρ(A) = the number of nonzero elements.
g. If A is idempotent, ρ(A) = trace(A).
h. The rank of a matrix is not changed if one row (or column) is multiplied by a nonzero constant, or if such a multiple of one row (column) is added to another row (column).

4. A square matrix of order N is nonsingular iff it is of full rank, ρ(A) = N (or |A| ≠ 0); otherwise, it is singular. The rank of a matrix is unchanged by premultiplying or postmultiplying by a nonsingular matrix.
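The 2×2 determinant-and-rank examples above can be checked directly (the exact entry layout is a reconstruction, but the determinant values match the text):

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    return a * d - b * c

A = [[2, 3], [8, 6]]
B = [[3, 6], [2, 4]]

dA = det2(A)   # 12 - 24 = -12, nonzero, so A is full rank: rank(A) = 2
dB = det2(B)   # 12 - 12 = 0; one row is a multiple of the other, so rank(B) = 1
```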

5. Differentiation in matrix notation (rules):

a. If a and X are M×1 vectors, where the a_i (i = 1, 2, ..., M) are constants, then

    ∂(a'X)/∂X = a.

b. If A is a symmetric matrix of order M×M whose typical element a_ij is a constant, then

    ∂(X'AX)/∂X = 2AX;   if A is not symmetric,  ∂(X'AX)/∂X = (A' + A)X.

c. If A is a symmetric matrix of order M×M whose typical element a_ij is a constant, then

    ∂²(X'AX)/∂X∂X' = 2A,   ∂(X'AX)/∂A = XX'.

If Y and Z are vectors and B is a matrix, then

    ∂(Y'BZ)/∂Y = BZ,   ∂(Y'BZ)/∂Z = B'Y,   ∂(Y'BZ)/∂B = YZ'.
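A finite-difference sketch checking the quadratic-form rule ∂(X'AX)/∂X = (A' + A)X for a small non-symmetric A (all numbers made up):

```python
def quad_form(A, x):
    """x'Ax for a square matrix A and vector x (nested lists)."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

A = [[1.0, 2.0], [0.0, 3.0]]   # deliberately non-symmetric
x = [1.5, -0.5]
h = 1e-6

# Numerical gradient via central differences.
grad_num = []
for i in range(len(x)):
    xp, xm = x[:], x[:]
    xp[i] += h
    xm[i] -= h
    grad_num.append((quad_form(A, xp) - quad_form(A, xm)) / (2 * h))

# Analytic gradient: (A' + A) x.
grad_an = [sum((A[j][i] + A[i][j]) * x[j] for j in range(len(x)))
           for i in range(len(x))]
# grad_num agrees with grad_an to roughly the step-size accuracy.
```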

Formulas for matrices

(A')' = A,  where A is M×N.
(AB)' = B'A',  where A is M×N and B is N×K.
AB ≠ BA in general.
(A ± C)' = A' ± C',  where A and C are M×N.
(DE)⁻¹ = E⁻¹D⁻¹,  where D and E are M×M, iff det D ≠ 0 and det E ≠ 0 (i.e., iff D and E are nonsingular matrices).
(D⁻¹)' = (D')⁻¹.
det D = det D'.
trace(D ± E) = trace(D) ± trace(E).
trace(AF) = trace(FA),  where A is M×N and F is N×M.
trace(scalar) = scalar.
E[trace(D)] = trace[E(D)].
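A quick check of trace(AF) = trace(FA) with a rectangular pair (illustrative numbers), using plain nested-list matrix multiplication; note the two products have different sizes (2×2 vs 3×3) but equal traces:

```python
def matmul(P, Q):
    """Multiply matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 3], [4, 5, 6]]        # 2x3
F = [[1, 0], [2, 1], [0, 3]]      # 3x2

t1 = trace(matmul(A, F))   # trace of a 2x2 product
t2 = trace(matmul(F, A))   # trace of a 3x3 product; equal to t1
```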
