A-level Mathematics Statistics 1 (S1): Key Exam Points
Summary of key points in S1
Chapter 1: Binomial distribution
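1. (Key point ***) Calculating binomial probabilities:
(1) Formula method (**): if $X \sim B(n, p)$, then
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
(2) Table method (***): use the cumulative binomial tables on pages 135-139 of the textbook, which give $F(x) = P(X \le x)$, where p
is a multiple of 0.05 up to 0.50 and n ranges from 5 to 50.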
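Both methods can be cross-checked in a few lines of Python. This is a minimal sketch, assuming the illustrative values $X \sim B(10, 0.3)$ (not taken from the source); binom_pmf and binom_cdf are hypothetical helper names. binom_pmf applies the formula directly, and binom_cdf adds up the probabilities to reproduce the cumulative value $F(x) = P(X \le x)$ listed in the tables.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p), using nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x: int, n: int, p: float) -> float:
    """F(x) = P(X <= x): the cumulative value printed in the binomial tables."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Illustrative values: X ~ B(10, 0.3)
print(round(binom_pmf(3, 10, 0.3), 4))  # 0.2668, P(X = 3)
print(round(binom_cdf(3, 10, 0.3), 4))  # 0.6496, P(X <= 3), as in the table
```

A probability such as $P(X \ge 4)$ then follows as 1 - binom_cdf(3, 10, 0.3), the same complement step used with the tables.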
2. (Key point **) Mean and variance of a binomial distribution: if $X \sim B(n, p)$, then
$E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$
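The two formulae can be checked numerically by computing the mean and variance straight from the probabilities, using $\mathrm{Var}(X) = E(X^2) - (E(X))^2$ from Chapter 8. A minimal sketch with the illustrative values n = 10, p = 0.3; binom_mean_var is a hypothetical helper.

```python
from math import comb

def binom_mean_var(n: int, p: float) -> tuple[float, float]:
    """Compute E(X) and Var(X) for X ~ B(n, p) directly from the probabilities."""
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    e_x2 = sum(x * x * px for x, px in enumerate(probs))
    return mean, e_x2 - mean**2  # Var(X) = E(X^2) - (E(X))^2

n, p = 10, 0.3  # illustrative values
mean, var = binom_mean_var(n, p)
print(round(mean, 6), round(n * p, 6))           # both equal np
print(round(var, 6), round(n * p * (1 - p), 6))  # both equal np(1 - p)
```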
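3. (Exam point *) Conditions for a binomial distribution:
● A fixed number of trials, n.
● Each trial results in either success or failure.
● The trials are independent.
● The probability of success, p, is the same at each trial.
Here n is called the index and p the parameter.
The difficult part is writing out the conditions in the context of the question: if the question has a real-world setting, state the conditions in terms of that setting.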
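4. (Exam point *) If $X \sim B(n, p)$ with $p > 0.5$, let $Y = n - X$ (the number of failures). Then $Y \sim B(n, 1 - p)$ with $1 - p < 0.5$, and $P(X \le x) = P(Y \ge n - x)$.
If p is a multiple of 0.05, the table method can then be used to find the probability.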
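A numerical check of this switch, assuming the illustrative case $X \sim B(10, 0.7)$ and $x = 7$: $P(X \le 7)$ is computed directly and again through $Y = n - X \sim B(10, 0.3)$, the form the tables cover, using $P(Y \ge n - x) = 1 - P(Y \le n - x - 1)$. binom_cdf is a hypothetical helper.

```python
from math import comb

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

n, p, x = 10, 0.7, 7          # illustrative: p = 0.7 is not in the tables
direct = binom_cdf(x, n, p)   # P(X <= 7) computed directly

# Y = n - X counts failures, so Y ~ B(n, 1 - p) and X <= x  <=>  Y >= n - x.
# P(Y >= n - x) = 1 - P(Y <= n - x - 1), which uses the table for p = 0.3.
via_y = 1 - binom_cdf(n - x - 1, n, 1 - p)

print(round(direct, 4), round(via_y, 4))  # 0.6172 0.6172
```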
5. Typical worked examples: Examples 7/8/9*/10/11/12/13(a)/14*
6. Review questions: Review Exercise 1: 1/4/8
7. Selected workbook questions:
12-01-2, 10-01-1, 08-01-2
Chapter 2: Representation and summary of data – location
1、Frequency tables and grouped data
Cumulative frequency: add a column to the frequency table showing the running
total of the frequencies.
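A running total is exactly what itertools.accumulate produces. A minimal sketch with a made-up frequency table:

```python
from itertools import accumulate

# Illustrative frequency table (values and frequencies are made up)
values = [1, 2, 3, 4, 5]
freqs = [3, 7, 12, 5, 3]

# Cumulative frequency = running total of the frequency column
cum_freqs = list(accumulate(freqs))

for v, f, cf in zip(values, freqs, cum_freqs):
    print(v, f, cf)
# The last cumulative frequency equals the total number of observations.
```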
A grouped frequency distribution consists of classes and their related class
frequencies.
Example classes (whole-number data): 30-31, 32-33, 34-35
For the class 32-33:
Lower class boundary is 31.5
Upper class boundary is 33.5
Class width is 33.5-31.5=2
Class mid-point is (31.5+33.5)/2=32.5
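The boundary, width and mid-point calculations for a class like 32-33 can be written as a small function, assuming (as in this example) that the data are whole numbers, so each boundary sits 0.5 beyond the stated class limits; class_summary is a hypothetical helper.

```python
def class_summary(lower: int, upper: int) -> tuple[float, float, float, float]:
    """Boundaries, width and mid-point of a class such as 32-33 for whole-number data.

    Assumes the data are recorded to the nearest whole number, so each
    boundary lies halfway (0.5) between adjacent classes.
    """
    lcb = lower - 0.5          # lower class boundary
    ucb = upper + 0.5          # upper class boundary
    width = ucb - lcb          # class width
    mid = (lcb + ucb) / 2      # class mid-point
    return lcb, ucb, width, mid

print(class_summary(32, 33))  # (31.5, 33.5, 2.0, 32.5)
```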
2、Measures of location (the centre) of a set of data: mode, median and
mean
● The mode is the value that occurs most often.
● The median is the middle value when the data are put in order, or the mean of the
two middle values if there is an even number of them.
● The mean is the sum of all the observations divided by the total number of
observations. For data in a frequency distribution, the mean is $\bar{x} = \frac{\sum fx}{\sum f}$.
3、Coding for large data values
Coding is normally of the form $y = \frac{x - a}{b}$, where a and b are to be chosen.
To find the mean of the original data, find the mean of the coded data, $\bar{y}$, substitute it
into the coding, $\bar{y} = \frac{\bar{x} - a}{b}$, and solve; this gives $\bar{x} = a + b\bar{y}$.
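A short check that the coding method gives the same mean as the direct formula $\bar{x} = \frac{\sum fx}{\sum f}$. The frequency table and the coding $y = \frac{x - 1000}{10}$ are made up for illustration, and freq_mean is a hypothetical helper.

```python
# Illustrative frequency distribution (made-up values)
x = [1010, 1020, 1030, 1040]
f = [2, 5, 8, 5]

def freq_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(fi * vi for vi, fi in zip(values, freqs)) / sum(freqs)

a, b = 1000, 10                        # chosen coding: y = (x - a) / b
y = [(xi - a) / b for xi in x]         # coded values 1, 2, 3, 4
y_bar = freq_mean(y, f)                # mean of the coded data
x_bar_from_coding = a + b * y_bar      # rearrange y_bar = (x_bar - a) / b

print(y_bar, round(x_bar_from_coding, 6), freq_mean(x, f))  # 2.8 1028.0 1028.0
```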
Chapter 3: Representation and summary of data – measures of dispersion
1、The range of a set of data is the difference between the highest and lowest values in
the set.
The quartiles $Q_1$, $Q_2$ and $Q_3$ split the data into four parts.
For discrete data: to find the lower quartile $Q_1$, divide n by 4; to find the upper
quartile $Q_3$, divide n by 4 and multiply by 3. If the result is a whole number, take
the mid-point of the corresponding term and the term above. If the result is not a
whole number, round it up and take the corresponding term.
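The discrete rule above, written as a minimal Python sketch (quartile_discrete is a hypothetical helper and the data are illustrative). Taking k = 2 gives the median by the same rule, which agrees with the definition of the median in Chapter 2.

```python
import math

def quartile_discrete(data, k: int) -> float:
    """Q_k (k = 1, 2, 3) for discrete data using the n/4 rule.

    Position = k * n / 4. If it is a whole number, average that term and the
    next one; otherwise round up and take that term. Terms are counted from 1.
    """
    xs = sorted(data)
    n = len(xs)
    pos = k * n / 4
    if pos == int(pos):
        i = int(pos)
        return (xs[i - 1] + xs[i]) / 2   # mid-point of term i and term i + 1
    return xs[math.ceil(pos) - 1]        # round up and take that term

data = [16, 4, 12, 8, 2, 14, 10, 6]  # illustrative data, n = 8
print([quartile_discrete(data, k) for k in (1, 2, 3)])  # [5.0, 9.0, 13.0]
```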
For continuous grouped data: for $Q_1$ divide n by 4, and for $Q_3$ divide n by 4 and
multiply by 3. Use interpolation to find the value of the corresponding term.
The inter-quartile range (IQR) is $Q_3 - Q_1$.
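For grouped data the interpolation can be written as $Q = L + \frac{\text{position} - \text{cumulative frequency before the class}}{f} \times \text{class width}$, where L is the lower class boundary and f the frequency of the class containing the position. A minimal sketch under that assumption, with made-up classes; quartile_grouped is a hypothetical helper.

```python
def quartile_grouped(boundaries, freqs, k: int) -> float:
    """Q_k for continuous grouped data by linear interpolation.

    boundaries: class boundaries [b0, b1, ..., bm] for m classes.
    freqs: the m class frequencies.
    Position = k * n / 4 (no rounding for continuous data).
    """
    n = sum(freqs)
    pos = k * n / 4
    cum = 0
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if cum + f >= pos and f > 0:
            # interpolate: fraction of the way through this class
            return lower + (pos - cum) / f * (upper - lower)
        cum += f
    raise ValueError("position lies beyond the last class")

boundaries = [0, 10, 20, 30]  # illustrative classes 0-10, 10-20, 20-30
freqs = [4, 10, 6]            # n = 20
q1 = quartile_grouped(boundaries, freqs, 1)
q3 = quartile_grouped(boundaries, freqs, 3)
print(q1, round(q3, 2), round(q3 - q1, 2))  # 11.0 21.67 10.67 (Q1, Q3, IQR)
```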
2、The standard deviation and variance of discrete data
Variance $= \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$, standard deviation $= \sqrt{\text{Variance}}$
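To see that the two forms of the variance formula agree, the sketch below evaluates both on a small illustrative data set (variance_two_ways is a hypothetical helper). Note that these S1 formulae divide by n.

```python
from math import sqrt

def variance_two_ways(xs):
    """Return the variance by the definition and by the computational formula."""
    n = len(xs)
    mean = sum(xs) / n
    by_definition = sum((x - mean) ** 2 for x in xs) / n
    by_formula = sum(x * x for x in xs) / n - (sum(xs) / n) ** 2
    return by_definition, by_formula

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
v1, v2 = variance_two_ways(data)
print(v1, v2, sqrt(v1))  # 4.0 4.0 2.0
```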
If you let f stand for the frequency, then $n = \sum f$ and
Variance $= \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$
3、Adding or subtracting numbers does not change the standard deviation of the data.
Multiplying or dividing the data by a number does affect the standard deviation.
To find the standard deviation of the original data, find the standard deviation of the
coded data, then multiply it by the number you divided the data by, or divide it by
the number you multiplied the data by.
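A minimal sketch of the rule, assuming illustrative data and the coding $y = \frac{x - 1000}{20}$: subtracting 1000 leaves the standard deviation unchanged, and dividing by 20 is undone by multiplying the coded standard deviation by 20. The helper sd is hypothetical and divides by n, as in the formulae of this chapter.

```python
from math import sqrt

def sd(xs):
    """Standard deviation with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    return sqrt(sum((x - mean) ** 2 for x in xs) / n)

x = [1020, 1040, 1040, 1060]      # illustrative data
a, b = 1000, 20                   # coding y = (x - a) / b
y = [(xi - a) / b for xi in x]    # coded data: 1, 2, 2, 3

sd_y = sd(y)
sd_x_from_coding = b * sd_y       # subtracting a has no effect; undo the division by b
print(round(sd_x_from_coding, 4), round(sd(x), 4))  # 14.1421 14.1421
```

The variance, being the square of the standard deviation, scales by $20^2$ instead.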
Chapter 4: Representation of data
1. A stem and leaf diagram is used to order and present data given to two or three
significant figures. Each number is first split into its stem and its leaf.
Two sets of data can be compared using back-to-back stem and leaf diagrams.
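A minimal sketch of how two-digit whole numbers split into stems (tens) and leaves (units), assuming that convention for the key; stem_and_leaf is a hypothetical helper and the data are made up. Empty stems are kept so that gaps in the data remain visible.

```python
from collections import defaultdict

def stem_and_leaf(data) -> list[str]:
    """Rows of a stem-and-leaf diagram for two-digit whole numbers.

    Stem = tens digit, leaf = units digit (an assumed convention, see the key).
    """
    stems = defaultdict(list)
    for value in sorted(data):            # sorting puts each row's leaves in order
        stems[value // 10].append(value % 10)
    rows = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems as gaps
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

for row in stem_and_leaf([23, 31, 45, 27, 38, 33, 21, 42]):  # made-up data
    print(row)
print("Key: 2 | 3 means 23")
```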
2、An outlier is an extreme value that lies outside the overall pattern of the data.
A value is an outlier if it is
● greater than the upper quartile + 1.5 × inter-quartile range, or
● less than the lower quartile - 1.5 × inter-quartile range.
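The outlier rule as a short Python check. The quartiles are passed in directly; the values $Q_1 = 20$, $Q_3 = 30$ and the data are illustrative assumptions, and outliers is a hypothetical helper name.

```python
def outliers(data, q1: float, q3: float):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Illustrative quartiles Q1 = 20, Q3 = 30, so IQR = 10 and the fences are 5 and 45
print(outliers([4, 10, 25, 45, 46], q1=20, q3=30))  # [4, 46]
```

A value exactly on a fence (45 here) is not an outlier, because the rule requires it to be strictly greater or strictly less.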
3、Box plot
Using box plots to compare two sets of data
4、Histogram
A histogram gives a good picture of how data are distributed. It enables you to see a
rough location, the general shape of the data and how spread out the data are.
A histogram is similar to a bar chart, but there are two major differences:
● There are no gaps between the bars.
● The area of the bar is proportional to the frequency.
To calculate the height of each bar, use the formula
Area of bar = k × frequency, so the height of a bar is k × frequency ÷ class width.
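Since the area of a bar is k × frequency and its width is the class width, its height is k × frequency ÷ class width. A minimal sketch with made-up unequal classes (bar_heights is a hypothetical helper); with the default k = 1 the heights are the frequency densities.

```python
def bar_heights(boundaries, freqs, k: float = 1.0):
    """Height of each histogram bar when area = k * frequency.

    Height = k * frequency / class width; with k = 1 this is the frequency density.
    """
    heights = []
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        heights.append(k * f / (upper - lower))
    return heights

# Illustrative unequal classes: 0-10, 10-15, 15-30
print(bar_heights([0, 10, 15, 30], [20, 15, 30]))  # [2.0, 3.0, 2.0]
```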
k = 1 is the easiest value to use when drawing a histogram; then
$\text{Frequency density} = \frac{\text{Frequency}}{\text{Class width}}$
5、The shape (skewness) of a data set
There are two ways of describing whether a distribution is skewed:
● You can use the quartiles.
If $Q_2 - Q_1 = Q_3 - Q_2$, the distribution is symmetrical.
If $Q_2 - Q_1 < Q_3 - Q_2$, the distribution is positively skewed.
If $Q_2 - Q_1 > Q_3 - Q_2$, the distribution is negatively skewed.
● You can use the measures of location.
mode = median = mean describes a distribution which is symmetrical.
mode < median < mean describes a distribution with positive skew.
mode > median > mean describes a distribution with negative skew.
6、Comparing the distributions of data sets
● The IQR is often used together with the median when the data are skewed.
● The mean and standard deviation are generally used when the data are fairly symmetrical.
Chapter 5: Probability
1、Vocabulary used in probability
A sample space is the set of all possible outcomes of an experiment.
The probability of an event is the chance that the event will occur as a result of an experiment. Where outcomes are equally likely, the probability of an event is the number of outcomes in the event divided by the total number of possible outcomes in the sample space.
2、Venn diagrams
You can use Venn diagrams to solve probability problems for two or three events. A rectangle represents the sample space, and it contains closed curves that represent events.
3、Using formulae to solve problems
Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Conditional probability: the probability of B given A, written $P(B \mid A)$, is called the conditional probability of B given A, and
$P(B \mid A) = \frac{P(B \cap A)}{P(A)}$
Multiplication Rule: $P(B \cap A) = P(B \mid A) \times P(A)$
4、Tree diagrams
5、Mutually exclusive and independent events
When A and B are mutually exclusive, $A \cap B = \varnothing$, so $P(A \cap B) = 0$.
The Addition Rule applied to mutually exclusive events: $P(A \cup B) = P(A) + P(B)$
If A and B are independent, then $P(A \cap B) = P(A) \times P(B)$.
Chapter 6: Correlation
6.1 Scatter diagrams
If both variables increase together, they are said to be positively correlated. For a positive correlation the points on the scatter diagram rise as you go from left to right, and most points lie in the first and third quadrants (taking axes through $(\bar{x}, \bar{y})$).
If one variable increases as the other decreases, they are said to be negatively correlated. For a negative correlation the points on the scatter diagram fall as you go from left to right, and most points lie in the second and fourth quadrants.
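The quadrant description can be checked by drawing axes through the mean point $(\bar{x}, \bar{y})$ and counting the points in each quadrant. The sketch below does this for two small made-up data sets, one rising and one falling; quadrant_counts is a hypothetical helper, and points lying exactly on an axis are simply not counted (an assumption).

```python
def quadrant_counts(xs, ys):
    """Count points in each quadrant when axes are drawn through (x_bar, y_bar).

    Quadrants are numbered anticlockwise from top-right: 1 = (+, +), 2 = (-, +),
    3 = (-, -), 4 = (+, -). Points lying on either axis are not counted.
    """
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for x, y in zip(xs, ys):
        dx, dy = x - x_bar, y - y_bar
        if dx > 0 and dy > 0:
            counts[1] += 1
        elif dx < 0 and dy > 0:
            counts[2] += 1
        elif dx < 0 and dy < 0:
            counts[3] += 1
        elif dx > 0 and dy < 0:
            counts[4] += 1
    return counts

xs = [1, 2, 3, 4, 5, 6]
print(quadrant_counts(xs, [2, 1, 4, 3, 6, 5]))  # rising trend: mostly quadrants 1 and 3
print(quadrant_counts(xs, [6, 5, 4, 3, 1, 2]))  # falling trend: mostly quadrants 2 and 4
```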
If no straight-line (linear) pattern can be seen, there is said to be no correlation. For no correlation the points on the scatter diagram lie fairly evenly in all four quadrants.
Examples: 1/2/3
6.2 You can calculate measures for the variability of bivariate data
$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$
$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$
$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}$
Note: these formulae are given in the formula booklet.
Examples: 5*
Lesson Three
6.3 Product moment correlation coefficient r
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
Examples: 6*
Exercise 6B: Q4/5
6.4 Using r to determine the strength of the linear relationship between the variables
The value of r lies between -1 and 1.
If r = 1, there is a perfect positive linear correlation between the two variables (all points lie on a straight line with positive gradient).
If r = -1, there is a perfect negative linear correlation between the two variables (all points lie on a straight line with negative gradient).
If r is zero (or close to zero), there is no linear correlation; this does not, however, exclude some other sort of relationship.
Values of r between 0 and 1 indicate a greater or lesser degree of positive correlation: the closer to 1, the stronger the correlation; the closer to 0, the weaker the correlation.
Values of r between -1 and 0 indicate a greater or lesser degree of negative correlation: the closer to -1, the stronger the correlation; the closer to 0, the weaker the correlation.
Examples: 7
Lesson Four
6.5 The limitations of r
Examples: 8/9
6.6 Using coding to simplify the calculation of r
You can rewrite the variables x and y using the codings $p = \frac{x - a}{b}$ and $q = \frac{y - c}{d}$, where a, b, c and d are suitable numbers to be chosen.
r is not affected by coding (provided b and d are positive).
Examples: 10*
Exercise 6E: Q7/10
Exercises and homework: Review Exercise 2 Q1/4, Q5(a)(b) (Jan 2012)
Lecture 5
Chapter 7: Regression
Lesson One
7.1 The rule $y = a + bx$
$y = a + bx$ is the equation of a straight line.
If $y = a + bx$, then a (sometimes called the intercept) is where the line cuts the y-axis, and b is the amount by which y increases for an increase of 1 in x (b is called the gradient of the line).
Examples: 1/2
7.2 Independent and dependent variables
An independent (or explanatory) variable is one that is set independently of the other variable. It is plotted along the x-axis.
A dependent (or response) variable is one whose values are determined by the values of the independent variable. It is plotted along the y-axis.
Examples: 3
Lesson Two
7.3 The values of a and b that minimise the sum of the squares of the residuals
For each point on a scatter diagram you can express y in terms of x as $y = (a + bx) + e$, where e, the vertical distance from the line of best fit, is called the residual.
The line that minimises the sum of the squares of the residuals is called the least squares regression line. This line is called the regression line of y on x.
The equation of the regression line of y on x is $y = a + bx$, where $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$.
Note: these formulae are given in the formula booklet.
Examples: 4*
7.4 Coding is sometimes used to simplify calculations
Examples: 5*
Lesson Three
7.5 Applying and interpreting the regression equation
A regression line can be used to estimate the value of the dependent variable for any value of the independent variable.
Interpolation is estimating the value of the dependent variable within the range of the data. Extrapolation is estimating the value outside the range of the data.
Values estimated by extrapolation can be unreliable: in general you should not extrapolate, and you must treat any extrapolated values with caution.
Examples: 6/7*/8*
Exercises and homework: Page 147 Q6; Review Exercise 2 Q5/9/12/15/16; Q5(c)-(f) (Jan 2012)
Lesson Four
Chapter 8: Discrete random variables
8.1 A variable is represented by a symbol, and it can take any of a specified set of values. When the value of a variable is the outcome of an experiment, the variable is called a random variable.
Another name for the list of all possible outcomes of an experiment is the sample space.
For a random variable X:
● x is a particular value of X.
● $P(X = x)$ refers to the probability that X is equal to a particular value x.
A continuous random variable is one where the outcome can be any value on a continuous scale. A discrete random variable takes only values on a discrete scale.
Examples: 1/2
8.2
● To specify a discrete random variable completely, you need to know its set of possible values and the probability with which it takes each one.
● You can draw up a table to show the probability of each outcome of an experiment. This is called a probability distribution.
● You can also specify a discrete random variable as a function, which is known as a probability function.
Examples: 3
8.3 Sum of probabilities
● For a discrete random variable the probabilities must add up to one, that is $\sum P(X = x) = \sum p(x) = 1$.
Examples: 4
Lecture 6
Chapter 8: Discrete random variables 8.4-8.11
Lesson One
8.4
Examples: 6
8.5 Cumulative distribution function for a discrete random variable
● $F(x) = P(X \le x)$
● Like a probability distribution, a cumulative distribution function can be written as a table.
Examples: 7/8
Lesson Two
8.6 The mean or expected value of a discrete random variable
● The expected value of X is $E(X) = \sum x P(X = x) = \sum x p(x)$
Examples: 9/10*
8.7 Finding an expected value for $X^2$
● The expected value of $X^2$ is $E(X^2) = \sum x^2 P(X = x)$
● In general, $E(X^n) = \sum x^n P(X = x)$
Examples: 11*
Lesson Three
8.8 Finding the variance of a random variable
● The variance of X is usually written as Var(X) and is found by using $\mathrm{Var}(X) = E(X^2) - (E(X))^2$
Examples: 12*
Exercise 8D: Q1/2/7
8.9* The expected value and variance of a function of X
● $E(aX + b) = aE(X) + b$ and $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$, where a and b are constants.
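The definitions in 8.3 and 8.6-8.9 can be checked with exact fractions. The sketch assumes an illustrative distribution (the score X on a fair six-sided die) and illustrative constants a = 2, b = 3; expectation and variance are hypothetical helper names.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """E(g(X)) = sum of g(x) * P(X = x) over the distribution."""
    return sum(g(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - (E(X))^2."""
    return expectation(dist, lambda x: x * x) - expectation(dist) ** 2

# Illustrative distribution: the score X on a fair six-sided die
die = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(die.values()) == 1  # probabilities must sum to one

a, b = 2, 3  # illustrative constants for Y = aX + b
y_dist = {a * x + b: p for x, p in die.items()}

print(expectation(die), variance(die))                # 7/2 35/12
print(expectation(y_dist), a * expectation(die) + b)  # 10 10
print(variance(y_dist), a**2 * variance(die))         # 35/3 35/3
```

Working with Fraction keeps every value exact, so the two sides of each identity compare equal without any rounding tolerance.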
Examples: 13*/14*/15/16*
Exercise 8E: Q1/2/3
Lesson Four
8.10
Examples: 17*
Exercise 8E: Q8
8.11 For a discrete random variable X taking the values 1, 2, 3, …, n with equal probabilities:
$E(X) = \frac{n + 1}{2}$, $\mathrm{Var}(X) = \frac{(n + 1)(n - 1)}{12}$
Examples: 18
Exercises and homework: Page 173 Q4/7/10; Review Exercise 2 Q2/6/8/13/18; Q3 (Jan 2012)
Lecture 7
Chapter 9: The normal distribution
Lesson One
9.1 Use tables to find probabilities for the standard normal distribution
● The standard normal distribution is written as $Z \sim N(0, 1^2)$.
For Z:
● $P(Z > x) = 1 - P(Z < x)$
● $P(Z < z) = P(Z \le z)$
Examples: 1
Exercise 9A: Q1/4
9.2 Use tables to find the value of z given a probability
● The table of percentage points of the normal distribution gives the value of z for various values of $p = P(Z > z)$.
● If $P(Z < a)$ is greater than 0.5, then $a > 0$. If $P(Z < a)$ is less than 0.5, then $a < 0$.
● If $P(Z > a)$ is less than 0.5, then $a > 0$.
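Python's standard library includes statistics.NormalDist, whose cdf and inv_cdf methods play the roles of the normal distribution table and the percentage-points table. The sketch below uses it only as a cross-check on table look-ups; the values 1.96, 0.975 and 0.1 are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal, Z ~ N(0, 1^2)

# 9.1: probabilities, as read from the normal distribution table
p_less = Z.cdf(1.96)            # P(Z < 1.96)
p_greater = 1 - Z.cdf(1.96)     # P(Z > 1.96) = 1 - P(Z < 1.96)
print(round(p_less, 4), round(p_greater, 4))  # 0.975 0.025

# 9.2: from a probability back to z, as with the percentage-points table
a = Z.inv_cdf(0.975)            # a such that P(Z < a) = 0.975
print(round(a, 2))              # 1.96, positive because 0.975 > 0.5
print(Z.inv_cdf(0.1) < 0)       # True: P(Z < a) = 0.1 < 0.5, so a < 0
```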