S2.4a Sampling and hypothesis tests

A2-Level Maths: Statistics 2 for OCR
Boardworks Ltd 2006

Contents

Introduction to sampling
Sampling from a normal distribution
Calculating from samples
Unbiased estimates
Hypothesis testing on binomial data
Chocolate tasting practical
One-sided hypothesis tests on binomial data
One-sided versus two-sided tests
Critical regions

National census

The British government carries out a census of the entire population of the United Kingdom every 10 years (most recently in April 2001). The first census in the United Kingdom was carried out in 1086 with the construction of the Domesday Book. However, censuses have only been conducted on a regular basis since 1801.

The census provides the government with a detailed picture of the population living in each part of the country (town, city or countryside). The results are used to help plan public services (health, housing, transport and education) for the future.

Introduction to sampling

In statistics we often want to obtain information from a group of individuals or about a group of objects.

The population is the set of all individuals or objects that we wish to study.

A census is an investigation in which information is obtained from every member of the population.

A sampling frame is a list of all members of the population.

Examples:

1. A head teacher is interested in finding out how long her sixth form students spend in part-time employment each week. The population is the set of all sixth form students in her school. A possible sampling frame would be the registers of sixth form tutor groups.

2. A newspaper is interested in obtaining the views of residents living close to the site of a proposed new airport. The population might be all adults living within a 10 mile radius of the site. A possible sampling frame could be the local electoral roll.

3. A car company has discovered a fault that affects one of their models of car. The company may wish to know how widespread the problem might be. The population would be all cars produced of this particular model. A possible sampling frame would be a list of all registered cars of this model provided by the DVLA.

Carrying out a census of the entire population is usually not feasible or sensible. A census is usually costly in terms of money, time and resources.

In addition, some investigations could result in the destruction of the entire population! For example, if a light bulb manufacturer wished to investigate the lifetime of its bulbs, a census would result in the destruction of all the bulbs it produced.

Instead of surveying the whole population, information can be obtained from a sample. The sampling process should be undertaken carefully to ensure that the sample is representative of the entire population. Bias can occur if one section of the population is over- or under-represented.

Question: A local council wishes to know the views of local people on public transport. Criticize each of the following sampling regimes:

1. Ask the people waiting at the town centre bus stop.
2. Leave questionnaires in local libraries for people to fill in.
3. Ask people at the shopping centre on a Thursday morning.

Sampling methods

One way to obtain a fair sample is to use random sampling. This method gives every member of the population an equal chance of being chosen for the sample. A more formal definition of a random sample is as follows:

A sample of size n is called a random sample if every possible selection of size n has the same probability of being chosen.

There are a number of ways in which a random sample can be chosen. One commonly used technique is to use random number tables.

Random number tables

The table below gives a list of random digits:

793 259 976 452 401 234 393 053 225 197 549 628 444 212 885 355 169 905
834 193 439 102 356 206 753 335 713 416 584 438 085 966 235 418 626 411
469 807 561 925 290 692 923 229 288 631 523 040 940 642 775 838 281 475

Here is how to use random digits to obtain a sample.

Example: A sample of size 15 is required from a population of size 300.

One possible approach would be to obtain a sampling frame for the population and number every member from 001 to 300. You could then obtain chains of 3 random digits from the table. If a chain corresponds to a number between 001 and 300 you could select that member of the population; otherwise you could discard that chain and choose another.

Example (continued): This method is wasteful of random digits since most chains of 3 digits will be discarded. A more efficient strategy would be to assign each member of the population to several chains of random digits:

Population member    Random digits
1                    001  301  601
2                    002  302  602
3                    003  303  603
...                  ...
300                  300  600  900

With this approach, the only chains discarded are those from 901 to 999, together with 000.

Example (continued): Suppose that we use the 2nd line of random digits in the above table. The sample chosen would then be:

Random digits    Population member
834              234
193              193
439              139
102              102
356              56
206              206
753              153
335              35
713              113
416              116
584              284
438              138
085              85
966              (cannot be used)
235              235
418              118
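The random number table procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_members` is hypothetical, and skipping a member that has already been chosen is an assumption (the slides do not say how repeats are handled, and none occur in this line of digits).

```python
def select_members(chains, population_size=300, sample_size=15):
    """Map 3-digit random chains to population members (hypothetical helper)."""
    # Largest multiple of the population size that fits in 001..999 (900 here),
    # so chains 901..999 and 000 are discarded.
    usable_max = (999 // population_size) * population_size
    sample = []
    for chain in chains:
        value = int(chain)
        if value == 0 or value > usable_max:
            continue  # chain cannot be used
        # 001, 301 and 601 all map to member 1; 300, 600 and 900 map to member 300.
        member = (value - 1) % population_size + 1
        if member not in sample:  # assumption: repeated members are skipped
            sample.append(member)
        if len(sample) == sample_size:
            break
    return sample


# Second line of the random digit table.
second_line = ("834 193 439 102 356 206 753 335 713 "
               "416 584 438 085 966 235 418 626 411").split()
print(select_members(second_line))
```

The chain 966 is discarded, and the loop stops as soon as 15 members have been selected, so the last two chains on the line are never used.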

Sampling from a normal distribution

Suppose that a sample of size $n$ is taken from a $N(\mu, \sigma^2)$ distribution and that the sample mean is $\bar{X}$. If the sampling process were to be repeated, a different sample would be extracted and a slightly different value for the sample mean would be obtained. The value of the sample mean is therefore subject to sampling variability.

The sample mean therefore has a distribution, known as its sampling distribution. It is possible to show that, when a sample of size $n$ is drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean is:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Example: If a sample of size 40 is taken from a $N(15, 24)$ distribution, then the sampling distribution of the sample mean is:

$$\bar{X} \sim N\left(15, \frac{24}{40}\right) = N\left(15, \frac{3}{5}\right)$$

Notice that the variance of $\bar{X}$ is $\frac{\sigma^2}{n}$. This shows that the sampling variability can be decreased by taking larger samples (i.e., increasing the value of $n$).

The standard deviation of $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$. This is usually referred to as the standard error of the sample mean.
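As a quick numerical check of the example with a sample of size 40 from a normal distribution with mean 15 and variance 24, the following sketch computes the variance and standard error of the sample mean. The function name `sampling_distribution` is hypothetical and introduced only for illustration.

```python
import math


def sampling_distribution(mu, sigma_squared, n):
    """Return the mean, variance and standard error of the sample mean."""
    variance_of_mean = sigma_squared / n           # sigma^2 / n
    standard_error = math.sqrt(variance_of_mean)   # sigma / sqrt(n)
    return mu, variance_of_mean, standard_error


mean, variance, standard_error = sampling_distribution(15, 24, 40)
print(mean, variance, round(standard_error, 4))
```

Calling the function again with a larger `n` shows the standard error shrinking, which is the point made about taking larger samples.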

Calculating from samples

Sample standard deviation

Recall the formula we met earlier for finding the variance of a set of data:

$$\text{variance} = \frac{\sum x^2}{n} - \bar{x}^2$$

The variance is sometimes called the mean squared deviation (MSD). The standard deviation is the square root of this, and is sometimes called the root mean squared deviation (RMSD):

$$\text{RMSD} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

These formulae are normally only used when we wish to calculate the variance or standard deviation using data from the entire population.

When a large population is being studied, data will only be collected for a sample. The sample data is then used to make inferences about the population. Sample data may be used to estimate the mean and variance of the whole population.

It can be shown that the sample mean, $\bar{x} = \frac{\sum x}{n}$, gives the most accurate estimate possible of the population mean. But the most accurate estimate of the population variance is provided by the following formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)$$

This is referred to as the sample variance, with the square root being the sample standard deviation, $s$.

Example: A crisp manufacturer carries out regular monitoring of its packing machines by taking samples of 20 packets of crisps. The masses ($x$ g) obtained in one such sample were as follows:

$$\sum x = 588, \qquad \sum x^2 = 17\,316$$

Find the mean and the standard deviation of the masses in this sample of crisp packets.

Note: The question clearly mentions that the data is from a sample. We will therefore use the formula for the sample standard deviation.

The sample mean is given by:

$$\bar{x} = \frac{\sum x}{n} = \frac{588}{20} = 29.4 \text{ g}$$

The sample standard deviation ($s$) is found as follows:

$$s^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{1}{20 - 1}\left(17\,316 - \frac{588^2}{20}\right) = 1.5158$$

So, $s = \sqrt{1.5158} = 1.23$ g.
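The crisp packet calculation works from summary totals rather than raw data, which can be mirrored in a short sketch. The function name `sample_stats_from_totals` is hypothetical; it simply applies the sample variance formula with the divisor n - 1.

```python
import math


def sample_stats_from_totals(n, sum_x, sum_x_squared):
    """Sample mean, sample variance and sample standard deviation from totals."""
    mean = sum_x / n
    # Sample variance: divide by n - 1, not n.
    variance = (sum_x_squared - sum_x ** 2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)


# Totals from the crisp packet example: n = 20, sum of x = 588, sum of x^2 = 17316.
mean, s_squared, s = sample_stats_from_totals(20, 588, 17316)
print(round(mean, 1), round(s_squared, 4), round(s, 2))
```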

Unbiased estimates

Introduction to estimation

A statistic is a quantity that is calculated from a sample of data. Examples include:

the sample mean, $\bar{x} = \frac{\sum x_i}{n}$;
the sample variance, $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)$;
the quartiles;
the highest value.

We are particularly interested in finding estimates of the population mean and standard deviation.

Note that the sample variance uses $n - 1$ instead of just $n$.

Unbiased estimates

It can be shown that the sample mean provides an unbiased estimate of the population mean, i.e. if the sampling process were carried out over and over again, the sample mean would on average produce the population mean. Likewise, the sample variance, $s^2$, is an unbiased estimate of the population variance, and its square root, $s$, is used as the estimate of the population standard deviation.

Statistic                                   Parameter
Sample mean $\bar{x}$      estimator for    Population mean $\mu$
Sample variance $s^2$      estimator for    Population variance $\sigma^2$

Note that the formula $\frac{1}{n}\sum x^2 - \bar{x}^2$ gives a biased estimate of the population variance.

Example: An examiner takes a random sample of 12 of the students sitting a particular A-level examination. Their percentage marks were:

55%, 64%, 76%, 48%, 73%, 51%, 67%, 31%, 55%, 85%, 60%, 62%.

Calculate unbiased estimates of the mean and the standard deviation of the marks for all students sitting the exam.

$$\text{Sample mean} = \frac{55 + 64 + \dots + 62}{12} = 60.5833\ldots = 60.6\% \text{ (to 3 s.f.)}$$

$$\sum x_i^2 = 55^2 + 64^2 + \dots + 62^2 = 46\,275$$

$$\text{Sample variance} = \frac{1}{n - 1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{1}{11}\left(46\,275 - \frac{727^2}{12}\right) = 202.81$$

Taking the square root of the sample variance gives the estimate of the population standard deviation:

So the sample standard deviation, $s = \sqrt{202.81} = 14.2\%$ (to 3 s.f.).
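The examination marks example can be checked with Python's standard `statistics` module, where `variance` and `stdev` use the divisor n - 1 and `pvariance` uses the divisor n. This is a sketch for comparison only; the variable names are illustrative.

```python
import statistics

# Percentage marks of the 12 sampled students.
marks = [55, 64, 76, 48, 73, 51, 67, 31, 55, 85, 60, 62]

sample_mean = statistics.mean(marks)
unbiased_variance = statistics.variance(marks)  # divisor n - 1
biased_variance = statistics.pvariance(marks)   # divisor n
s = statistics.stdev(marks)                     # square root of the n - 1 variance

print(round(sample_mean, 1), round(unbiased_variance, 2), round(s, 1))
print(round(biased_variance, 2))
```

Comparing the two variance figures shows how the divisor n gives a smaller value than the divisor n - 1 for the same data.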

Hypothesis testing on binomial data

A simple introductory example

Consider the following simple situation. You suspect that a die is biased towards the number six. In order to test this suspicion, you could perform an experiment in which the die is thrown 20 times.

If the die were fair, you would expect about 3 sixes. If you obtained a lot more than 3 sixes then you might decide that there is evidence to support your suspicions. But how do you decide on what a suspicious number of sixes is?

Consider throwing a fair die 20 times. The probability of obtaining different numbers of sixes is shown in the graph:

[Graph: probability of obtaining each number of sixes in 20 throws of a fair die]

With 20 throws of a fair die, the probability of getting 7 or more sixes is about 0.0371. This means that if the experiment of throwing a fair die 20 times were repeated over and over again, you would obtain 7 or more sixes less than once in every 20 experiments.

The figure of 1 in 20 (or 5%) is often taken as a cut-off point: results with probabilities below this level are sometimes regarded as being unlikely to have occurred by chance. However, in situations where more evidence is required, cut-off values of 1% or 0.1% are typically used.

A formal introduction to hypothesis tests

In hypothesis testing we are essentially presented with two rival hypotheses. Examples might include:

"The coin is fair" or "the coin is biased";
"The proportion of local people in favour of a by-pass is 80%" or "the proportion is smaller than 80%";
"The drug has the same effectiveness as an existing treatment" or "the drug is more effective".

These rival hypotheses are referred to as the null and the alternative hypotheses.

The null hypothesis (H0) is often thought of as the cautious hypothesis: it represents the usual state of affairs.

The alternative hypothesis (H1) is usually the one that we suspect or hope to be true.

Hypothesis testing is concerned with examining the data collected in experiments, and deciding how likely the result is to have occurred if the null hypothesis is true.

The significance level of the test is the chosen cut-off value between the results that might plausibly have been obtained by chance if H0 is true, and the results that are unlikely to have occurred.

Significance levels that are typically used are 10%, 5%, 1% and 0.1%. These significance levels correspond to different rigours of test: the lower the significance level, the stronger the evidence the test will provide.

Note: It is important to appreciate that it is not possible to prove that a hypothesis is definitely true in statistics. Hypothesis tests can only provide different degrees of evidence in support of a hypothesis.

A 10% significance level can only provide weak evidence in support of a hypothesis. A 0.1% test is much more stringent and can provide very strong evidence.
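The die example (20 throws, probability 1/6 of a six on each throw) can be reproduced directly from the binomial distribution. The sketch below is illustrative only: the function names are hypothetical, and `smallest_suspicious_count` is one simple way of turning a cut-off level into a "suspicious" number of sixes.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


def smallest_suspicious_count(n, p, significance):
    """Smallest k for which P(X >= k) is at or below the cut-off level."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= significance:
            return k
    return None


p_seven_or_more = upper_tail(7, 20, 1 / 6)
print(round(p_seven_or_more, 4))
print(smallest_suspicious_count(20, 1 / 6, 0.05))
```

At the 5% cut-off this gives 7 sixes as the smallest count that would be regarded as unlikely to have occurred by chance with a fair die.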

Chocolate tasting practical

Do you think you can taste the difference between branded chocolate and supermarket own-label chocolate? You are going to perform an experiment to find out.

There will be 2 pieces of chocolate to try: one will be a branded make of chocolate, the other will be a supermarket's own brand. Try to identify the branded make.

One-sided hypothesis tests on binomial data

Example: Mr Jones, a candidate in a local election, claims to have the support of 40% of the electorate. A rival candidate, Miss Smith, believes that Mr Jones is exaggerating his level of support. She asks a random sample of 12 local people and discovers that 3 of them support Mr Jones.

Carry out a test at the 5% significance level to see whether there is evidence that Mr Jones is exaggerating his level of support.

Solution: We begin by writing down the 2 rival hypotheses. Let $p$ represent the proportion of the electorate who support Mr Jones.

H0: $p = 0.4$
H1: $p < 0.4$

[...]

Examination-style question

H0: $p = 0.7$ (the new treatment is no more successful than the existing treatment).
H1: $p > 0.7$ (the new treatment is better than the standard treatment).

Significance level = 1%.

Let $X$ be the number of people successfully treated by the new drug. If the null hypothesis is true, then $X \sim B(20, 0.7)$.

The observed data is $x = 19$. Using tables, $P(X \geq 19) = 0.0076 < 1\%$.

We reject the null hypothesis at the 1% level: there is quite strong evidence that the new treatment is more successful.
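Both one-sided tests in this section come down to a binomial tail probability compared with the significance level. The sketch below shows one way to compute it without tables. The function `one_sided_p_value` and its `alternative` argument are hypothetical names, and the direction used for the Mr Jones example (a lower-tail test, because the claim is that he is exaggerating) is an assumption read from the wording of the question.

```python
from math import comb


def one_sided_p_value(observed, n, p0, alternative):
    """Tail probability under H0: p = p0 for X ~ B(n, p0).

    alternative="greater" gives P(X >= observed);
    alternative="less" gives P(X <= observed).
    """
    if alternative == "greater":
        counts = range(observed, n + 1)
    elif alternative == "less":
        counts = range(0, observed + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in counts)


# New treatment: 19 successes out of 20, H0: p = 0.7, tested at the 1% level.
p_treatment = one_sided_p_value(19, 20, 0.7, "greater")
print(round(p_treatment, 4), p_treatment < 0.01)

# Mr Jones: 3 supporters out of 12, H0: p = 0.4, tested at the 5% level.
p_jones = one_sided_p_value(3, 12, 0.4, "less")
print(round(p_jones, 4), p_jones < 0.05)
```

Under these assumptions the treatment result falls below the 1% level, while 3 supporters out of 12 gives a tail probability well above 5%, so that sample alone would not be strong evidence that the 40% claim is exaggerated.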
