Foreign-Language Translation: SAS Statistical Analysis Software and Logistic Regression

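1. Overview

The full name of the SAS system is Statistical Analysis System. It was originally written by two biostatistics graduate students at the University of North Carolina, and in 1976 the SAS Institute was founded and SAS software was formally released. SAS is a large integrated information system for decision support. Its earliest functions, however, were limited to statistical analysis, and statistical analysis remains an important part of its core functionality today. The current version of SAS is 9.0, which occupies about 1 GB.

After years of development, SAS has been adopted by nearly 30,000 institutions in more than 120 countries and regions. It has over three million direct users in finance, medicine and health care, manufacturing, transportation, communications, government, and education and research. In Britain, the United States and other countries, proficiency in statistical analysis with SAS is one of the hiring criteria at many companies and research institutions. In data processing and statistical analysis, SAS is regarded as the international standard software system, and in 1996-97 it was chosen as the preferred product for building databases. It is a giant of the statistical software world. One example shows its authority: in the U.S. FDA's famously strict new-drug approval process, the statistical analysis of drug trial results was required to be carried out in SAS. Results computed with any other software, even a simple mean and standard deviation, were not accepted.

The SAS system is a suite made up of several functional modules, and its foundation is the Base SAS module. Base SAS is the core of the system. It carries out the main data-management tasks, manages the user environment, processes the user's language, and calls the other SAS modules and products. In other words, running the SAS system first requires starting Base SAS. Besides its own data-management, programming and descriptive-statistics functions, Base SAS acts as the central dispatcher of the SAS system. It can run on its own or be combined with other products or modules to form a complete system, and every module can be installed and updated easily through its installation program.

The SAS system has flexible extension interfaces and powerful modules. On top of Base SAS, the following modules can be added for additional functionality:
- SAS/STAT (statistical analysis)
- SAS/GRAPH (graphics)
- SAS/QC (quality control)
- SAS/ETS (econometrics and time-series analysis)
- SAS/OR (operations research)
- SAS/IML (interactive matrix programming language)
- SAS/FSP (interactive menu system for fast data processing)
- SAS/AF (interactive full-screen application system)
- and others

SAS has an intelligent graphics system that can draw not only all kinds of statistical charts but also maps. SAS provides many statistical procedures, each with a rich set of options. Users can carry out more complex analyses by passing a data set through a series of processing steps. SAS also provides probability functions, quantile functions, sample-statistic functions and random-number generators, so users can easily meet special statistical requirements.

2. Mode of operation

SAS evolved from mainframe systems, and its core mode of operation is program-driven. Over the years it has grown into a complete computer language, and its user interface reflects this. It uses an MDI (multiple document interface): the user types a program into the PGM window, and the analysis results are written as text to the OUTPUT window. Through programs, users can do everything they need, including statistical analysis, forecasting, modeling and simulation sampling. However, this means beginners must learn the SAS language, which makes SAS harder to get started with.

The Windows version of SAS includes several graphical interfaces developed for different user groups. Each has its own features and is very convenient to use. However, little has been published about them in China and they are not the focus of SAS's promotion, so most people are unaware of them.

3. Basic operations and concepts of the SAS system

3.1 Data sets and libraries

All statistical operations are performed on data. In SAS, a file that holds data is called a data set, and data sets are stored in libraries (for now, think of a library as a database). SAS libraries are either permanent or temporary. As the name suggests,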
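data sets in a permanent library persist as long as you do not delete them, while data sets in a temporary library are deleted automatically when you exit SAS. The simplest way to understand a SAS library is as a directory in which data sets are stored. A data set has exactly the same structure as an ordinary data table, made up of fields and records. In statistics, fields are usually called variables, and in what follows the two terms mean the same thing.

There are many ways to create a data set. In a program, dedicated data-input statements can build a data set, but the data must then be typed in on the spot, which is time-consuming and laborious. If the amount of data is large, it is better to create the data set by other means first. Otherwise, most of the program's statements will be spent on data entry.

3.2 Overview of SAS programs

Like other computer languages, the SAS language (called SCL, SAS Component Language) has its own vocabulary (keywords) and syntax. Keywords, names, special characters and operators are arranged according to the syntax rules to form SAS statements. A group of SAS statements that together perform a complete function forms a SAS program. A SAS program consists of steps and control statements and generally contains DATA steps and PROC steps. Any combination of one or more DATA steps or PROC steps can form a SAS program, as long as it performs a complete function. SAS programs usually also contain global statements, which control options, variables or the runtime environment throughout the program.

A SAS statement generally begins with a keyword and ends with a semicolon, and one statement may span several lines. Whenever SAS encounters a semicolon, it treats everything since the previous semicolon as a single statement, regardless of how many lines it occupies. SAS statements are case-insensitive, so you can use uppercase or lowercase letters as you prefer.

4. Logistic regression

Logistic regression belongs to a class of statistical models called generalized linear models. This broad class includes ordinary regression and analysis of variance (ANOVA). It also includes multivariate statistics such as analysis of covariance (ANCOVA) and log-linear regression. An excellent treatment of generalized linear models is given by Agresti (1996).

Logistic regression predicts a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or any mix of these. Generally the dependent variable is dichotomous, such as presence/absence or success/failure. Discriminant analysis is also used to predict membership when there are only two groups, but it can be used only with continuous independent variables. Therefore, when the independent variables are categorical, or a mix of continuous and categorical, logistic regression is preferred.

4.1 The model

In logistic regression the dependent variable is usually dichotomous: it takes the value 1 (with probability q) when the event occurs and 0 when it does not. Such a variable is called a Bernoulli (or binary) variable. Logistic regression has also been extended to dependent variables with more than two categories, although this is less common and not discussed here. Such dependent variables are called multinomial or polytomous; Tabachnick and Fidell (1996) use the term polychotomous.

As mentioned earlier, the independent (predictor) variables in logistic regression can take any form. That is, logistic regression makes no assumption about their distribution. They need not be normally distributed, linearly related, or of equal variance within each group. The relationship between the predictors and the dependent variable is not linear. Instead, logistic regression uses the logit transformation of q:

$$\operatorname{logit}(q) = \ln\left(\frac{q}{1-q}\right) = a + bX$$

where a is the intercept and b is the coefficient of the predictor variables. An alternative form of the logistic regression equation is:

$$q = \frac{e^{a + bX}}{1 + e^{a + bX}}$$

The goal of logistic regression is to correctly predict the outcome category for individual cases using the most parsimonious model. To achieve this, a model is built that contains the dependent variable and the predictor variables that are useful for predicting it. Several options are available during model building. Variables can be entered in an order specified by the researcher. Alternatively, logistic regression can test the fit of the model after each coefficient is added or removed, which is called stepwise regression.

Stepwise regression is used in the exploratory phase of research, but it is not recommended for theory testing (Menard 1995). Theory testing tests a-priori hypotheses about the relationships between variables. Exploratory testing makes no a-priori assumptions about those relationships, so its goal is to discover them.

Backward stepwise regression appears to be the preferred method for exploratory analysis. The analysis starts with a full (saturated) model, and variables are removed from the model iteratively. The fit of the model is tested after each removal to make sure the model still fits the data adequately. When no more variables can be removed, the analysis is complete.

Logistic regression has two main uses. The first is predicting group membership. Logistic regression computes the probability of success relative to the probability of failure, so its results are expressed as odds ratios. For example, logistic regression is often used in epidemiological studies to predict the probability of developing cancer after controlling for other risk factors.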
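The step from probabilities to odds and odds ratios described above can be checked numerically. The following is a minimal Python sketch. The coefficients `a` and `b` are made-up values, not estimates from the source, and the helper names `logistic`, `logit` and `odds` are illustrative. The sketch shows that under the model logit(q) = a + bX, raising X by one unit multiplies the odds by exp(b). This is why logistic-regression coefficients are usually reported as odds ratios.

```python
import math

def logistic(eta: float) -> float:
    """Inverse of the logit: maps a linear predictor a + b*x to a probability q."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # this branch avoids overflow for large negative eta
    return z / (1.0 + z)

def logit(q: float) -> float:
    """Log-odds ln(q / (1 - q)) of a probability 0 < q < 1."""
    return math.log(q / (1.0 - q))

def odds(q: float) -> float:
    """Probability of success divided by probability of failure."""
    return q / (1.0 - q)

# Hypothetical coefficients, chosen only for illustration.
a, b = -2.0, 0.7

q_at_1 = logistic(a + b * 1.0)  # predicted probability when x = 1
q_at_2 = logistic(a + b * 2.0)  # predicted probability when x = 2

# Odds ratio for a one-unit increase in x; algebraically equal to exp(b).
odds_ratio = odds(q_at_2) / odds(q_at_1)

print(f"q(x=1) = {q_at_1:.4f}, q(x=2) = {q_at_2:.4f}")
print(f"odds ratio = {odds_ratio:.4f}, exp(b) = {math.exp(b):.4f}")
print(f"logit(q(x=1)) = {logit(q_at_1):.4f}, a + b = {a + b:.4f}")
```

The two branches in `logistic` compute the same value. The second one keeps `math.exp` from overflowing when the linear predictor is a large negative number.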

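Logistic regression also provides knowledge of the relationships between variables and their strengths. For example, smoking 10 packs a day puts you at a higher risk of developing cancer than working in an asbestos mine. The coefficients are tested with several different techniques, all of which are discussed below.

4.2 Wald test

The Wald test is used to test the statistical significance of each coefficient (b) in the model, that is, whether the coefficient is 0. The Wald test computes a z statistic with the following formula:

$$z = \frac{b}{SE_b}$$

Squaring the z value yields the Wald statistic, which follows a chi-square distribution. However, some authors have identified shortcomings of the Wald test. Menard (1995) warns that large coefficients have inflated standard errors, which lowers the Wald statistic. Agresti points out that the likelihood-ratio test is more reliable than the Wald test for small samples.

4.3 Likelihood-ratio test

The likelihood-ratio test uses the ratio of two maximized likelihoods: the value for the full model (L1) and the value for the simpler model (L0). The likelihood-ratio test statistic equals:

$$-2\ln\left(\frac{L_0}{L_1}\right) = -2\left[\ln L_0 - \ln L_1\right]$$

This log transformation of the likelihood functions yields a chi-square statistic. It is the recommended test statistic when building a model by backward stepwise elimination.

4.4 Hosmer-Lemeshow goodness-of-fit test

The Hosmer-Lemeshow statistic assesses goodness of fit by forming 10 ordered groups of subjects. It then compares the number actually in each group (observed) with the number predicted by the logistic regression model (predicted). The test statistic is therefore a chi-square statistic. The desired outcome is non-significance, which indicates that the model's predictions do not differ significantly from the observations.

The 10 groups are formed from the estimated probabilities: cases with estimated probability below 0.1 form one group, and so on, up to cases with probability from 0.9 to 1.0. Each group is further divided into two according to the actual observed outcome (success or failure). The expected frequency for each cell is obtained from the model. If the model is good, most subjects with a success fall into the higher-risk groups and most subjects with a failure fall into the lower-risk groups.

Foreign-Language Technical Literature

SAS Statistical Analysis Software and Logistic Regression

1. Overview

SAS stands for Statistical Analysis System. It was first written by two biostatistics graduate students at the University of North Carolina. In 1976 the SAS Institute was established and SAS software was formally launched. SAS is a large-scale integrated information system for decision support. Its earliest functions, however, were limited to statistical analysis, and statistical analysis is still an important part of its core functionality. The current SAS version is 9.0, about 1 GB in size.

After years of development, SAS has been adopted by nearly 30,000 institutions in more than 120 countries and regions. It has over three million direct users across finance, medicine and health, production, transport, communications, government, and education and scientific research. In Britain, the United States and other countries, skill in using SAS for statistical analysis is one of the selection criteria at many companies and research institutions. In data processing and statistical analysis, the SAS system is known as the international standard software system, and in 1996-97 it was selected as the first-choice product for building databases. SAS has been called the giant of the statistical software sector. One example: in the famously strict U.S. FDA drug approval process, the statistical analysis of drug trial results must be carried out in SAS. Results from any other software are void, even a simple mean and standard deviation. This shows the authority of SAS.

SAS is a combined software system made up of several functional modules, and its basic part is the Base SAS module. Base SAS is the core of the SAS system. It performs the main data-management tasks, manages the user environment, processes the user's language, and calls the other SAS modules and products. In other words, to run the SAS system we start the Base SAS module. Besides its own data-management, programming and descriptive-statistics functions, Base SAS is the central dispatcher of the SAS system. It can stand alone or combine with other products or modules to form a complete system. Each module can be installed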
and updated very easily through the installation program.

The SAS system has flexible extension interfaces and powerful functional modules. On the basis of Base SAS, you can add the following modules for additional features:
- SAS/STAT (statistical analysis module)
- SAS/GRAPH (graphics module)
- SAS/QC (quality control module)
- SAS/ETS (econometrics and time-series analysis module)
- SAS/OR (operations research module)
- SAS/IML (interactive matrix programming language module)
- SAS/FSP (interactive menu system module for fast data processing)
- SAS/AF (interactive full-screen application system module)
- and so on

SAS has an intelligent graphics system that can draw not only a variety of charts but also maps. SAS provides a wide range of statistical procedures, each with a great many options. Users can chain a series of data-processing steps to carry out more complex statistical analyses. In addition, SAS offers probability functions, quantile functions, sample-statistic functions and random-number generator functions, so users can easily meet special statistical requirements.

2. Operation

SAS was developed from mainframe systems, and its core mode of operation is program-driven. After many years of development, SAS has become a complete computer language, and its user interface fully reflects this. It uses an MDI (multiple document interface): the user enters a program in the PGM window, and the results of the analysis are written as text to the OUTPUT window. Using programs, users can complete all their work, including statistical analysis, forecasting, modeling and simulation sampling. However, this means beginners have to learn the SAS language, which makes getting started more difficult.

The Windows version of SAS provides several graphical user interfaces developed for different user groups. Each has its own features and is very convenient to use. However, little has been published about them, and they are not the focus of SAS's promotion, so most people do not know about them.

3. Basic operations and concepts of SAS

3.1 Data sets and libraries

Statistical operations are performed on data. In SAS, a file that holds data is called a data set, and data sets are contained in different libraries (for the time being, think of a library as a database). SAS libraries are of two types, permanent and temporary. As the name suggests, data sets in a permanent library persist as long as you do not delete them, while data sets in a temporary library are deleted automatically when you exit SAS. As for
the concept of a library in SAS, the simplest way to understand it is as a directory in which data sets are stored. A data set has exactly the same structure as an ordinary data table, composed of fields and records. In statistics we usually call fields variables, and in what follows we treat fields and variables as the same thing.

There are many ways to create a data set. In programming, dedicated data-input methods can build a data set, but the data must be entered on the spot, which is time-consuming and laborious. If the amount of data is large, I advise you to build the data set by other means first. Otherwise, most of the program's statements will be wasted on data input.

3.2 SAS programs

Like other computer languages, the SAS language (known as SCL, SAS Component Language) has its own vocabulary (keywords) and grammar. Keywords, names, special characters and operators are arranged according to the grammar rules to form SAS statements. A number of SAS statements that together perform a complete function constitute a SAS program. A SAS program includes a number of steps and some control statements, and in general it includes DATA steps and PROC steps. One or more DATA steps or PROC steps, in any combination,
can form a SAS program, as long as they perform a complete function. SAS programs usually also include some global statements, which control options, variables or the program environment throughout the program. A SAS statement begins with a keyword and ends with a semicolon, and a statement can span multiple lines. Whenever SAS sees a semicolon, it processes everything since the previous semicolon as one statement, regardless of how many lines it occupies. SAS statements are case-insensitive; you may use uppercase or lowercase letters according to personal habit.

4. Logistic regression

Logistic regression is part of a category of statistical models called generalized linear models. This broad class of models includes ordinary regression and ANOVA, as well as multivariate statistics such as ANCOVA and loglinear regression. An excellent treatment of generalized linear models is presented in Agresti (1996).

Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. Generally, the dependent or response variable is dichotomous, such as presence/absence or success/failure. Discriminant analysis is also used to predict group membership with only two groups. However, discriminant analysis can only be used with continuous independent variables. Thus, in instances where the independent variables are categorical, or a mix of continuous and categorical, logistic regression is preferred.

4.1 The model

The dependent variable in logistic regression is usually dichotomous. It can take the value 1 with a probability of success q, or the value 0 with probability of failure 1 - q. This type of variable is called a Bernoulli (or binary) variable. Applications of logistic regression have also been extended to cases where the dependent variable has more than two categories, although these are less common and not discussed in this treatment. Such variables are known as multinomial or polytomous; Tabachnick and Fidell (1996) use the term polychotomous.

As mentioned previously, the independent or predictor variables in logistic regression can take any form. That is, logistic regression makes no assumption about the distribution of the independent variables. They do not have to be normally distributed, linearly related or of equal variance within each group. The relationship between the predictor and response variables is not a linear function in logistic regression. Instead, the logistic regression function is used, which is the logit transformation of q:

$$\operatorname{logit}(q) = \ln\left(\frac{q}{1-q}\right) = a + bX$$

where a is the constant of the equation and b is the coefficient of the predictor variables. An alternative form of the logistic regression equation is:

$$q = \frac{e^{a + bX}}{1 + e^{a + bX}}$$

The goal of logistic regression is to correctly predict the category of outcome for individual cases using the most parsimonious model. To accomplish this goal, a model is created that includes all predictor variables that are useful in predicting the response variable. Several different options are available during model creation. Variables can be entered into the model in the order specified by the researcher. Alternatively, logistic regression can test the fit of the model after each coefficient is added or deleted, which is called stepwise regression.
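The backward variant of stepwise selection can be sketched in Python as follows: start from the full model, then repeatedly drop the least useful variable while the fit stays adequate. This is an illustration under stated assumptions, not the procedure of any particular package:
- The `loglik` callback and the `toy_loglik` table are hypothetical stand-ins for fitting a logistic model to each subset of predictors.
- Each variable is assumed to use exactly one degree of freedom.
- The removal criterion is the likelihood-ratio test, -2(ln L_reduced - ln L_current), at alpha = 0.05.

```python
import math
from typing import Callable, FrozenSet, Iterable

def chi2_sf_1df(g: float) -> float:
    """P(X > g) for X ~ chi-square with 1 df, using X = Z**2 for a standard normal Z."""
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))

def backward_eliminate(
    variables: Iterable[str],
    loglik: Callable[[FrozenSet[str]], float],
    alpha: float = 0.05,
) -> FrozenSet[str]:
    """Backward stepwise elimination driven by likelihood-ratio tests.

    Starts from the full model. On each pass every remaining variable is
    tentatively dropped; the one whose removal gives the largest LR p-value
    is eliminated if that p-value exceeds alpha. Stops when every remaining
    variable is significant. Assumes each variable costs exactly 1 df.
    """
    current = frozenset(variables)
    while current:
        ll_current = loglik(current)
        candidates = []
        for v in sorted(current):
            g = -2.0 * (loglik(current - {v}) - ll_current)  # LR statistic
            candidates.append((chi2_sf_1df(g), v))
        p_max, v_drop = max(candidates)
        if p_max <= alpha:
            break  # no variable can be removed without a significant loss of fit
        current = current - {v_drop}
    return current

# Hypothetical maximized log-likelihoods for every subset of three candidate
# predictors; in practice each value would come from fitting a logistic model.
toy_loglik = {
    frozenset(): -120.0,
    frozenset({"age"}): -100.0,
    frozenset({"smoke"}): -105.0,
    frozenset({"noise"}): -119.9,
    frozenset({"age", "smoke"}): -90.0,
    frozenset({"age", "noise"}): -99.8,
    frozenset({"smoke", "noise"}): -104.9,
    frozenset({"age", "smoke", "noise"}): -89.7,
}

kept = backward_eliminate(["age", "smoke", "noise"], toy_loglik.__getitem__)
print(sorted(kept))  # → ['age', 'smoke']
```

In SAS, PROC LOGISTIC offers backward selection through the SELECTION=BACKWARD option of its MODEL statement. Its default removal criterion may differ from the likelihood-ratio rule used in this sketch.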

Stepwise regression is used in the exploratory phase of research, but it is not recommended for theory testing (Menard 1995). Theory testing is the testing of a-priori theories or hypotheses about the relationships between variables. Exploratory testing makes no a-priori assumptions about the relationships between the variables; its goal is to discover them.

Backward stepwise regression appears to be the preferred method of exploratory analysis. The analysis begins with a full or saturated model, and variables are eliminated from the model in an iterative process. The fit of the model is tested after the elimination of each variable to ensure that the model still adequately fits the data. When no more variables can be eliminated from the model, the analysis is complete.

There are two main uses of logistic regression. The first is the prediction of group membership. Logistic regression calculates the probability of success over the probability of failure, so the results of the analysis are in the form of an odds ratio. For example, logistic regression is often used in epidemiological studies where the result of the analysis is the probability of developing cancer after controlling for other associated risks. Logistic regression also provides knowledge of the relationships and strengths among the variables. For example, smoking 10 packs a day puts you at a higher risk for developing cancer than working in an asbestos mine.

The process by which coefficients are tested for significance for inclusion in or elimination from the model involves several different techniques. Each of these will be discussed below.

4.2 Wald test

A Wald test is used to test the statistical significance of each coefficient (b) in the model. A Wald test calculates a z statistic, which is:

$$z = \frac{b}{SE_b}$$

This z value is then squared, yielding a Wald statistic with a chi-square distribution. However, several authors have identified …
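The Python sketch below makes the Wald test and the likelihood-ratio test concrete. It fits a one-predictor model logit(q) = a + bX by maximum likelihood (Newton-Raphson) on a small hypothetical data set. It then computes:
- z = b / SE_b
- the Wald chi-square z²
- the likelihood-ratio statistic -2(ln L0 - ln L1) against the intercept-only model

The function names and data are illustrative assumptions. The code has no safeguards against complete separation, where the maximum-likelihood estimate does not exist.

```python
import math
from typing import List, Tuple

def score_info_loglik(a: float, b: float, x: List[float], y: List[int]):
    """Score vector, information-matrix entries and log-likelihood at (a, b)."""
    g_a = g_b = i_aa = i_ab = i_bb = ll = 0.0
    for xi, yi in zip(x, y):
        eta = a + b * xi
        q = 1.0 / (1.0 + math.exp(-eta))
        w = q * (1.0 - q)
        g_a += yi - q
        g_b += (yi - q) * xi
        i_aa += w
        i_ab += w * xi
        i_bb += w * xi * xi
        ll += yi * eta - math.log1p(math.exp(eta))  # ln q if y = 1, ln(1 - q) if y = 0
    return (g_a, g_b), (i_aa, i_ab, i_bb), ll

def fit_logistic_1d(x: List[float], y: List[int], iters: int = 25) -> Tuple[float, float, float, float]:
    """Maximum-likelihood fit of logit(q) = a + b*x by Newton-Raphson.

    Returns (a, b, se_b, loglik). No checks for separation or non-convergence.
    """
    a = b = 0.0
    for _ in range(iters):
        (g_a, g_b), (i_aa, i_ab, i_bb), _ = score_info_loglik(a, b, x, y)
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: add I^{-1} g, written out for the 2x2 case.
        a += (i_bb * g_a - i_ab * g_b) / det
        b += (i_aa * g_b - i_ab * g_a) / det
    _, (i_aa, i_ab, i_bb), ll = score_info_loglik(a, b, x, y)
    det = i_aa * i_bb - i_ab * i_ab
    se_b = math.sqrt(i_aa / det)  # square root of the (b, b) entry of I^{-1}
    return a, b, se_b, ll

# Hypothetical toy data (not from the source).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b, se_b, ll_full = fit_logistic_1d(x, y)

# Wald test: z = b / SE_b, and z**2 is compared with chi-square on 1 df.
z = b / se_b
wald = z * z
p_wald = math.erfc(abs(z) / math.sqrt(2.0))

# Likelihood-ratio test against the intercept-only model (L0), whose
# maximum-likelihood fit predicts the sample success rate for every case.
p_bar = sum(y) / len(y)
ll_null = sum(math.log(p_bar) if yi else math.log(1.0 - p_bar) for yi in y)
g = -2.0 * (ll_null - ll_full)
p_lr = math.erfc(math.sqrt(g / 2.0))  # chi-square upper tail, 1 df

print(f"a = {a:.4f}, b = {b:.4f}, SE(b) = {se_b:.4f}")
print(f"Wald z = {z:.4f}, Wald chi-square = {wald:.4f}, p = {p_wald:.4f}")
print(f"LR statistic = {g:.4f}, p = {p_lr:.4f}")
```

Both p-values rely on the fact that a chi-square variable with 1 df is the square of a standard normal variable, so `math.erfc` gives the upper tail directly.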
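The Hosmer-Lemeshow procedure described in section 4.4 can likewise be sketched in Python. Following that description, the hypothetical helper `hosmer_lemeshow` works as follows:
- It groups cases into fixed-width bins of predicted probability (below 0.1, 0.1 to 0.2, ..., 0.9 to 1.0).
- It compares observed and expected counts of successes and failures in each non-empty bin.
- It sums (O - E)²/E over the cells.

The more common form of the test instead uses groups of roughly equal size (deciles of risk), so results can differ. The degrees of freedom (number of groups minus 2) follow the usual convention. The predicted probabilities and outcomes are invented for illustration.

```python
import math
from typing import Sequence, Tuple

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), via the series
    for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    s, t = df / 2.0, x / 2.0
    log_term = -t + s * math.log(t) - math.lgamma(s + 1.0)
    if log_term < -700.0:
        # First series term is negligible: x lies far in one tail.
        return 0.0 if t > s else 1.0
    term = math.exp(log_term)
    total, n = term, 1
    while term > 1e-15 * total:
        term *= t / (s + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)

def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], n_groups: int = 10) -> Tuple[float, int]:
    """Hosmer-Lemeshow statistic with fixed-width probability groups.

    Group k holds cases with k/n_groups <= p < (k+1)/n_groups (p = 1.0 goes
    in the last group). Observed successes and failures in each group are
    compared with the expected counts sum(p) and sum(1 - p).
    Returns (statistic, degrees of freedom = non-empty groups - 2).
    """
    count = [0] * n_groups
    obs1 = [0.0] * n_groups
    exp1 = [0.0] * n_groups
    for yi, pi in zip(y, p):
        k = min(int(pi * n_groups), n_groups - 1)
        count[k] += 1
        obs1[k] += yi
        exp1[k] += pi
    stat, used = 0.0, 0
    for k in range(n_groups):
        if count[k] == 0:
            continue  # empty groups contribute no cells
        used += 1
        cells = ((obs1[k], exp1[k]),                        # successes
                 (count[k] - obs1[k], count[k] - exp1[k]))  # failures
        for observed, expected in cells:
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, used - 2

# Hypothetical predicted probabilities and observed outcomes.
p_hat = [0.05, 0.12, 0.18, 0.33, 0.41, 0.47, 0.58, 0.66, 0.74, 0.81, 0.88, 0.95]
y_obs = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1]

hl, df = hosmer_lemeshow(y_obs, p_hat)
print(f"Hosmer-Lemeshow chi-square = {hl:.4f} on {df} df")
if df > 0:
    print(f"p = {chi2_sf(hl, df):.4f} (a large p-value is the desired outcome)")
```

With only a dozen invented cases, many cells have small expected counts, and the chi-square approximation is poor. The example only shows the mechanics.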
