008年四川建材行业跟踪分析报告(4季度_第1页
008年四川建材行业跟踪分析报告(4季度_第2页
008年四川建材行业跟踪分析报告(4季度_第3页
008年四川建材行业跟踪分析报告(4季度_第4页
008年四川建材行业跟踪分析报告(4季度_第5页
已阅读5页,还剩47页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、BMTRY 701Biostatistical Methods IIElizabeth Garrett-Mayer, PhDAssociate ProfessorDirector of Biostatistics, Hollings Cancer CenterBiostatistical Methods IIDescription: This is a one-semester course intended for graduate students pursuing degrees in biostatistics and related fields such as epidemiolo

2、gy and bioinformatics. Topics covered will include linear, logistic, poisson, and Cox regression.Advanced topics will be included, such as ridge regression or hierarchical linear regression if time permits. Estimation, interpretation, and diagnostic approaches will be discussed. Software instruction

3、 will be provided in class in R. Students will be evaluated via homeworks (55%), two exams (35%) and class participation (10%). This is a four credit course.Biostatistical Methods IITextbooks: (1) Introduction to Linear Regression Analysis (4th Edition). Montgomery, Peck and Vining. Wiley; New York,

4、 2006.(2) Regression with Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Frank E. Harrell, Jr. Springer; New York, 2001.Prerequisites: Biometry 700Course Objectives: Upon successful completion of the course, the student will be able toApply, inte

5、rpret and diagnose linear regression modelsApply, interpret and diagnose logistic, poisson and Cox regresssion modelsBiostatistical Methods IIInstructor:Elizabeth Garrett-MayerWebsite:/elg26/teaching/methods2.2010/methods2.2010.htmContact Info:Hollings Cancer Center, Rm 118Ggarrettm (preferred mode

6、of contact is email)792-7764Time:Mondays and Wednesdays, 1:30-3:30Location:Cannon 301, Room 305VOffice Hours:Tuesdays 2 :00 3:30pmBiostatistical Methods IILecture schedule is on the websiteSecond time teaching this classNew textbooks this yearsyllabus is a work in progresstiming of topics subject to

7、 changelectures may appear on website last-minuteComputingRintegrated into lecture timeHomeworks, articles, datasets will also be posted to websitesome/most problems will be from textbooksome datasets will be from R library If you want printed versions of lectures:download and print prior to lecture

8、; ORwork interactively on your laptop during classWe will take a break about halfway through each lecture ExpectationsAcademicParticipate in class discussionsInvest resources in YOUR educationComplete homework assignments on timeThe results of the homework should be communicated so that a person kno

9、wledgeable in the methodology could reproduce your results.Create your own study groupsChallenge one anothereveryone needs to contributeyou may do homeworks together, but everyone must turn in his/her own homework. written sections of homework should be independently developedGeneralBe on time to cl

10、assBe discrete with interruptions (pages, phones, etc.)Do NOT turn in raw computer outputOther ExpectationsKnowledge of Methods I!You should be very familiar withconfidence intervalshypothesis testingt-testsZ-testsgraphical displays of dataexploratory data analysisestimating means, medians, quantile

11、s of dataestimating variances, standard deviationsAbout the instructorB.A. from Bowdoin College, 1994Double Major in Mathematics and Economics Minor in ClassicsPh.D. in Biostatistics from Johns Hopkins, 2000Dissertation research in latent class models, Adviser Scott ZegerAssistant Professor in Oncol

12、ogy and Biostatistics at JHU, 2000-2007Taught course in Statistics for Psychosocial Research for 8 yearsApplied Research Areas:oncologyBiostats Research Areas: latent variable modelingclass discovery in microarray datamethodology for early phase oncology clinical trialsCame to MUSC in Feb 2007Comput

13、ingWho knows what?Who WANTS to know what?Who will bring a laptop to class?What software do you have and/or prefer?RegressionPurposes of RegresssionDescribe association between Y and XsMake predictions:Interpolation: making prediction within a range of XsExtrapolation: making prediction outside a ran

14、ge of XsTo “adjust or “control for confounding variablesWhat is “Y?an outcome variabledependent variableresponseType of regression depends on type of Ycontinuous (linear regression)binary (logistic regression)time-to-event (Cox regression)rare event or rate (poisson regression)Some motivating exampl

15、esExample 1: Suppose we are interested in studying the relationship between fasting blood glucose (FBG) levels and the number hours per day of aerobic exercise. Let Y denote the fasting blood glucose levelLet X denote the number of hours of exerciseOne may be interested in studying the relationship

16、of Y and XSimple linear regression can be used to quantify this relationshipSome motivating examplesExample 2: Consider expanding example 1 to include other factors that could be related FBG.Let X1 denote hours of exerciseLet X2 denote BMILet X3 indicate if the person has diabetes. . . (other covari

17、ates possible)One may be interested in studying the relationship of all Xs on Y and identifying the “best combination of factorsNote: Some of the Xs may correlated (e.g., exercise and bmi)Multiple (or multivariable, not multivariate) linear regression can be used to quantify this relationshipSome mo

18、tivating examplesExample 3: Myocardial infarction (MI, heart attack) is often a life-altering eventLet Y denote the occurrence (Y = 1) of an MI after treatment, let Y = 0 denote no MILet X1 denote the dosage of aspirin takenLet X2 denote the age of the person. . . (other covariates possible)One may

19、be interested in studying the relationship of all Xs on Y and identifying the “best combination of factorsMultiple LOGISTIC regression can be used to quantify this relationshipMore motivating examplesExample 4: This is an extension of Ex 3: Myocardial infarction. Let the interest be now on when the

20、firstMI occurs instead of “if one occurs.Let Y denote the occurrence (Y = 1) of an MI after treatment, let Y = 0 denote no MI observedLet Time denote the length of time the individual is observedLet X1 denote the dosage of aspirin taken . . . (other covariates possible)Survival Analysis (which, in s

21、ome cases, is a regression model) can be used to quantify this relationship of aspirin on MIMore motivating examplesExample 5: Number of cancer cases in a cityLet Y denote the count (non-negative integer value) of cases of a cancer in a particular region of interestLet X1 denote the region size in t

22、erms of “at risk individualsLet X2 denote the region . . . (other covariates possible)One may be interested in studying the relationship of the region on Y while adjusting for the population at risk sizesPOISSON regression can be used to quantify this relationshipBrief OutlineLinear regression: half

23、 semester (through spring break)Logistic regressionCox regression (survival)Poisson regression Hierarchical regression or ridge regression?Linear RegressionOutcome is a CONTINUOUS variableAssumes association between Y and X is a straight lineAssumes relationship is statistical and not functionalrela

24、tionship is not perfectthere is error or noise or unexplained variationAside:I LOVE graphical displays of dataThis is why regression is especially funthere are lots of neat ways to show your dataprepare yourself for a LOT of scatterplots this semesterGraphical DisplaysScatterplots: show associations

25、 between two variables (usually)Also need to understand each variable by itselfUnivariate data displays are importantBefore performing a regresssion, we shouldidentify any potential skewnessoutliersdiscretenessmultimodalityTop choices for univariate displaysboxplothistogramdensity plotdot plotLinear

26、 regression exampleThe authors conducted a pilot study to assess the use of toenail arsenic concentrations as an indicator of ingestion of arsenic-containing water. Twenty-one participants were interviewed regarding use of their private (unregulated) wells for drinking and cooking, and each provided

27、 a sample of water and toenail clippings. Trace concentrations of arsenic were detected in 15 of the 21 well-water samples and in all toenail clipping samples. Karagas MR, Morris JS, Weiss JE, Spate V, Baskett C, Greenberg ER. Toenail Samples as an Indicator of Drinking Water Arsenic Exposure. Cance

28、r Epidemiology, Biomarkers and Prevention 1996;5:849-852.Purposes of Regression1. Describe associationhypothesis: as arsenic in well water increases, level of arsenic in nails also increases. linear regression can tell ushow much increase in nail level we see on average for a 1 unit increase in well

29、 water level of arsenic2. Predictlinear regresssion can tell us what level of arsenic we would expect in nails for a given level in well water.how precise our estimate of arsenic is for a given level of well water3. Adjustlinear regression can tell uswhat the association between well water arsenic a

30、nd nail arsenic is adjusting for other factors such as age, gender, amount of use of water for cooking, amount of use of water for drinking.BoxplotGraphical DisplaysNailsWaterHistogramBins the datax-axis represents variable valuesy-axis is eitherfrequency of occurrence percentage of occurenceVisual

31、impression can depend on bin widthoften difficult to see details of highly skewed dataHistogramHistogramDensity PlotSmoothed density based on kernel density estimatesCan create similar issues as histogramsmoothing parameter selectioncan affect inferencesCan be problematic for ceiling or floor effect

32、sDensity PlotDot plotMy favorite for small datasetswhen displaying data by groupsAndthe scatterplotMeasuring the association between X and YY is on the verticalX predicts YTerminology: “Regress Y on XY: dependent variable, response, outcomeX: independent variable, covariate, regressor, predictor, co

33、nfounderLinear regression a straight lineimportant!this is key to linear regressionSimple vs. Multiple linear regresssionWhy simple?only one “xwell talk about multiple linear regression laterMultiple regression more than one “Xmore to think about: selection of covariatesNot linear?need to think abou

34、t transformationssometimes linear will do reasonably wellAssociation versus CausationBe careful!Association CausationStatistical relationship does not mean X causes YCould beX causes YY causes Xsomething else causes both X and YX and Y are spuriously associated in your sample of dataExample: vision

35、and number of gray hairsBasic Regression ModelYi is the value of the response variable in the ith individual0 and 1 are parametersXi is a known constant; the value of the covariate in the ith individuali is the random error termLinear in the parametersLinear in the predictorBasic Regression ModelNOT

36、 linear in the parameters:NOT linear in the predictor:Model FeaturesYi is the sum of a constant piece and a random piece:0 + 1Xi is constant piece (recall: x is treated as constant)i is the random pieceAttributes of error termmean of residuals is 0: E(i) = 0constant variance of residuals : 2(i ) = 2

37、 for all iresiduals are uncorrelated: cov(i, j) = 0 for all i, j; i jConsequencesExpected value of responseE(Yi) = 0 + 1XiE(Y) = 0 + 1XVariance of Yi |Xi= 2 Yi and Yj are uncorrelatedProbability Distribution of YFor each level of X, there is a probability distribution of YThe means of the probabilit

38、y distributions vary systematically with X.Parameters0 and 1 are referred to as “regression coefficientsRemember y = mx+b?1 is the slope of the regression linethe expected increase in Y for a 1 unit increase in Xthe expected difference in Y comparing two individuals with Xs that differ by 1 unitExpe

39、cted? Why?Parameters0 is the intercept of the regression lineThe expected value of Y when X = 0Meaningful?when the range of X includes 0, yeswhen the range of X excluded 0, noExample:Y = babys weight in kgX = babys height in cm0 is the expected weight of a baby whose height is 0 cm.SENIC DataWill be

40、 used as a recurring exampleSENIC = Study on the Efficacy of Nosocomial Infection ControlThe primary objective of the SENIC Project was to determine whether infection surveillance and control programs have reduced the rates of nosocomial (hospital-acquired) infection in the United States hospitals.

41、This data set consists of a random sample of 113 hospitals selected from the original 338 hospitals surveyed. Each line of the data set has an ID number and provides information on 11 other variables for a single hospital. The data used here are for the 1975-76 study period.SENIC DataSENIC Simple Li

42、near Regression ExampleHypothesis: The number of beds in a given hospital is associated with the average length of stay.Y = ?X = ?Scatterplot. scatter los bedsStata Regression Results. regress los beds Source | SS df MS Number of obs = 113+ F( 1, 111) = 22.33 Model | 68.5419355 1 68.5419355 Prob F =

43、 0.0000 Residual | 340.668443 111 3.06908508 R-squared = 0.1675+ Adj R-squared = 0.1600 Total | 409.210379 112 3.6536641 Root MSE = 1.7519 los | Coef. Std. Err. t P|t| 95% Conf. Interval+ beds | .0040566 .0008584 4.73 0.000 .0023556 .0057576 _cons | 8.625364 .2720589 31.70 0.000 8.086262 9.164467Ano

44、ther Example: Famous dataFather and sons heights data from Karl Pearson (over 100 years ago in England)1078 pairs of fathers and sons Excerpted 200 pairs for demonstrationHypotheses:there will be a positive association between heights of fathers and their sonsvery tall fathers will tend to have sons that are shorter than they arevery short fathers will tend to have sons that are taller than they areScatterplot of 200 records of father son dataplot(father, son, xlab=Fathers

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论