版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、BMTRY 701Biostatistical Methods IIElizabeth Garrett-Mayer, PhDAssociate ProfessorDirector of Biostatistics, Hollings Cancer CenterBiostatistical Methods IIDescription: This is a one-semester course intended for graduate students pursuing degrees in biostatistics and related fields such as epidemiolo
2、gy and bioinformatics. Topics covered will include linear, logistic, poisson, and Cox regression.Advanced topics will be included, such as ridge regression or hierarchical linear regression if time permits. Estimation, interpretation, and diagnostic approaches will be discussed. Software instruction
3、 will be provided in class in R. Students will be evaluated via homeworks (55%), two exams (35%) and class participation (10%). This is a four credit course.Biostatistical Methods IITextbooks: (1) Introduction to Linear Regression Analysis (4th Edition). Montgomery, Peck and Vining. Wiley; New York,
4、 2006.(2) Regression with Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Frank E. Harrell, Jr. Springer; New York, 2001.Prerequisites: Biometry 700Course Objectives: Upon successful completion of the course, the student will be able toApply, inte
5、rpret and diagnose linear regression modelsApply, interpret and diagnose logistic, poisson and Cox regresssion modelsBiostatistical Methods IIInstructor:Elizabeth Garrett-MayerWebsite:/elg26/teaching/methods2.2010/methods2.2010.htmContact Info:Hollings Cancer Center, Rm 118Ggarrettm (preferred mode
6、of contact is email)792-7764Time:Mondays and Wednesdays, 1:30-3:30Location:Cannon 301, Room 305VOffice Hours:Tuesdays 2 :00 3:30pmBiostatistical Methods IILecture schedule is on the websiteSecond time teaching this classNew textbooks this yearsyllabus is a work in progresstiming of topics subject to
7、 changelectures may appear on website last-minuteComputingRintegrated into lecture timeHomeworks, articles, datasets will also be posted to websitesome/most problems will be from textbooksome datasets will be from R library If you want printed versions of lectures:download and print prior to lecture
8、; ORwork interactively on your laptop during classWe will take a break about halfway through each lecture ExpectationsAcademicParticipate in class discussionsInvest resources in YOUR educationComplete homework assignments on timeThe results of the homework should be communicated so that a person kno
9、wledgeable in the methodology could reproduce your results.Create your own study groupsChallenge one anothereveryone needs to contributeyou may do homeworks together, but everyone must turn in his/her own homework. written sections of homework should be independently developedGeneralBe on time to cl
10、assBe discrete with interruptions (pages, phones, etc.)Do NOT turn in raw computer outputOther ExpectationsKnowledge of Methods I!You should be very familiar withconfidence intervalshypothesis testingt-testsZ-testsgraphical displays of dataexploratory data analysisestimating means, medians, quantile
11、s of dataestimating variances, standard deviationsAbout the instructorB.A. from Bowdoin College, 1994Double Major in Mathematics and Economics Minor in ClassicsPh.D. in Biostatistics from Johns Hopkins, 2000Dissertation research in latent class models, Adviser Scott ZegerAssistant Professor in Oncol
12、ogy and Biostatistics at JHU, 2000-2007Taught course in Statistics for Psychosocial Research for 8 yearsApplied Research Areas:oncologyBiostats Research Areas: latent variable modelingclass discovery in microarray datamethodology for early phase oncology clinical trialsCame to MUSC in Feb 2007Comput
13、ingWho knows what?Who WANTS to know what?Who will bring a laptop to class?What software do you have and/or prefer?RegressionPurposes of RegresssionDescribe association between Y and XsMake predictions:Interpolation: making prediction within a range of XsExtrapolation: making prediction outside a ran
14、ge of XsTo “adjust or “control for confounding variablesWhat is “Y?an outcome variabledependent variableresponseType of regression depends on type of Ycontinuous (linear regression)binary (logistic regression)time-to-event (Cox regression)rare event or rate (poisson regression)Some motivating exampl
15、esExample 1: Suppose we are interested in studying the relationship between fasting blood glucose (FBG) levels and the number hours per day of aerobic exercise. Let Y denote the fasting blood glucose levelLet X denote the number of hours of exerciseOne may be interested in studying the relationship
16、of Y and XSimple linear regression can be used to quantify this relationshipSome motivating examplesExample 2: Consider expanding example 1 to include other factors that could be related FBG.Let X1 denote hours of exerciseLet X2 denote BMILet X3 indicate if the person has diabetes. . . (other covari
17、ates possible)One may be interested in studying the relationship of all Xs on Y and identifying the “best combination of factorsNote: Some of the Xs may correlated (e.g., exercise and bmi)Multiple (or multivariable, not multivariate) linear regression can be used to quantify this relationshipSome mo
18、tivating examplesExample 3: Myocardial infarction (MI, heart attack) is often a life-altering eventLet Y denote the occurrence (Y = 1) of an MI after treatment, let Y = 0 denote no MILet X1 denote the dosage of aspirin takenLet X2 denote the age of the person. . . (other covariates possible)One may
19、be interested in studying the relationship of all Xs on Y and identifying the “best combination of factorsMultiple LOGISTIC regression can be used to quantify this relationshipMore motivating examplesExample 4: This is an extension of Ex 3: Myocardial infarction. Let the interest be now on when the
20、firstMI occurs instead of “if one occurs.Let Y denote the occurrence (Y = 1) of an MI after treatment, let Y = 0 denote no MI observedLet Time denote the length of time the individual is observedLet X1 denote the dosage of aspirin taken . . . (other covariates possible)Survival Analysis (which, in s
21、ome cases, is a regression model) can be used to quantify this relationship of aspirin on MIMore motivating examplesExample 5: Number of cancer cases in a cityLet Y denote the count (non-negative integer value) of cases of a cancer in a particular region of interestLet X1 denote the region size in t
22、erms of “at risk individualsLet X2 denote the region . . . (other covariates possible)One may be interested in studying the relationship of the region on Y while adjusting for the population at risk sizesPOISSON regression can be used to quantify this relationshipBrief OutlineLinear regression: half
23、 semester (through spring break)Logistic regressionCox regression (survival)Poisson regression Hierarchical regression or ridge regression?Linear RegressionOutcome is a CONTINUOUS variableAssumes association between Y and X is a straight lineAssumes relationship is statistical and not functionalrela
24、tionship is not perfectthere is error or noise or unexplained variationAside:I LOVE graphical displays of dataThis is why regression is especially funthere are lots of neat ways to show your dataprepare yourself for a LOT of scatterplots this semesterGraphical DisplaysScatterplots: show associations
25、 between two variables (usually)Also need to understand each variable by itselfUnivariate data displays are importantBefore performing a regresssion, we shouldidentify any potential skewnessoutliersdiscretenessmultimodalityTop choices for univariate displaysboxplothistogramdensity plotdot plotLinear
26、 regression exampleThe authors conducted a pilot study to assess the use of toenail arsenic concentrations as an indicator of ingestion of arsenic-containing water. Twenty-one participants were interviewed regarding use of their private (unregulated) wells for drinking and cooking, and each provided
27、 a sample of water and toenail clippings. Trace concentrations of arsenic were detected in 15 of the 21 well-water samples and in all toenail clipping samples. Karagas MR, Morris JS, Weiss JE, Spate V, Baskett C, Greenberg ER. Toenail Samples as an Indicator of Drinking Water Arsenic Exposure. Cance
28、r Epidemiology, Biomarkers and Prevention 1996;5:849-852.Purposes of Regression1. Describe associationhypothesis: as arsenic in well water increases, level of arsenic in nails also increases. linear regression can tell ushow much increase in nail level we see on average for a 1 unit increase in well
29、 water level of arsenic2. Predictlinear regresssion can tell us what level of arsenic we would expect in nails for a given level in well water.how precise our estimate of arsenic is for a given level of well water3. Adjustlinear regression can tell uswhat the association between well water arsenic a
30、nd nail arsenic is adjusting for other factors such as age, gender, amount of use of water for cooking, amount of use of water for drinking.BoxplotGraphical DisplaysNailsWaterHistogramBins the datax-axis represents variable valuesy-axis is eitherfrequency of occurrence percentage of occurenceVisual
31、impression can depend on bin widthoften difficult to see details of highly skewed dataHistogramHistogramDensity PlotSmoothed density based on kernel density estimatesCan create similar issues as histogramsmoothing parameter selectioncan affect inferencesCan be problematic for ceiling or floor effect
32、sDensity PlotDot plotMy favorite for small datasetswhen displaying data by groupsAndthe scatterplotMeasuring the association between X and YY is on the verticalX predicts YTerminology: “Regress Y on XY: dependent variable, response, outcomeX: independent variable, covariate, regressor, predictor, co
33、nfounderLinear regression a straight lineimportant!this is key to linear regressionSimple vs. Multiple linear regresssionWhy simple?only one “xwell talk about multiple linear regression laterMultiple regression more than one “Xmore to think about: selection of covariatesNot linear?need to think abou
34、t transformationssometimes linear will do reasonably wellAssociation versus CausationBe careful!Association CausationStatistical relationship does not mean X causes YCould beX causes YY causes Xsomething else causes both X and YX and Y are spuriously associated in your sample of dataExample: vision
35、and number of gray hairsBasic Regression ModelYi is the value of the response variable in the ith individual0 and 1 are parametersXi is a known constant; the value of the covariate in the ith individuali is the random error termLinear in the parametersLinear in the predictorBasic Regression ModelNOT
36、 linear in the parameters:NOT linear in the predictor:Model FeaturesYi is the sum of a constant piece and a random piece:0 + 1Xi is constant piece (recall: x is treated as constant)i is the random pieceAttributes of error termmean of residuals is 0: E(i) = 0constant variance of residuals : 2(i ) = 2
37、 for all iresiduals are uncorrelated: cov(i, j) = 0 for all i, j; i jConsequencesExpected value of responseE(Yi) = 0 + 1XiE(Y) = 0 + 1XVariance of Yi |Xi= 2 Yi and Yj are uncorrelatedProbability Distribution of YFor each level of X, there is a probability distribution of YThe means of the probabilit
38、y distributions vary systematically with X.Parameters0 and 1 are referred to as “regression coefficientsRemember y = mx+b?1 is the slope of the regression linethe expected increase in Y for a 1 unit increase in Xthe expected difference in Y comparing two individuals with Xs that differ by 1 unitExpe
39、cted? Why?Parameters0 is the intercept of the regression lineThe expected value of Y when X = 0Meaningful?when the range of X includes 0, yeswhen the range of X excluded 0, noExample:Y = babys weight in kgX = babys height in cm0 is the expected weight of a baby whose height is 0 cm.SENIC DataWill be
40、 used as a recurring exampleSENIC = Study on the Efficacy of Nosocomial Infection ControlThe primary objective of the SENIC Project was to determine whether infection surveillance and control programs have reduced the rates of nosocomial (hospital-acquired) infection in the United States hospitals.
41、This data set consists of a random sample of 113 hospitals selected from the original 338 hospitals surveyed. Each line of the data set has an ID number and provides information on 11 other variables for a single hospital. The data used here are for the 1975-76 study period.SENIC DataSENIC Simple Li
42、near Regression ExampleHypothesis: The number of beds in a given hospital is associated with the average length of stay.Y = ?X = ?Scatterplot. scatter los bedsStata Regression Results. regress los beds Source | SS df MS Number of obs = 113+ F( 1, 111) = 22.33 Model | 68.5419355 1 68.5419355 Prob F =
43、 0.0000 Residual | 340.668443 111 3.06908508 R-squared = 0.1675+ Adj R-squared = 0.1600 Total | 409.210379 112 3.6536641 Root MSE = 1.7519 los | Coef. Std. Err. t P|t| 95% Conf. Interval+ beds | .0040566 .0008584 4.73 0.000 .0023556 .0057576 _cons | 8.625364 .2720589 31.70 0.000 8.086262 9.164467Ano
44、ther Example: Famous dataFather and sons heights data from Karl Pearson (over 100 years ago in England)1078 pairs of fathers and sons Excerpted 200 pairs for demonstrationHypotheses:there will be a positive association between heights of fathers and their sonsvery tall fathers will tend to have sons that are shorter than they arevery short fathers will tend to have sons that are taller than they areScatterplot of 200 records of father son dataplot(father, son, xlab=Fathers
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2024标准借款合同书
- 部编版九年级上册语文《艾青诗选》知识点和练习
- 2024版回迁房贷款合同
- 2024有孩子的离婚协议范文
- 父母赠与儿媳房产合同(2024版)
- 集装箱运输的合同(2024版)
- 大学社团面试的自我介绍范文(13篇)
- 端午节国旗下讲话小学
- 大学期末总结与反思大学期末的总结
- 2023年融资担保行业市场环境分析
- 2024-2030年中国自动病床行业市场发展趋势与前景展望战略分析报告
- 浙江省小升初真题汇编:填空题(三)专项训练-2023-2024学年六年级下册数学小升初真题高频易错典型题 人教版
- 2024年四川省凉山州中考数学真题(原卷版)
- 2023-2024学年人教版数学七年级下册期末综合模拟测试
- 医院新技术开展总结及整改措施
- 浙江杭州市钱塘区市场监督管理局编外人员招考聘用24人公开引进高层次人才和急需紧缺人才笔试参考题库(共500题)答案详解版
- 老旧排水管网改造 投标方案(技术方案)
- 学校改造、维修、修缮项目可行性研究报告
- 2024年3月17日贵州省直遴选笔试真题及解析
- 广东省江门市重点中学2023-2024学年小升初分班考数学押题卷(人教版)
- 四川省成都市2023-2024学年八年级上学期期末练习卷英语试题
评论
0/150
提交评论