Generalized additive models with integrated smoothness estimation

Description

Fits a generalized additive model (GAM) to data, the term "GAM" being taken to include any quadratically penalized GLM. The degree of smoothness of model terms is estimated as part of fitting. gam can also fit any GLM subject to multiple quadratic penalties (including estimation of degree of penalization). Isotropic or scale invariant smooths of any number of variables are available as model terms, as are linear functionals of such smooths; confidence/credible intervals are readily available for any quantity predicted using a fitted model; gam is extendable: users can add smooths.

Smooth terms are represented using penalized regression splines (or similar smoothers) with smoothing parameters selected by GCV/UBRE/AIC/REML or by regression splines with fixed degrees of freedom (mixtures of the two are permitted). Multi-dimensional smooths are available using penalized thin plate regression splines (isotropic) or tensor product splines (when an isotropic smooth is inappropriate). For an overview of the smooths available see smooth.terms. For more on specifying models see gam.models, random.effects and linear.functional.terms. For more on model selection see gam.selection. Do read gam.check and choose.k.
See gam from package gam, for GAMs via the original Hastie and Tibshirani approach (see Details for differences to this implementation).

For very large datasets see bam; for mixed GAMs see gamm and random.effects.

Usage

    gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
        na.action,offset=NULL,method="GCV.Cp",
        optimizer=c("outer","newton"),control=list(),scale=0,
        select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
        fit=TRUE,paraPen=NULL,G=NULL,in.out,...)

Arguments

formula: A GAM formula (see formula.gam and also gam.models). This is exactly like the formula for a GLM except that smooth terms, s and te, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these).
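A minimal sketch of a formula that mixes the term types described above, assuming mgcv is installed. gamSim() is mgcv's test-data simulator; the particular terms, k value and seed are illustrative choices, not recommendations.

```r
library(mgcv)

set.seed(1)
# Simulated data: response y and covariates x0..x3
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

b <- gam(y ~ s(x0) +                  # penalized thin plate regression spline (default basis)
           s(x1, k = 5, fx = TRUE) +  # unpenalized regression spline, fixed df = k - 1 = 4
           te(x2, x3),                # scale invariant tensor product smooth of two covariates
         data = dat)
summary(b)
```

Replacing te(x2, x3) with s(x2, x3) would instead give an isotropic thin plate smooth, which is only sensible when x2 and x3 are measured on comparable scales.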
family: This is a family object specifying the distribution and link to use in fitting etc. See glm and family for more details. A negative binomial family is provided: see negbin. quasi families actually result in the use of extended quasi-likelihood if method is set to a RE/ML method (McCullagh and Nelder, 1989, 9.6).

data: A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

weights: prior weights on the data.

subset: an optional vector specifying a subset of observations to be used in the fitting process.
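A minimal sketch using family, data and subset together, assuming mgcv is installed. The Poisson counts come from gamSim(), and the subset rule `use` is a hypothetical illustration.

```r
library(mgcv)

set.seed(2)
# Poisson counts simulated from gamSim's example 1 additive truth
dat <- gamSim(1, n = 400, dist = "poisson", scale = 0.25)

# Hypothetical subset: fit only the observations with x0 < 0.9
use <- dat$x0 < 0.9

b <- gam(y ~ s(x0) + s(x1) + s(x2), family = poisson(), data = dat,
         subset = use, method = "REML")
summary(b)
```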
na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The "factory-fresh" default is na.omit.
offset: Can be used to supply a model offset for use in fitting. Note that this offset will always be completely ignored when predicting, unlike an offset included in formula: this conforms to the behaviour of lm and glm.

control: A list of fit control parameters to replace defaults returned by gam.control. Values not set assume default values.

method: The smoothing parameter estimation method. "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale. "GACV.Cp" is equivalent, but using GACV in place of GCV. "REML" for REML estimation, including of unknown scale, "P-REML" for REML estimation, but using a Pearson estimate of the scale. "ML" and "P-ML" are similar, but using maximum likelihood in place of REML.
optimizer: An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "perf" for performance iteration. "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow).
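A minimal sketch comparing smoothing parameter criteria and optimizers, assuming mgcv is installed. The method value stored in the fitted object is used here only as a sanity check; the data and seed are arbitrary.

```r
library(mgcv)

set.seed(3)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
f <- y ~ s(x0) + s(x1) + s(x2)

b_gcv  <- gam(f, data = dat)                     # default method = "GCV.Cp"
b_reml <- gam(f, data = dat, method = "REML")    # Laplace approximate REML
b_bfgs <- gam(f, data = dat, method = "REML",
              optimizer = c("outer", "bfgs"))    # same criterion, different optimizer

# Same criterion optimized two ways: the fits should agree closely
cor(fitted(b_reml), fitted(b_bfgs))
```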
scale: If this is positive then it is taken as the known scale parameter. Negative signals that the scale parameter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases.

select: If this is TRUE then gam can add an extra penalty to each term so that it can be penalized to zero. This means that the smoothing parameter estimation that is part of fitting can completely remove terms from the model. If the corresponding smoothing parameter is estimated as zero then the extra penalty has no effect.
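A minimal sketch of select = TRUE, assuming mgcv is installed. It relies on the fact that in gamSim's example 1 the true effect of x3 is zero, so s(x3) is a candidate for being penalized out of the model.

```r
library(mgcv)

set.seed(4)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat,
         method = "REML", select = TRUE)

# The extra penalty lets the null-effect term s(x3) shrink towards zero edf
edf <- summary(b)$s.table[, "edf"]
edf
```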
knots: this is an optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). See tprs for what happens in the "tp"/"ts" case. Different terms can use different numbers of knots, unless they share a covariate.

sp: A vector of smoothing parameters can be provided here. Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula. Negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible. If smooths share smoothing parameters then length(sp) must correspond to the number of underlying smoothing parameters.

min.sp: Lower bounds can be supplied for the smoothing parameters. Note that if this option is used then the smoothing parameters full.sp, in the returned object, will need to be added to what is supplied here to get the smoothing parameters actually multiplying the penalties. length(min.sp) should always be the same as the total number of penalties (so it may be longer than sp, if smooths share smoothing parameters).
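A minimal sketch of mixing fixed and estimated smoothing parameters via sp, assuming mgcv is installed. The value 1e8 is an arbitrary "very large" choice used to make the effect obvious.

```r
library(mgcv)

set.seed(5)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

# sp follows the order of the smooths in the formula:
# -1 => estimate the s(x0) smoothing parameter; 1e8 => fix the s(x1) one at a huge value
b <- gam(y ~ s(x0) + s(x1), data = dat, sp = c(-1, 1e8))

# A huge fixed smoothing parameter forces s(x1) into the penalty null space,
# which for the default thin plate basis (after centering) is a straight line: edf near 1
summary(b)$s.table
```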
H: A user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is close to un-identifiable on the scale of the linear predictor, but perfectly well defined on the response scale.

gamma: It is sometimes useful to inflate the model degrees of freedom in the GCV or UBRE/AIC score by a constant multiplier. This allows such a multiplier to be supplied.

fit: If this argument is TRUE then gam sets up the model and fits it, but if it is FALSE then the model is set up and an object G containing what would be required to fit is returned. See argument G.

paraPen: optional list specifying any penalties to be applied to parametric model terms. gam.models explains more.

G: Usually NULL, but may contain the object returned by a previous call to gam with fit=FALSE, in which case all other arguments are ignored except for gamma, in.out, scale, control, method, optimizer and fit.
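A minimal sketch of the fit=FALSE / G workflow, assuming mgcv is installed: the model is set up once, then fitted from the pre-built object, and the result is compared with an ordinary one-step call.

```r
library(mgcv)

set.seed(6)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

# Set the model up without fitting it...
G <- gam(y ~ s(x0) + s(x1), data = dat, fit = FALSE)

# ...then fit from the pre-built object
b1 <- gam(G = G)
b2 <- gam(y ~ s(x0) + s(x1), data = dat)

max(abs(fitted(b1) - fitted(b2)))   # expected to be essentially zero
```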
in.out: optional list for initializing outer iteration. If supplied then this must contain two elements: sp should be an array of initialization values for all smoothing parameters (there must be a value for all smoothing parameters, whether fixed or to be estimated, but those for fixed s.p.s are not used); scale is the typical scale of the GCV/UBRE function, for passing to the outer optimizer, or the initial value of the scale parameter, if this is to be estimated by RE/ML.
...: further arguments for passing on e.g. to gam.fit (such as mustart).

Details

A generalized additive model (GAM) is a generalized linear model (GLM) in which the linear predictor is given by a user specified sum of smooth functions of the covariates plus a conventional parametric component of the linear predictor. A simple example is:

$$\log(E(y_i)) = \alpha + f_1(x_{1i}) + f_2(x_{2i})$$

where the (independent) response variables $y_i \sim \mathrm{Poi}$, and $f_1$ and $f_2$ are smooth functions of covariates $x_1$ and $x_2$. The log is an example of a link function.

If absolutely any smooth functions were allowed in model fitting then maximum likelihood estimation of such models would invariably result in complex overfitting estimates of $f_1$ and $f_2$. For this reason the models are usually fit by penalized likelihood maximization, in which the model (negative log) likelihood is modified by the addition of a penalty for each smooth function, penalizing its "wiggliness". To control the tradeoff between penalizing wiggliness and penalizing badness of fit each penalty is multiplied by an associated smoothing parameter: how to estimate these parameters, and how to practically represent the smooth functions are the main statistical questions introduced by moving from GLMs to GAMs.
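The wiggliness/fit trade-off can be seen in a minimal base-R sketch of a penalized regression spline. This is a stand-in, not mgcv's method: it assumes a cubic B-spline basis from splines::bs with a second-order difference penalty (a P-spline) instead of mgcv's default thin plate regression spline, and a Gaussian likelihood so that each fit is a single penalized least squares solve. The effective degrees of freedom, taken here as the trace of the influence matrix A, move from the basis dimension k towards the penalty null space dimension (2) as lambda grows.

```r
library(splines)

set.seed(7)
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

k <- 10
X <- bs(x, df = k, intercept = TRUE)     # n x k cubic B-spline basis
D <- diff(diag(k), differences = 2)      # (k - 2) x k second-difference operator
S <- crossprod(D)                        # beta' S beta = sum(diff(beta, differences = 2)^2)

# Penalized least squares fit for a given smoothing parameter lambda
fit_pls <- function(lambda) {
  M <- crossprod(X) + lambda * S
  beta <- solve(M, crossprod(X, y))
  A <- X %*% solve(M, t(X))              # influence (hat) matrix
  list(beta = drop(beta), fitted = drop(X %*% beta), edf = sum(diag(A)))
}

lambdas <- 10^seq(-8, 8, by = 2)
edf <- sapply(lambdas, function(l) fit_pls(l)$edf)
round(edf, 3)   # falls from about k = 10 towards 2 (a straight line)
```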
The mgcv implementation of gam represents the smooth functions using penalized regression splines, and by default uses basis functions for these splines that are designed to be optimal, given the number of basis functions used. The smooth terms can be functions of any number of covariates and the user has some control over how smoothness of the functions is measured.

gam in mgcv solves the smoothing parameter estimation problem by using the Generalized Cross Validation (GCV) criterion

$$\frac{nD}{(n - \mathrm{DoF})^2}$$

or an Un-Biased Risk Estimator (UBRE) criterion

$$\frac{D}{n} + \frac{2s\,\mathrm{DoF}}{n} - s$$

where $D$ is the deviance, $n$ the number of data, $s$ the scale parameter and DoF the effective degrees of freedom of the model. Notice that UBRE is effectively just AIC rescaled, but is only used when $s$ is known.

Alternatives are GACV, or a Laplace approximation to REML. There is some evidence that the latter may actually be the most effective choice.
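The two criteria translate directly into R, and the GCV score can then drive the choice of lambda. This is a minimal sketch for the Gaussian identity-link case only, where the deviance D is the residual sum of squares. It assumes the same hypothetical P-spline smoother (B-spline basis plus difference penalty) rather than mgcv's basis, and a plain 1-D search with optimize() over log(lambda) rather than mgcv's Newton optimizer.

```r
library(splines)

# GCV and UBRE scores: D = deviance, n = number of data,
# dof = effective degrees of freedom, s = (known) scale parameter
gcv_score  <- function(D, n, dof) n * D / (n - dof)^2
ubre_score <- function(D, n, dof, s) D / n + 2 * s * dof / n - s

set.seed(8)
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
X <- bs(x, df = 10, intercept = TRUE)
S <- crossprod(diff(diag(10), differences = 2))

# GCV score of the penalized fit at smoothing parameter exp(log_lambda)
gcv_at <- function(log_lambda) {
  A <- X %*% solve(crossprod(X) + exp(log_lambda) * S, t(X))  # influence matrix
  rss <- sum((y - A %*% y)^2)                                 # Gaussian deviance
  gcv_score(rss, n, sum(diag(A)))
}

opt <- optimize(gcv_at, interval = c(-15, 15))
lambda_hat <- exp(opt$minimum)
edf_hat <- sum(diag(X %*% solve(crossprod(X) + lambda_hat * S, t(X))))
c(lambda = lambda_hat, edf = edf_hat)
```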
Smoothing parameters are chosen to minimize the GCV, UBRE/AIC, GACV or REML scores for the model, and the main computational challenge solved by the mgcv package is to do this efficiently and reliably. Various alternative numerical methods are provided which can be set by argument optimizer.

Broadly gam works by first constructing basis functions and one or more quadratic penalty coefficient matrices for each smooth term in the model formula, obtaining a model matrix for the strictly parametric part of the model formula, and combining these to obtain a complete model matrix (/design matrix) and a set of penalty matrices for the smooth terms. Some linear identifiability constraints are also obtained at this point. The model is fit using gam.fit, a modification of glm.fit. The GAM penalized likelihood maximization problem is solved by Penalized Iteratively Reweighted Least Squares (P-IRLS) (see e.g. Wood 2000). Smoothing parameter selection is integrated in one of two ways. (i) "Performance iteration" uses the fact that at each P-IRLS iteration a penalized weighted least squares problem is solved, and the smoothing parameters of that problem can be estimated by GCV or UBRE. Eventually, in most cases, both model parameter estimates and smoothing parameter estimates converge. (ii) Alternatively the P-IRLS scheme is iterated to convergence for each trial set of smoothing parameters, and GCV, UBRE or REML scores are only evaluated on convergence: optimization is then "outer" to the P-IRLS loop. In this case the P-IRLS iteration has to be differentiated, to facilitate optimization, and gam.fit3 is used in place of gam.fit. The default is the second method, outer iteration.
Several alternative basis-penalty types are built in for representing model smooths, but alternatives can easily be added (see smooth.terms for an overview and smooth.construct for how to add smooth classes). In practice the default basis is usually the best choice, but the choice of the basis dimension (k in the s and te terms) is something that should be considered carefully (the exact value is not critical, but it is important not to make it restrictively small, nor very large and computationally costly). The basis should be chosen to be larger than is believed to be necessary to approximate the smooth function concerned. The effective degrees of freedom for the smooth will then be controlled by the smoothing penalty on the term, and (usually) selected automatically (with an upper limit set by k-1 or occasionally k). Of course the k should not be made too large, or computation will be slow (or in extreme cases there will be more coefficients to estimate than there are data).
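A minimal sketch of the effect of k, assuming mgcv is installed. gamSim's x2 effect is fairly wiggly, so a restrictive k = 5 caps the smooth at k - 1 = 4 degrees of freedom, while a generous k = 20 lets the penalty choose.

```r
library(mgcv)

set.seed(10)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

b5  <- gam(y ~ s(x2, k = 5),  data = dat, method = "REML")
b20 <- gam(y ~ s(x2, k = 20), data = dat, method = "REML")

summary(b5)$s.table[, "edf"]    # cannot exceed k - 1 = 4
summary(b20)$s.table[, "edf"]   # set by the penalty, below k - 1 = 19
# gam.check(b20) prints a basis dimension check (see gam.check and choose.k)
```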
Note that gam assumes a very inclusive definition of what counts as a GAM: basically any penalized GLM can be used. To this end gam allows the non-smooth model components to be penalized via argument paraPen and allows the linear predictor to depend on general linear functionals of smooths, via the summation convention mechanism described in linear.functional.terms.
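A minimal sketch of penalizing a parametric term via paraPen, assuming mgcv is installed and that paraPen takes a list named by model term, each element being a list of penalty matrices (see gam.models). Here an identity matrix gives a ridge penalty on the coefficients of a hypothetical matrix covariate X, with its weight estimated by REML.

```r
library(mgcv)

set.seed(11)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, -1, rep(0, p - 2))) + rnorm(n)
dat <- list(y = y, X = X)

# Ridge penalty on the p coefficients of the parametric matrix term X
b <- gam(y ~ X, data = dat, paraPen = list(X = list(diag(p))), method = "REML")

b$sp       # one estimated smoothing parameter (the ridge weight)
coef(b)
```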
Details of the default underlying fitting methods are given in Wood (2011 and 2004). Some alternative methods are discussed in Wood (2000 and 2006).
gam() is not a clone of Trevor Hastie's original (as supplied in S-PLUS or package gam). The major differences are:
(i) by default, estimation of the degree of smoothness of model terms is part of model fitting;
(ii) a Bayesian approach to variance estimation is employed that makes for easier confidence interval calculation (with good coverage probabilities);
(iii) the model can depend on any (bounded) linear functional of smooth terms;
(iv) the parametric part of the model can be penalized;
(v) simple random effects can be incorporated;
(vi) the facilities for incorporating smooths of more than one variable are different: specifically there are no lo smooths, but instead (a) s terms can have more than one argument, implying an isotropic smooth and (b) te or t2 smooths are provided as an effective means for modelling smooth interactions of any number of variables via scale invariant tensor product smooths. Splines on the sphere, Duchon splines and Gaussian Markov Random Fields are also available.
See gam from package gam, for GAMs via the original Hastie and Tibshirani approach.