版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、GEOGRAPHICALLY WEIGHTED REGRESSIONWHITE PAPERMARTIN CHARLTONA STEWART FOTHERINGHAMNational Ccntae for GcocoinputatiouNational University of Ireland MaynoothMayiiootli. Co Kildare、IRELANDMaicli 3 2009scie ng foundation ire la nd forduiresehtirejnrTlic authors gratcfxilly acknowledge support fiom a St
2、rategic Research Cluster grant (07/SRC/I1168) by Science Foiuidatiou Iixland mider tlie National Development Plan.ContentsRegression1Regression with Spatial Data3Geograpliically Weighted Regression (GWR)5Outputs from GWR9Interpreting Parameter Estimates10Extensions to GWR12References13RegressionRegr
3、ession encompasses a wide range of methods for modelling the relationship between a dependent variable and a set of one or more independent variables. Tlie dependent variable is sometimes known as the y-van able, tlie response variable or tlie regressand Tlie independent variables are sometimes know
4、n as x-variables, predictor vaiiables, or regixssors A regression model is expressed as an equation.Ill its simplest fonn a linear regression model can take the fonn= Pq + Ax, + 5 for 1 nIll tliis equation is tlie response variable, here measured at sonic location z, xx is tlie independent variable,
5、 Sx is the ciTor tern】、and and px arc parameters The term coefficient is sometimes used instead of parameter wliich arc to be n“estimated such tliat the value of,(片 -)- is niiiiiniised over the n observations in the !-ldataset Tlie y .is tlie predicted or fitted value for tlie ith observation,given
6、tlie /tli value of x. Tlie tenn 一 $Jis known as the residual for the /th observation, and the residuals should be both independent and di awii identically from a Nomial Distnbutioii with a mean of zero. Such a model is usually fitted using a procedure known as Ordinaiy Least Squai es (OLS).More gene
7、rally, a multiple linear regression model may be written:片=00 »11 - 1 卩心 « £i For l-l . 77where the predictions of the dependent variable arc obtained tluough a linear combination of the independent variables The OLS estimator takes tlie fonn:P=(XTX)AXTywhere P is the vector of estima
8、ted parameters, X is the design matrix wliich contains the values of tlie iiidepeiident variables and a colunui of is the vector of obseivcd values, and (0” is the inverse of tlie vanaiicc-covailancc matrixSometimes it is desirable to weight the observations in a regression, for example, different l
9、evels of data imceitainty Tlic weights arc placed in tlie leading diagonal of a square matrix W and the estimator is altered to include the weighting:p = (XTWX)x XTWyTlic ability of tlie model to replicate tlic obscivcd y values is measuied by tlic goodness of fit. Tliis is conveniently expressed by
10、 tlic r2 value wliicli nuis fiom 0 to 1 and measures tlic propoition of variation in the observed y wliicli is accoiuited for (sometimes uexplained by") by variation in the model. The 广 can often be increased merely by adding variables, so the adjusted r2 is often reported 一 the adjustment take
11、s into accoiuit tlic niunber of independent variables in tlie model and reflects model parsimonyIn a regression model we may want to dctcmiinc whether tlic value of a paiainetcr is sufficiently different from zero so that the changes in the variable to wliicli it is attached will influence changes i
12、n the predictions. To determine whether valuables contiibute significantly to the model in tliis way, we divide the parameter estimate for each variable by its standard eiTor. Tlie resulting statistics have a t-distribution and may be compared witli critical values fi om a t distribution, given the
13、munber of degrees of freedom in the model#Regression with Spatial DataTlicrc ai c a niunber of assumptions underlying the basic regression model desenbed here, one of which is that tlic observations should be independent of one another. Tliis is not always the case with data for spatial miits and To
14、blcf s obscivation that “"Everything is related to cveiytliing else, but near tilings arc more related than distant tilings." (Toblcr, 1970) can be recalled Not only might the valuables in the model cxliibit spatial dependence (that is, nearby locations will have siniilai* values) but also
15、 the moders residuals uiiglit cxliibit spatial dependence Tlic latter cliaractenstic can be obscivcd if the residuals fiom tlie basic regression arc plotted ou a map where conunoiily tlie residuals in neighboring spatial luiits will have a sinnlai* magnitude and sign.Tlicsc charactcnstics of spatial
16、 data have implications for the estimates of tlic pai amctcrs in the basic model. If tlicrc is spatial stinctiue in the residuals from the model, tliis will lead to inefficient estimates of tlic parameters, wliich in turn means that tlie staudard eiTors of tlic parameters will be too large. Tliis li
17、as iniplicatious for inference wlicrc potentially significant parameter estimates may appeal' not to be so. Spatial structure in tlie data means tliat tlic value of the dependent van able in one spatial luiit is affected by the independent vaiiablcs in nearby units. Tliis leads to parameter esti
18、mates wliich arc both biased and inefficient A biased estimates is one that is either too liigh or too low as an estimate of the iu)kiiowii tnic value.Aiisclin (1988) describes model foiiiis to deal with tlicsc cases. A spatial error model is appropnate when tlicrc appears to be structure in tlie re
19、sidual temj and a spatial lag model is appropnatc when spatial structure is present in the vaiiablcs in the model. Unbiased parameter estimates can be found from both model types when maxiimun likelihood is used as tlie fitting method Spatial heterogeneity is aiiotlicr phenomenon in spatial modellin
20、g It is assumed when fitting all of tlic regression models described above that tlic relationships being modelled arc tlic same cveiywhcrc witliin tlie study aiea from which the data arc drawn. Tliis assumption is referred to as one of homogeneity However, tlicrc is often good reason to question whe
21、ther tliis assumption when dealing with spatial data as the processes generating them might vary across space This condition is refened to as spatial heterogeneity An early example of a regression model wliich attempts to deal with spatial heterogeneity is the spatial expansion method (Casetti 1972)
22、. Pai amctcrs in such models arc tlicniselves huictions of location where the user deteniiines tlie nature of the fiuiction (usually some linear polynomial). For example, if we take a simple model with two independent variables, iq and x2, with tliree pai ameters a, b、and c, that arc to be estimated
23、, i.c.:片=a + bxh + cx2lWe tlicu expand tlic paiametcrs so that they arc some linear fiuiction of tlic locations (均p) of the observations. For example, if a 1 八 order linear polyiionnal ised specified, tlien:a. = £Kq + a2vfS =+ 0M + 角叫q = 7o + W + /2V.Tlic hill expansion model itself can then be
24、 rc-wiittcn as:yt = «o + «2vr + 际 t + Awixir + 0北 x“ + /ox2l +Tliis is easily fitted using OLS、as the analyst only needs to supply the extra tcniis 吟心 v两“ apeand vpc in the model. As the locations of the observations arc also known, the spatially varying pai amctcr estimates arc easily com
25、puted and mapped A disadvantage of this model is that the analyst must detcmiinc the natiux of the parameter expansion piior to the modelling exercise It not may be iiiunediately clear what order of polynomial should be used iu advance, and specifying the wiong expansion may liide important local va
26、riation in model fomtAlternative approaches that accoiuit for spatial heterogeneity also exist, examples being spatially adaptive filtering (Foster and Gorr, 1986) in wliich parameter estimates arc allowed to '(liift' across the study arc心 and multilevel modelling (Goldstein, 1987) in wdiicl
27、i models for individual and spatially aggregate characteristics arc combined witliin the same overall model. A fourth approach extends tlic random coefficients model (Rao, 1965) to the spatial case (Swamy, 1971) where cocfficicntsvrary randomly across the study aiea Again in botli spatially adaptive
28、 filtenng and random coefficients models, the parameter estimates may be mapped to examine local variations in model fbnn. Further insights into these models can be foimd in Fothcringham (1997)3Geographically Weighted Regression (GWR)Geograpliically Weighted Regression (GWR) is a fairly recent contn
29、bution to modelling spatially heterogeneous processes (Bnuisdon et al. 1996; Fotheringham et al 1996; 1997; 2002) Tlic luidcrlying idea of GWR is that parameters may be estimated anywhere in the study area given a dependent variable and a set of one or more independent variables wliicli have been me
30、asured at places vdiosc location is hiowii. Taking Toblers observation about nearness and siniilahty into accomit we might expect tliat if we wish to estimates parameters for a model at some location u2 tlicn obscivations wliicli arc nearer tliat location should have a greater weight in the estimati
31、on than observations wliicli arc fxuthcr awayWe shall assume that the analyst has a dataset consisting of a dependent variable y and a set of m independent variablc(s) k=】叫 and that for each of the n obseivations in the dataset a measurement of its position is available in a suitable coordinate syst
32、em.Tlic equation for a typical GWR version of the OLS regression model would be:K(«) = Al (u) + Al (11)乱+爲(11)X2, +. + 0” (u)xwrThe notation /fe(u) indicates that the paiainetcr desenbes a rclatioiisliip around location u and is specific to tliat location. A prediction may be made for tlic depe
33、ndent vaiiablc if measurements for the independent vanablcs arc also available at the location u. Typically tlic locations at wliicli parameter estimates are obtained are those at wliich data arc collected, but tliis need not necessanly be the ease. This would seem to be an lunisual claim, but it wi
34、ll be come clear wlicn we consider the nature of the geograpliical weighting.Tlic estimator for tliis model is siniilai* to tlic WLS (weighted least squaies) global model above except tliat tlic weights arc conditioned on the location u relative to tlic otlicr observations in tlie dataset and hence
35、change for each location. Tlic estimator takes the fonn:d(u) = (XTW(u)X)XTlV(u)y:We shall use u to indicate some general location in the study area Typically u will be a vector of coordinates measured in either a projected coordinate system (such as Universal Transverse Mercator) or a geodetic syste
36、m such as WGS84. A particular location can be indexed ub with Cartesian coordinates (uv:) or geodetic coordinates (九协W(u) is a squai c matrix of weights relative to tlic position of u in tlic study ai ca; XTW(u)X is the geograpliically weighted vanancc-covariance niatiix (tlic estimation requires it
37、s inverse to be obtained), andy is the vector of the values of the dependent variable.The W(u) matrix contains tlic geograpliical weights 111 its leading diagonal and 0 in its off- diagonal elements.w2(u)wQ)The weights themselves aix computed from a weighting scheme tliat is also known as a kernel.
38、A inunbcr of kernels ai c possible: a typical one has a Gaussian shape:where w,(u) is the geograpliical weight of the ith obscivation in the dataset relative to the location u, d,(u) is some measure of tlic distance between tlic zth observation and the location u, and h is a quantity known as tlic b
39、aiidwidtli. The distances arc generally Euclidean distances when Caitcsiaii coordinates aic used and Great Circle distances when sphene al coordinates arc used However、there is no reason wliy non-Euclidcan distances might be used (for example、distances along a road network).The bandwidth in the kern
40、el is expressed in tlic same iniits as tlie coordinates used in the dataset As tlic baiidwidtli gets larger tlic weights approach miity and the local GWR model approaches tlic global OLS model.As we have already stated、the locations at wliich parameters arc estimated arc generally the locations at w
41、liich the observations in the dataset have been collected Tliis allows predictions to be made for the dependent van able and residuals to be computed. Tlicsc are ncccssaiy in detcniiiiiing the goodness of fit of the model and we shall discuss tliis below. The locations at wliich parameters arc estim
42、ated can also be non-sample points in the study aica 一 perhaps tlic mesh points of a regular gnd, or tlie locations of obscivations in a validation dataset wliich has tlie same dependent and independent vanables as the calibration datasetIt is convenient to refer to the locations at wliich the calib
43、ration data have been collected as the sample points and the locations at wliich parameters arc estimated as the regression#points Tlie combination of geograpliically weighted estimator kernel and baiidwidtli can be rcfciTcd to as a local modelKcmcls other tlian the Gaussian can be used in GWR altho
44、ugh in practice it generally matters very little as long as the kernel is cGaiissian-likc In tenns of influencing tlie fit of tlie model, the choice of a baiidwidtli is more inipoHant tlian the shape of tlie kemel. If the sample points arc reasonably regularly spaced in tlie study area, tlicn a keme
45、l witli fixed baiidwidtli is a suitable choice for modelling If the sample points arc not regularly spaced but arc clustered in the study area it is gcneially desirable to allow tlie kernel to acconunodate tliis iiTcgiilarity by increasing its size wlicn the sample points are sparser and decreasing
46、its size when the sample points arc denser. A convenient way of implementing tliis adaptive bandwidth specification is to choose a kemel wliich allows the same niunber of sample points for each estimation. Tliis is usually accomplished by sorting the distances of the sample points from the desired r
47、egression poiut u and setting tlie baiidwidtli so that it includes only the first p observations, where tlie optimal value of p is Found fiom tlie data. Tlie weight can be computed by using the specified kernel and setting the value for any obscivation whose distance is greater tlian tlie bandwidth
48、to zero, thereby excluding them from tlie local calibration. One such kernel is the bisqiiare:where w(u) is zero when d(u) > h Tliis is a ncar-Gaussiaii fiuiction witli tlie uscfiil property that the weight is zero at a finite distance, hi the ArcGIS implementation a fixed radius kemel isand the
49、adaptive kernel is based on the bi square.Wlicn the sample and regression points coincide residuals and predictions of the dependent variables arc available. Tlicse values can be used to measure the goodness of fit of the model. For tlie conventional global model, tlie usual goodness of fit measure
50、is the r2 or adjusted r2 value. The adjusted value is preferable if several models arc compared as it compensates for the munber of vaiiables or parameters in the model. Ill general a model with more variables or parameters is likely to have a liiglier r2 than one witli fewer Tlie situation is a lit
51、tle more complex with GWR and we need to consider the effective munber ofpar am eters in the model when computing a goodnessof-fit measure.An interesting matrix in regression modelling is known as tlie hat mahix, S. Wlicn tlie observedy values are prenndtiplicd by S we obtain the predicted (fitted)
52、values thus:y = sv7Tlic trace of the matrix S ill a global model yields the number of pai aincters in tliat model the trace is tlie siun of the values in the leading diagonal of tliis matiix, usually expressed as tr(S). Tlic trace of S for ail OLS regression is the munber of parameters in the model,
53、 hi a GWR model the effective munber of parameter is obtained from the expression 2tr(S) ti(STS) The effective munber of parameters in the model depends on the munber of independent vaiiablcs and tlic bandwidth and can often be large and is usually not an integer. However, it is usefill in evaluatin
54、g the fit of the modeln + tr(S)+ n 一 2-力(S)The measure of goodness of fit wliich wc use extensively in GWR is the corrected Akaike Iiifonnatioii Criterion (Hiirvich et al. 1998). This takes the following fonn:AICc = 2nlog4r(<r) + nlog<(2)where n is tlic munber of observations in the dataset, d
55、 is tlic estimate of the standaid deviation of the residuals, and tr(S) is the trace of the hat matiix. Tlic AICc can be used to compare models of the same y variable wliich have vciy different right hand sides and it contains a penalty for the complexity (degrees of freedom) of the model.Tlie AICc
56、provides a measure of tlie information distance between the model wliich has actually been fitted and tlic unknown 'true、model Tliis distance is not an absolute measure but a relative mcasiue known as tlic KiillbackLtiblcr infonnation distance. Two separate models wliich arc being conipai ed arc
57、 held to be equivalent if the difference between the two AICc values is less than 3. Tliis is a widely accepted nilc of thumb、however the more cautious analyst ought use 4 instead As the AICc is a relative measure, the actual values wliich arc reported in the GWR output might be counter-intuitively
58、large or small. This does not matter, as it is tlic difTei cnees in the AICc values wliich arc impoitant The AICc fbmnila contains log tcmis、and witli a little manipulation it can be shown tliat the difference between AICc values for two models with identical degrees of ficcdom coiresponds to tlic r
59、atio of tlic likelihoods of tlic models, altliough it should be stressed tliat the AICc is not a likelihood ratio test.The AICc can not only be used to conipaix models with different independent variable subsets, but can also be used the compare the global OLS model with a local GWR model. The AICc is also used in the softwaie tlie detciiiiiiie tlie 'optim
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 实验中学教学工作总结6篇
- 给学校的感谢信模板汇编八篇
- 2021年年末教师个人述职报告
- 事业单位员工个人工作总结范文大全
- 出纳职业发展规划
- 销售个人工作总结2022汇报5篇
- 争吵生字课件
- 寒假十课第八课的观后感
- 我和书的故事小学作文(集锦15篇)
- 小学班主任年度考核工作总结
- 浙江大学医学院附属儿童医院招聘人员真题
- 2024年江苏省苏州市中考数学试卷含答案
- 软件测试汇报
- 吉林省长春市第一〇八学校2024-2025学年七年级上学期期中历史试题
- 2024年世界职业院校技能大赛高职组“市政管线(道)数字化施工组”赛项考试题库
- 初中《孙中山诞辰纪念日》主题班会
- 5.5 跨学科实践:制作望远镜教学设计八年级物理上册(人教版2024)
- 阿斯伯格综合症自测题汇博教育员工自测题含答案
- 天津市2023-2024学年七年级上学期语文期末试卷(含答案)
- 2025年蛇年年度营销日历营销建议【2025营销日历】
- 2024年法律职业资格考试(试卷一)客观题试卷及解答参考
评论
0/150
提交评论