




已阅读5页,还剩32页未读, 继续免费阅读
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
CS345DataMining,LinkAnalysisAlgorithmsPageRank,AnandRajaraman,JeffreyD.Ullman,LinkAnalysisAlgorithms,PageRankHubsandAuthoritiesTopic-SpecificPageRankSpamDetectionAlgorithmsOtherinterestingtopicswewontcoverDetectingduplicatesandmirrorsMiningforcommunities,Rankingwebpages,Webpagesarenotequally“important”Ihas23,400inlinkswww.joe-has1inlinkAreallinlinksequal?Recursivequestion!,Simplerecursiveformulation,EachlinksvoteisproportionaltotheimportanceofitssourcepageIfpagePwithimportancexhasnoutlinks,eachlinkgetsx/nvotesPagePsownimportanceisthesumofthevotesonitsinlinks,Simple“flow”model,Thewebin1839,y,a,m,y/2,y/2,a/2,a/2,m,y=y/2+a/2a=y/2+mm=a/2,Solvingtheflowequations,3equations,3unknowns,noconstantsNouniquesolutionAllsolutionsequivalentmoduloscalefactorAdditionalconstraintforcesuniquenessy+a+m=1y=2/5,a=2/5,m=1/5Gaussianeliminationmethodworksforsmallexamples,butweneedabettermethodforlargegraphs,Matrixformulation,MatrixMhasonerowandonecolumnforeachwebpageSupposepagejhasnoutlinksIfj!i,thenMij=1/nElseMij=0MisacolumnstochasticmatrixColumnssumto1SupposerisavectorwithoneentryperwebpageriistheimportancescoreofpageiCallittherankvector|r|=1,Example,Supposepagejlinksto3pages,includingi,r,Eigenvectorformulation,Theflowequationscanbewrittenr=MrSotherankvectorisaneigenvectorofthestochasticwebmatrixInfact,itsfirstorprincipaleigenvector,withcorrespondingeigenvalue1,Example,y1/21/20a1/201m01/20,yam,y=y/2+a/2a=y/2+mm=a/2,PowerIterationmethod,Simpleiterativescheme(akarelaxation)SupposethereareNwebpagesInitialize:r0=1/N,.,1/NTIterate:rk+1=MrkStopwhen|rk+1-rk|1|x|1=1iN|xi|istheL1normCanuseanyothervectornorme.g.,Euclidean,PowerIterationExample,y1/21/20a1/201m01/20,yam,ya=m,1/31/31/3,1/31/21/6,5/121/31/4,3/811/241/6,2/52/51/5,.,RandomWalkInterpretation,ImaginearandomwebsurferAtanytimet,surferisonsomepagePAttimet+1,thesurferfollowsanoutlinkfromPuniformlyatrandomEndsuponsomepageQlinkedfromPProcessrepeatsindefinitelyLetp(t)beavectorwhoseithcomponentistheprobabilitythatthesurferisatpageiattimetp(t)isaprobabilitydistributiononpages,Thestationarydistribution,Whereisthesurferattimet+1?Followsalinkuniformlyatrandomp(t+1)=Mp(t)Supposetherandomwalkreachesastatesuchthatp(t+1)=Mp(t)=p(t)Thenp(t)iscalledastationarydistributionfortherandomwalkOurrankvectorrsatisfiesr=MrSoitisastationarydistributionfortherandomsurfer,ExistenceandUniqueness,Acentralresultfromthetheoryofrandomwalks(akaMarkovprocesses):Forgraphsthatsatisfycertainconditions,thestationarydistributionisuniqueandeventuallywillbereachednomatterwhattheinitialprobabilitydistributionattimet=0.,Spidertraps,AgroupofpagesisaspidertrapiftherearenolinksfromwithinthegrouptooutsidethegroupRandomsurfergetstrappedSpidertrapsviolatetheconditionsneededfortherandomwalktheorem,Microsoftbecomesaspidertrap,Yahoo,Msoft,Amazon,y1/21/20a1/200m01/21,yam,ya=m,111,11/23/2,3/41/27/4,5/83/82,003,.,Randomteleports,TheGooglesolutionforspidertrapsAteachtimestep,therandomsurferhastwooptions:Withprobability,followalinkatrandomWithprobability1-,jumptosomepageuniformlyatrandomCommonvaluesforareintherange0.8to0.9Surferwillteleportoutofspidertrapwithinafewtimesteps,Randomteleports(=0.8),Yahoo,Msoft,Amazon,1/2,1/2,0.8*1/2,0.8*1/2,0.2*1/3,0.2*1/3,0.2*1/3,1/21/201/20001/21,1/31/31/31/31/31/31/31/31/3,y7/157/151/15a7/151/151/15m1/157/1513/15,0.8,+0.2,Randomteleports(=0.8),Yahoo,Msoft,Amazon,1/21/201/20001/21,1/31/31/31/31/31/31/31/31/3,y7/157/151/15a7/151/151/15m1/157/1513/15,0.8,+0.2,ya=m,111,1.000.601.40,0.840.601.56,0.7760.5361.688,7/115/1121/11,.,Matrixformulation,SupposethereareNpagesConsiderapagej,withsetofoutlinksO(j)WehaveMij=1/|O(j)|whenj!iandMij=0otherwiseTherandomteleportisequivalenttoaddingateleportlinkfromjtoeveryotherpagewithprobability(1-)/Nreducingtheprobabilityoffollowingeachoutlinkfrom1/|O(j)|to/|O(j)|Equivalent:taxeachpageafraction(1-)ofitsscoreandredistributeevenly,PageRank,ConstructtheNNmatrixAasfollowsAij=Mij+(1-)/NVerifythatAisastochasticmatrixThepagerankvectorristheprincipaleigenvectorofthismatrixsatisfyingr=ArEquivalently,risthestationarydistributionoftherandomwalkwithteleports,Deadends,Pageswithnooutlinksare“deadends”fortherandomsurferNowheretogoonnextstep,Microsoftbecomesadeadend,Yahoo,Msoft,Amazon,ya=m,111,10.60.6,0.7870.5470.387,0.6480.4300.333,000,.,1/21/201/20001/20,1/31/31/31/31/31/31/31/31/3,y7/157/151/15a7/151/151/15m1/157/151/15,0.8,+0.2,Dealingwithdead-ends,TeleportFollowrandomteleportlinkswithprobability1.0fromdead-endsAdjustmatrixaccordinglyPruneandpropagatePreprocessthegraphtoeliminatedead-endsMightrequiremultiplepassesComputepagerankonreducedgraphApproximatevaluesfordeadendsbypropagatingvaluesfromreducedgraph,Computingpagerank,Keystepismatrix-vectormultiplicationrnew=AroldEasyifwehaveenoughmainmemorytoholdA,rold,rnewSayN=1billionpagesWeneed4bytesforeachentry(say)2billionentriesforvectors,approx8GBMatrixAhasN2entries1018isalargenumber!,Rearrangingtheequation,r=Ar,whereAij=Mij+(1-)/Nri=1jNAijrjri=1jNMij+(1-)/Nrj=1jNMijrj+(1-)/N1jNrj=1jNMijrj+(1-)/N,since|r|=1r=Mr+(1-)/NNwherexNisanN-vectorwithallentriesx,Sparsematrixformulation,Wecanrearrangethepagerankequation:r=Mr+(1-)/NN(1-)/NNisanN-vectorwithallentries(1-)/NMisasparsematrix!10linkspernode,approx10NentriesSoineachiteration,weneedto:Computernew=MroldAddaconstantvalue(1-)/Ntoeachentryinrnew,Sparsematrixencoding,EncodesparsematrixusingonlynonzeroentriesSpaceproportionalroughlytonumberoflinkssay10N,or4*10*1billion=40GBstillwontfitinmemory,butwillfitondisk,sourcenode,degree,destinationnodes,BasicAlgorithm,AssumewehaveenoughRAMtofitrnew,plussomeworkingmemoryStoreroldandmatrixMondiskBasicAlgorithm:Initialize:rold=1/NNIterate:Update:PerformasequentialscanofMandroldtoupdaternewWriteoutrnewtodiskasroldfornextiterationEveryfewiterations,compute|rnew-rold|andstopifitisbelowthresholdNeedtoreadinbothvectorsintomemory,Updatestep,src,degree,destination,0,1,2,3,4,5,6,0,1,2,3,4,5,6,rnew,rold,Initializeallentriesofrnewto(1-)/NForeachpagep(out-degreen):Readintomemory:p,n,dest1,destn,rold(p)forj=1.n:rnew(destj)+=*rold(p)/n,Analysis,Ineachiteration,wehaveto:ReadroldandMWriternewbacktodiskIOCost=2|r|+|M|Whatifwehadenoughmemorytofitbothrnewandrold?Whatifwecouldnotevenfitrnewinmemory
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 九年级历史下册 第一单元 殖民地人民的反抗与资本主义制度的扩展 第2课 俄国的改革教学设计设计(pdf) 新人教版
- Unit 2 My schoolbag Part B Let's talk Let's learn大单元整体教学设计表格式-2024-2025学年人教PEP版英语四年级上册
- 《习作:我有一个想法》教学设计-2024-2025学年统编版语文三年级上册
- 6 感官总动员 教学设计-2024-2025学年科学一年级上册苏教版
- 2024年九年级化学上册 根据化学式进行计算专题教学设计 (新版)沪教版
- 2024秋九年级语文上册 第三单元 12湖心亭看雪教学设计 新人教版
- 七年级英语下册 Module 8 Story time Unit 1 Once upon a time….第2课时教学设计(新版)外研版
- 蓝光治疗护理及注意事项
- 27 故事二则《扁鹊治病》(教学设计 )-2024-2025学年统编版语文四年级上册
- 2023六年级数学上册 三 分数除法 分数除法应用题教学设计 苏教版
- GB/T 24915-2010合同能源管理技术通则
- 凸透镜成像规律动画可拖动最佳版swf
- 6层框架住宅毕业设计结构计算书
- 2016众泰t600运动版原厂维修手册与电路图-使用说明
- Sigma-Delta-ADC讲稿教学讲解课件
- 《春秋三传导读》课件
- 麻醉科进修汇报课件
- ISO-IEC 27002-2022中文版完整详细
- 【计算机应用基础试题】上海中侨职业技术大学2022年练习题汇总(附答案解析)
- 艾滋病(AIDS)经典课件
- 义务教育质量监测小学四年级 德育模拟试卷附答案
评论
0/150
提交评论