




Clustering

Overview
- Partitioning Methods
  - K-Means
  - Sequential Leader
- Model Based Methods
- Density Based Methods
- Hierarchical Methods

What is cluster analysis?
- Finding groups of objects
  - Objects similar to each other are in the same group.
  - Objects are different from those in other groups.
- Unsupervised Learning
  - No labels
  - Data driven

Clusters
- Inter-Cluster
- Intra-Cluster

Applications of Clustering
- Marketing: Finding groups of customers with similar behaviours.
- Biology: Finding groups of animals or plants with similar features.
- Bioinformatics: Clustering microarray data, genes and sequences.
- Earthquake Studies: Clustering observed earthquake epicenters to identify dangerous zones.
- WWW: Clustering weblog data to discover groups of similar access patterns.
- Social Networks: Discovering groups of individuals with close friendships internally.

Earthquakes

Image Segmentation

The Big Picture

Requirements
- Scalability
- Ability to deal with different types of attributes
- Ability to discover clusters with arbitrary shape
- Minimum requirements for domain knowledge
- Ability to deal with noise and outliers
- Insensitivity to order of input records
- Incorporation of user-defined constraints
- Interpretability and usability

Practical Considerations
- Scaling matters!

Normalization or Not?

Evaluation

Silhouette
- A method of interpretation and validation of clusters of data.
- A succinct graphical representation of how well each data point lies within its cluster compared to other clusters.
- a(i): average dissimilarity of i with all other points in the same cluster
- b(i): the lowest average dissimilarity of i to other clusters

K-Means
- Determine the value of K.
- Choose K cluster centres randomly.
- Each data point is assigned to its closest centroid.
- Use the mean of each cluster to update each centroid.
- Repeat until no more new assignment.
- Return the K centroids.

Reference: J. MacQueen (1967): "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297.

Comments on K-Means
- Pros
  - Simple and works well for regular disjoint clusters.
  - Converges relatively fast.
  - Relatively efficient and scalable: O(t·k·n), where t: iteration; k: number of centroids; n: number of data points
- Cons
  - Need to specify the value of K in advance. Difficult, and domain knowledge may help.
  - May converge to local optima. In practice, try different initial centroids.
  - May be sensitive to noisy data and outliers. Mean of data points…
  - Not suitable for clusters of non-convex shapes

The Influence of Initial Centroids

Sequential Leader Clustering
- A very efficient clustering algorithm.
  - No iteration
  - A single pass of the data
- No need to specify K in advance.
- Choose a cluster threshold value.
- For every new data point:
  - Compute the distance between the new data point and every cluster's centre.
  - If the minimum distance is smaller than the chosen threshold, assign the new data point to the corresponding cluster and re-compute cluster centre.
  - Otherwise, create a new cluster with the new data point as its centre.
- Clustering results may be influenced by the sequence of data points.

Gaussian Mixture

Clustering by Mixture Models
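As a rough illustration of the K-Means steps listed earlier (choose K centres, assign each point to its closest centroid, update each centroid to its cluster mean, repeat until assignments stop changing), here is a minimal pure-Python sketch. The function name `kmeans`, the `max_iter` and `seed` parameters, the use of Euclidean distance, and the choice to draw the initial centres from the data points themselves are assumptions made for the example; the slides do not specify them.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Choose K cluster centres randomly (assumption: pick K distinct data points).
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign each data point to its closest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once an iteration produces no new assignment.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Use the mean of each cluster to update its centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because the result can depend on the initial centroids, one common practice is to call the function with several different `seed` values and compare the outcomes.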
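The Sequential Leader procedure described earlier (one pass over the data, a distance threshold, new clusters created on demand) can be sketched in a few lines. This is only one possible reading of the slide: the name `sequential_leader`, the Euclidean distance, and re-computing a centre as the mean of the points assigned so far are assumptions for illustration.

```python
import math


def sequential_leader(points, threshold):
    """Single-pass leader clustering; returns (centres, labels). Illustrative sketch."""
    centres = []   # current centre of each cluster
    members = []   # points assigned to each cluster so far
    labels = []    # cluster index assigned to each input point
    for p in points:
        nearest = None
        if centres:
            # Distance between the new point and every cluster's centre.
            dists = [math.dist(p, c) for c in centres]
            nearest = min(range(len(centres)), key=dists.__getitem__)
        if nearest is not None and dists[nearest] < threshold:
            # Close enough: join that cluster and re-compute its centre (mean).
            members[nearest].append(p)
            n = len(members[nearest])
            centres[nearest] = tuple(sum(x) / n for x in zip(*members[nearest]))
            labels.append(nearest)
        else:
            # Otherwise the new point becomes the centre of a new cluster.
            centres.append(tuple(p))
            members.append([p])
            labels.append(len(centres) - 1)
    return centres, labels


data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (0.5, 0.0)]
centres, labels = sequential_leader(data, threshold=3.0)
print(centres, labels)
```

Feeding the same points in a different order can yield different clusters, which is the order sensitivity noted on the slide.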
K-Means Revisited
- model parameters
- latent parameters

Expectation Maximization
EM: Gaussian Mixture

Density Based Methods
- Generate clusters of arbitrary shapes.
- Robust against noise.
- No K value required in advance.
- Somewhat similar to human vision.

DBSCAN
- Density-Based Spatial Clustering of Applications with Noise
- Density: number of points within a specified radius
- Core Point: points with high density
- Border Point: points with low density but in the neighbourhood of a core point
- Noise Point: neither a core point nor a border point

DBSCAN
- p, q: directly density reachable
- p, q: density reachable
- o, q, p: density connected

DBSCAN
- A cluster is defined as the maximal set of density connected points.
- Start from a randomly selected unseen point P.
- If P is a core point, build a cluster by gradually adding all points that are density reachable to the current point set.
- Noise points are discarded (unlabelled).

Hierarchical Clustering
- Produce a set of nested tree-like clusters.
- Can be visualized as a dendrogram.
- Clustering is obtained by cutting at desired level.
- No need to specify K in advance.
- May correspond to meaningful taxonomies.

Agglomerative Methods
- Bottom-up Method
  - Assign each data point to a cluster.
  - Calculate the proximity matrix.
  - Merge the pair of closest clusters.
  - Repeat until only a single cluster remains.
- How to calculate the distance between clusters?
  - Single Link: Minimum distance between points
  - Complete Link: Maximum distance between points
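Returning to the DBSCAN procedure described earlier, the core/border/noise definitions and the cluster-growing step can be sketched as follows. The parameter names `eps` (the radius) and `min_pts` (the density threshold that makes a point "core") are assumptions, as are counting a point as its own neighbour and scanning points in input order rather than picking the starting point at random.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; None marks noise. Illustrative sketch."""
    # Density of a point: number of points within radius eps (itself included).
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    # Core points are those whose density reaches min_pts.
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_pts}
    labels = [None] * len(points)
    cluster = 0
    for start in range(len(points)):
        # Only an unseen core point can start a new cluster.
        if labels[start] is not None or start not in core:
            continue
        labels[start] = cluster
        frontier = [start]
        while frontier:
            i = frontier.pop()
            if i not in core:
                continue  # border points join a cluster but do not extend it
            for j in neighbours[i]:
                if labels[j] is None:
                    labels[j] = cluster
                    frontier.append(j)
        cluster += 1
    return labels  # points still labelled None are noise


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
print(dbscan(pts, eps=1.1, min_pts=3))
```

In this toy data set the four corner points are core points, `(2, 1)` is a border point that joins their cluster, and `(10, 10)` is left unlabelled as noise.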
Example

        BA     FI     MI     NA     RM     TO
BA       0    662    877    255    412    996
FI     662      0    295    468    268    400
MI     877    295      0    754    564    138
NA     255    468    754      0    219    869
RM     412    268    564    219      0    669
TO     996    400    138    869    669      0

Single Link

Example

         BA     FI  MI/TO     NA     RM
BA        0    662    877    255    412
FI      662      0    295    468    268
MI/TO   877    295      0    754    564
NA      255    468    754      0    219
RM      412    268    564    219
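The step from the first matrix to the second can be checked with a small sketch of one single-link merge. The helper name `merge_closest` and the dictionary-of-pairs representation are choices made for this illustration only; the distances are copied from the first table above.

```python
labels = ["BA", "FI", "MI", "NA", "RM", "TO"]
rows = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]
# Proximity matrix stored as a lookup keyed by unordered pairs of cluster names.
dist = {
    frozenset((a, b)): rows[i][j]
    for i, a in enumerate(labels)
    for j, b in enumerate(labels)
    if i < j
}


def merge_closest(clusters, dist):
    """Merge the closest pair of clusters and update distances by single link."""
    pair = min(dist, key=dist.get)              # pair of closest clusters
    a, b = sorted(pair, key=clusters.index)     # keep the original ordering
    merged = f"{a}/{b}"
    new_dist = {}
    for key, d in dist.items():
        if key == pair:
            continue
        if a in key or b in key:
            (other,) = key - pair
            # Single link: minimum distance from `other` to either merged member.
            k = frozenset((merged, other))
            new_dist[k] = min(d, new_dist.get(k, d))
        else:
            new_dist[key] = d
    new_clusters = [merged if c == a else c for c in clusters if c != b]
    return new_clusters, new_dist, dist[pair]


clusters, dist1, height = merge_closest(labels, dist)
print(clusters, height)
for other in ("BA", "FI", "NA", "RM"):
    print(other, dist1[frozenset(("MI/TO", other))])
```

The first merge joins MI and TO at distance 138, and the updated MI/TO distances (877, 295, 754, 564) agree with the reduced matrix shown above. Calling `merge_closest` repeatedly until one cluster remains gives the full agglomerative sequence; swapping `min` for `max` in the update would give the complete-link rule instead.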