




Clustering

Outline
- Overview
- Partitioning Methods
  - K-Means
  - Sequential Leader
- Model Based Methods
- Density Based Methods
- Hierarchical Methods

What is cluster analysis?
- Finding groups of objects
  - Objects similar to each other are in the same group.
  - Objects are different from those in other groups.
- Unsupervised learning
  - No labels
  - Data driven

Clusters
[figure: inter-cluster and intra-cluster distances]

Applications of Clustering
- Marketing: finding groups of customers with similar behaviours.
- Biology: finding groups of animals or plants with similar features.
- Bioinformatics: clustering microarray data, genes and sequences.
- Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones.
- WWW: clustering weblog data to discover groups of similar access patterns.
- Social networks: discovering groups of individuals with close friendships internally.

Earthquakes
[figure]

Image Segmentation
[figure]

The Big Picture
[figure]

Requirements
- Scalability
- Ability to deal with different types of attributes
- Ability to discover clusters with arbitrary shape
- Minimum requirements for domain knowledge
- Ability to deal with noise and outliers
- Insensitivity to order of input records
- Incorporation of user-defined constraints
- Interpretability and usability

Practical Considerations
- Scaling matters!

Normalization or Not?
[figure]

Evaluation
[figure, labelled "VS."]

Silhouette
- A method of interpretation and validation of clusters of data.
- A succinct graphical representation of how well each data point lies within its cluster compared to other clusters.
- a(i): average dissimilarity of i with all other points in the same cluster
- b(i): the lowest average dissimilarity of i to other clusters
- Silhouette value: s(i) = (b(i) - a(i)) / max(a(i), b(i)), which lies between -1 and 1; values near 1 mean that i sits well inside its own cluster.

K-Means
[figures]

K-Means
- Determine the value of K.
- Choose K cluster centres randomly.
- Assign each data point to its closest centroid.
- Use the mean of each cluster to update each centroid.
- Repeat until no assignment changes.
- Return the K centroids.

Reference
J. MacQueen (1967): "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297.

Comments on K-Means
Pros
- Simple and works well for regular disjoint clusters.
- Converges relatively fast.
- Relatively efficient and scalable: O(t·k·n), where t is the number of iterations, k the number of centroids and n the number of data points.
Cons
- Need to specify the value of K in advance. This is difficult, and domain knowledge may help.
- May converge to local optima. In practice, try different initial centroids.
- May be sensitive to noisy data and outliers. Mean of data points…
- Not suitable for clusters of non-convex shapes.

The Influence of Initial Centroids
[figures]

Sequential Leader Clustering
- A very efficient clustering algorithm.
  - No iteration
  - A single pass of the data
- No need to specify K in advance.
- Choose a cluster threshold value.
- For every new data point:
  - Compute the distance between the new data point and every cluster's centre.
  - If the minimum distance is smaller than the chosen threshold, assign the new data point to the corresponding cluster and re-compute the cluster centre.
  - Otherwise, create a new cluster with the new data point as its centre.
- Clustering results may be influenced by the sequence of data points.

Gaussian Mixture
[figure]

Clustering by Mixture Models
[figure]
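The single-pass procedure listed under Sequential Leader Clustering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sequential_leader`, Euclidean distance, and the use of a running mean as the cluster centre are choices made here, not details given in the slides.

```python
import math

def sequential_leader(points, threshold):
    """Single pass over `points`; returns (centres, labels).

    Assumptions (not from the slides): Euclidean distance, and each
    cluster centre is the running mean of the points assigned to it.
    """
    centres = []  # current centre of each cluster
    counts = []   # number of points assigned to each cluster
    labels = []   # cluster index chosen for each input point
    for p in points:
        nearest = None
        if centres:
            dists = [math.dist(p, c) for c in centres]
            nearest = min(range(len(dists)), key=dists.__getitem__)
        if nearest is not None and dists[nearest] < threshold:
            # Close enough: join the cluster and update its mean incrementally.
            counts[nearest] += 1
            n = counts[nearest]
            centres[nearest] = tuple(
                c + (x - c) / n for c, x in zip(centres[nearest], p)
            )
            labels.append(nearest)
        else:
            # Too far from every centre: the point starts a new cluster.
            centres.append(tuple(p))
            counts.append(1)
            labels.append(len(centres) - 1)
    return centres, labels

data = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5)]
centres, labels = sequential_leader(data, threshold=3)
```

Because every point is compared only with the centres that exist when it arrives, feeding the same data in a different order can give different clusters, which is the order sensitivity noted in the last bullet.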
K-Means Revisited
- model parameters
- latent parameters

Expectation Maximization
[figure]

EM: Gaussian Mixture
[figure]

Density Based Methods
- Generate clusters of arbitrary shapes.
- Robust against noise.
- No K value required in advance.
- Somewhat similar to human vision.

DBSCAN
Density-Based Spatial Clustering of Applications with Noise
- Density: number of points within a specified radius
- Core point: a point with high density
- Border point: a point with low density but in the neighbourhood of a core point
- Noise point: neither a core point nor a border point
[figure: core point, border point, noise point]

DBSCAN
[figure panels: p and q directly density reachable; p and q density reachable; p and q density connected through o]

DBSCAN
- A cluster is defined as the maximal set of density connected points.
- Start from a randomly selected unseen point P.
- If P is a core point, build a cluster by gradually adding all points that are density reachable from the current point set.
- Noise points are discarded (unlabelled).

Hierarchical Clustering
- Produce a set of nested tree-like clusters.
- Can be visualized as a dendrogram.
- Clustering is obtained by cutting at the desired level.
- No need to specify K in advance.
- May correspond to meaningful taxonomies.

Agglomerative Methods
Bottom-up method
- Assign each data point to its own cluster.
- Calculate the proximity matrix.
- Merge the pair of closest clusters.
- Repeat until only a single cluster remains.
How to calculate the distance between clusters?
- Single link: minimum distance between points
- Complete link: maximum distance between points
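The bottom-up loop above, with either linkage rule, can be sketched as follows. This is a minimal and deliberately naive illustration: the function name `agglomerative`, the `n_clusters` stopping parameter, and the toy one-dimensional data are assumptions made here, and the pairwise search is far slower than what a real library would do.

```python
from itertools import combinations

def agglomerative(dist, n_clusters=1, linkage=min):
    """Merge the closest pair of clusters until `n_clusters` remain.

    dist     : symmetric matrix of point-to-point distances (list of lists)
    linkage  : min for single link, max for complete link
    Returns (clusters, merges), where merges records each merge step.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []

    def cluster_distance(a, b):
        # Single link: smallest point-to-point distance; complete link: largest.
        return linkage(dist[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(a), sorted(b), cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, merges

# Toy data (hypothetical): five points on a line.
points = [0, 1, 5, 6, 20]
dist = [[abs(a - b) for b in points] for a in points]

single, single_merges = agglomerative(dist, n_clusters=2, linkage=min)
complete, complete_merges = agglomerative(dist, n_clusters=2, linkage=max)
```

Stopping at `n_clusters` plays the role of cutting the dendrogram at a chosen level. On this toy data both linkages end with the same two clusters, but the recorded merge distances differ, so the dendrogram heights would differ as well.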
Example

        BA    FI    MI    NA    RM    TO
BA       0   662   877   255   412   996
FI     662     0   295   468   268   400
MI     877   295     0   754   564   138
NA     255   468   754     0   219   869
RM     412   268   564   219     0   669
TO     996   400   138   869   669     0

Single Link

Example

        BA    FI  MI/TO   NA    RM
BA       0   662   877   255   412
FI     662     0   295   468   268
MI/TO  877   295     0   754   564
NA     255   468   754     0   219
RM     412   268   564   219     …
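The step from the first table to the second can be reproduced with a short sketch: find the closest pair (MI and TO, at distance 138), merge them, and give the merged cluster the minimum of the two old distances to every other city. The helper name `merge_closest` is an assumption made for this illustration; the distances are copied from the first table.

```python
labels = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]

def merge_closest(labels, D, linkage=min):
    """Merge the two closest clusters and return the reduced labels and matrix.

    linkage=min gives the single-link update; max would give complete link.
    """
    n = len(labels)
    # Closest pair (i, j) with i < j.
    i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda pair: D[pair[0]][pair[1]])
    keep = [k for k in range(n) if k != j]  # row and column j are absorbed into i
    new_labels = [labels[i] + "/" + labels[j] if k == i else labels[k]
                  for k in keep]

    def merged_distance(p, q):
        if p == q:
            return 0
        if p == i:
            return linkage(D[i][q], D[j][q])
        if q == i:
            return linkage(D[p][i], D[p][j])
        return D[p][q]

    new_D = [[merged_distance(p, q) for q in keep] for p in keep]
    return new_labels, new_D

labels1, D1 = merge_closest(labels, D)    # merges MI and TO
labels2, D2 = merge_closest(labels1, D1)  # next single-link merge
```

After the first call, `labels1` and the `MI/TO` row of `D1` match the second table. Calling the helper again on its own output carries the example forward one merge at a time.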