




Discovering Evolutionary Theme Patterns from Text: An Exploration of Temporal Text Mining
Qiaozhu Mei, ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign, U.S.A.

Motivation
- Most text collections bear time stamps: news articles, scientific literature, emails, etc.
- Many useful temporal patterns exist:
  - emerging topics/themes
  - decaying topics/themes
  - topic evolution threads
  - topic/theme life cycles
- How do we discover and exploit such patterns?

Theme Evolution Graph (Asia Tsunami)
[Figure: theme spans laid out along a time axis (Doc1, Doc3, …) and linked by evolutionary transitions into theme evolution threads. Themes: Immediate Reports; Statistics of Death and Loss; Personal Experience of Survivors; Statistics of Further Impact; Aid from Local Areas; Aid from the World; Donations from Countries; Specific Events of Aid; Lessons from Tsunami; Research Inspired.]
- Useful for summarizing the news.

Theme Life Cycle (SIGIR Proceedings)
[Figure: theme strength vs. time (1980, 1990, 1998, 2003) for the themes TF-IDF Retrieval, IR Applications, Language Model, and Text Categorization.]
- Useful for revealing historical trends and hot topics.

Problem Definition
- Evolutionary Theme Pattern (ETP):
  - Theme Evolution Graph: a weighted directed graph in which each vertex is a theme span and each edge is an evolutionary transition.
  - Theme Life Cycle: the strength of a theme over the whole time line.
- Given a text collection with time stamps, the problem of discovering ETPs is to:
  - extract a theme evolution graph;
  - model the life cycles of the most salient themes.

Research Questions
- How to represent a theme?
- How to extract themes from a collection automatically?
- How to model the transitions of themes?
- How to segment the collection with themes?
- How to model and compute the strength of each theme in a given time period?

Our Approach
[Figure: a collection with time stamps (intervals t1, t2, t3, …) yields a Theme Evolution Graph over theme spans $\theta_{11}, \theta_{12}, \theta_{13}, \theta_{21}, \theta_{22}, \theta_{31}, \ldots, \theta_{3k}$, and Theme Life Cycles plotting strength s against time t.]

Our Approach (Cont.)
- Extracting the theme evolution graph:
  - partition the collection into time intervals;
  - extract themes from each time span (Task I);
  - model transitions between theme spans (Task II).
- Modeling theme life cycles:
  - extract the most salient themes from the whole collection (Task I);
  - segment the collection with themes (Task III);
  - compute the strength of each theme over time.

Task I: Theme Extraction
- There are k themes in the collection (or in a time span); each document is a sample of words generated by multiple themes.
- Infer the theme language models that best fit our data.
[Figure: Theme 1, Theme 2, …, Theme k and a background model B, each a word distribution, e.g. {warning 0.3, system 0.2, …}, {aid 0.1, donation 0.05, support 0.02, …}, {statistics 0.2, loss 0.1, dead 0.05, …}; background B: {is 0.05, the 0.04, a 0.03, …}.]
- "Generating" word w in doc d in the collection:
  $p(w \mid d) = \lambda_B\, p(w \mid \theta_B) + (1 - \lambda_B) \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j)$
- Parameters: $\lambda_B$ = noise level (manually set); the mixing weights $\pi_{d,j}$ and theme models $\theta_j$ are estimated with maximum likelihood.

Task II: Transition Modeling
- Theme spans in an earlier time interval could evolve into theme spans in a later time interval.
[Figure: theme spans A, B, and C across intervals t1 and t2 on the timeline T, with "?" marking candidate transitions. Example span distributions: {microarray 0.2, gene 0.1, protein 0.05}, {web 0.3, classification 0.1, topic 0.1}, {information 0.2, topic 0.1, classification 0.1, text 0.05}.]
- The similarity/distance between two theme spans is modeled with the KL divergence between their word distributions.

Task III: Theme Segmentation
- View the whole collection as a sequence of words ordered by time.
- Model the theme shifts in documents with a Hidden Markov Model.
[Figure: an HMM with states Theme 1, Theme 2, Theme 3, and Background emitting the word sequence w w w … of the collection.]

Our Approach: Revisit
[Figure: the "Our Approach" overview repeated: collection with time stamps (t1, t2, t3, …), Theme Evolution Graph, and Theme Life Cycles.]

Experiments
- Two data sets:
  - Asia Tsunami: 7,468 news articles spanning 50 days from 10 news sources.
  - KDD Abstracts: 496 abstracts from 6 years of KDD conference proceedings.
- On each data set, we extract a theme evolution graph and model the life cycles of globally salient themes.

Theme Evolution Graph: Tsunami
[Figure: theme spans on the timeline T, with dates 12/28/04, 01/05/05, 01/15/05. Spans shown:]
- aid 0.020, relief 0.016, U.S. 0.013, military 0.011, U.N. 0.011, …
- Bush 0.016, U.S. 0.015, $ 0.009, relief 0.008, million 0.008, …
- Indonesian 0.01, military 0.01, islands 0.008, foreign 0.008, aid 0.007, …
- system 0.0104, Bush 0.008, warning 0.007, conference 0.005, US 0.005, …
- system 0.008, China 0.007, warning 0.005, Chinese 0.005, …
- warning 0.012, system 0.012, Islands 0.009, Japan 0.005, quake 0.003, …

Theme Life Cycles: Tsunami (CNN, absolute strength)
[Figure: absolute strength over time of the themes Aid from the World, Research, Aid for Children, Statistics, and Personal Experiences.]
- Sample theme distributions: {$ 0.0173, million 0.0135, relief 0.0134, aid 0.0099, U.N. 0.0066, …}; {I 0.0322, wave 0.0061, beach 0.0051, saw 0.0046, sea 0.0046, …}

Theme Life Cycles: Tsunami (XINHUA News, absolute strength)
[Figure: absolute strength over time of the themes Aid from the World, Research, Aid from China, Statistics, and Scene and Experiences.]
- Sample theme distributions: {dollars 0.0226, million 0.0204, aid 0.0118, U.N. 0.0102, reconstruction 0.0062, …}; {China 0.0391, yuan 0.0180, Beijing 0.0089, $ 0.0058, donation 0.0052, …}

Theme Life Cycles: Tsunami (XINHUA News, normalized strength)
[Figure: normalized strength over time of the themes Aid from the World, Research, Aid from China, Statistics, and Scene and Experiences.]
- Sample theme distributions: {$ 0.0173, million 0.0135, relief 0.0134, aid 0.0099, U.N. 0.0066, …}; {China 0.0391, yuan 0.0180, Beijing 0.0089, $ 0.0058, donation 0.0052, …}

Theme Evolution Graph: KDD
[Figure: theme spans on the timeline T, 1999 to 2004. Spans shown:]
- SVM 0.007, criteria 0.007, classification 0.006, linear 0.005, …
- decision 0.006, tree 0.006, classifier 0.005, class 0.005, Bayes 0.005, …
- classification 0.015, text 0.013, unlabeled 0.012, document 0.008, labeled 0.008, learning 0.007, …
- information 0.012, web 0.010, social 0.008, retrieval 0.007, distance 0.005, networks 0.004, …
- web 0.009, classification 0.007, features 0.006, topic 0.005, …
- mixture 0.005, random 0.006, cluster 0.006, clustering 0.005, variables 0.005, …
- topic 0.010, mixture 0.008, LDA 0.006, semantic 0.005, …

Theme Life Cycles: KDD
[Figure: global theme life cycles of the KDD abstracts.]
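The Task I model mixes a fixed background model with k theme models using per-document weights. According to the slides, the parameters are estimated with maximum likelihood. Expectation maximization is one standard way to compute such an estimate, so the sketch below uses EM in pure Python. It is not presented as the authors' implementation, and it rests on several assumptions. The background model is taken to be plain collection word frequencies. The noise level `lambda_b` defaults to an arbitrary 0.5. The themes start from a seeded random initialization. The helper names `collection_background` and `extract_themes` are hypothetical.

```python
import math
import random
from collections import Counter


def collection_background(docs):
    """Hypothetical helper: assumed background p(w | theta_B) = collection word frequencies."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def extract_themes(docs, k, lambda_b=0.5, iters=100, seed=0):
    """Hypothetical helper: EM for
    p(w|d) = lambda_B p(w|theta_B) + (1 - lambda_B) * sum_j pi_{d,j} p(w|theta_j).

    Returns (themes, pis, log_likelihoods): themes[j][w] = p(w | theta_j),
    pis[d][j] = pi_{d,j}, log_likelihoods[i] = data log-likelihood before update i.
    """
    rng = random.Random(seed)
    background = collection_background(docs)
    vocab = sorted(background)
    doc_counts = [Counter(doc) for doc in docs]

    # Strictly positive random theme models; uniform mixing weights per document.
    themes = []
    for _ in range(k):
        raw = {w: rng.random() + 0.01 for w in vocab}
        z = sum(raw.values())
        themes.append({w: v / z for w, v in raw.items()})
    pis = [[1.0 / k] * k for _ in docs]

    log_likelihoods = []
    for _ in range(iters):
        theme_counts = [dict.fromkeys(vocab, 0.0) for _ in range(k)]
        pi_counts = [[0.0] * k for _ in docs]
        ll = 0.0
        for d, counts in enumerate(doc_counts):
            for w, c in counts.items():
                # E-step: posterior responsibility of each component for this word.
                mix = [pis[d][j] * themes[j][w] for j in range(k)]
                theme_part = sum(mix)
                p_w = lambda_b * background[w] + (1 - lambda_b) * theme_part
                ll += c * math.log(p_w)
                p_not_background = (1 - lambda_b) * theme_part / p_w
                for j in range(k):
                    expected = c * p_not_background * mix[j] / theme_part
                    theme_counts[j][w] += expected
                    pi_counts[d][j] += expected
        log_likelihoods.append(ll)
        # M-step: renormalise expected counts; lambda_B and theta_B stay fixed.
        themes = []
        for tc in theme_counts:
            z = sum(tc.values())
            themes.append({w: v / z for w, v in tc.items()})
        pis = []
        for pc in pi_counts:
            z = sum(pc)
            pis.append([v / z for v in pc])
    return themes, pis, log_likelihoods
```

You could call `extract_themes(docs, k=2)` on a list of tokenized documents and then inspect each theme's top words with `sorted(theme, key=theme.get, reverse=True)[:5]`. Because EM never decreases the likelihood, the returned log-likelihood trace gives a quick sanity check.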
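Task II scores pairs of theme spans with KL divergence. The following is a minimal sketch that rests on three assumptions. First, a transition from an earlier span to a later span is kept when D(later || earlier) falls below a hand-picked threshold. Second, the direction of the divergence and the threshold value are choices, not something the slides specify. Third, zero probabilities are avoided by adding a small constant `eps` and renormalizing. The function names `smooth`, `kl_divergence`, and `evolution_edges` are hypothetical.

```python
import math


def smooth(dist, vocab, eps=1e-6):
    """Hypothetical helper: add eps to every word in vocab and renormalise (no zeros)."""
    raw = {w: dist.get(w, 0.0) + eps for w in vocab}
    z = sum(raw.values())
    return {w: v / z for w, v in raw.items()}


def kl_divergence(p, q, eps=1e-6):
    """D(p || q) = sum_w p(w) * log(p(w) / q(w)) over the union of both vocabularies."""
    vocab = set(p) | set(q)
    p_s, q_s = smooth(p, vocab, eps), smooth(q, vocab, eps)
    return sum(pw * math.log(pw / q_s[w]) for w, pw in p_s.items())


def evolution_edges(spans_by_interval, threshold):
    """Hypothetical helper: candidate evolutionary transitions between theme spans.

    spans_by_interval: list ordered by time; element i maps span id -> word distribution.
    Every earlier/later pair is compared; an edge (earlier_id, later_id, divergence)
    is kept when D(later || earlier) < threshold (assumed direction and threshold).
    """
    edges = []
    for i, earlier_spans in enumerate(spans_by_interval):
        for later_spans in spans_by_interval[i + 1:]:
            for a_id, a in earlier_spans.items():
                for b_id, b in later_spans.items():
                    divergence = kl_divergence(b, a)
                    if divergence < threshold:
                        edges.append((a_id, b_id, divergence))
    return edges
```

As an illustration, take the three sample distributions from the Task II slide. They are truncated to their top words, and the labels `info`, `web`, and `bio` are ours. Place `info` and `bio` in the first interval and `web` in the second. Then `evolution_edges([{"info": info, "bio": bio}, {"web": web}], threshold=10.0)` keeps only the `info` → `web` edge, because those two spans share "topic" and "classification" while `bio` shares no words with `web`.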
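Task III labels each word of the time-ordered collection with a theme or the background by decoding a Hidden Markov Model. This sketch makes two assumptions. The theme language models serve as fixed emission distributions, and the transition probabilities are simply given. The slide does not say how transitions are set, and in practice they might be trained, for example with Baum-Welch. The sketch runs standard Viterbi decoding in log space and gives unseen words a small `floor` probability, which is also an assumption. The names `viterbi` and `sticky_transitions` are hypothetical.

```python
import math


def sticky_transitions(states, stay=0.9):
    """Hypothetical helper: stay in the same state with probability `stay`,
    otherwise move uniformly to another state (needs at least two states)."""
    move = (1 - stay) / (len(states) - 1)
    return {s: {t: stay if s == t else move for t in states} for s in states}


def viterbi(words, states, start, trans, emit, floor=1e-9):
    """Most likely state sequence for `words` (hypothetical helper).

    start[s], trans[s][t]: initial and transition probabilities (all > 0)
    emit[s][w]: p(w | state s); words missing from emit[s] get `floor`
    """
    def log_emit(s, w):
        return math.log(emit[s].get(w, floor))

    # score[s]: best log-probability of any path ending in state s at the current word.
    score = {s: math.log(start[s]) + log_emit(s, words[0]) for s in states}
    backpointers = []
    for w in words[1:]:
        prev_best, new_score = {}, {}
        for t in states:
            best_s = max(states, key=lambda s: score[s] + math.log(trans[s][t]))
            prev_best[t] = best_s
            new_score[t] = score[best_s] + math.log(trans[best_s][t]) + log_emit(t, w)
        backpointers.append(prev_best)
        score = new_score

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    return list(reversed(path))
```

As a toy example, consider three states: "aid", "warning", and "background". Each emits only its own small vocabulary, and transitions come from `sticky_transitions(states, 0.9)`. Under these settings, the words `the warning system quake the aid donation relief` decode to background, warning ×3, background, aid ×3. The sticky transitions discourage switching on every word, while the emission probabilities still force a switch when the vocabulary changes.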
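The life-cycle slides plot both an absolute and a normalized strength for each theme over time. Once every word carries a theme label (for instance, from HMM decoding as in Task III), these strengths can be tabulated per time interval. This sketch assumes two definitions. Absolute strength is the number of words labeled with the theme in an interval. Normalized strength is that count divided by all labeled words in the same interval. These definitions, the fixed-width intervals, and the function name `life_cycles` are assumptions, not taken from the slides.

```python
from collections import Counter, defaultdict
from datetime import date


def life_cycles(labeled_words, interval_days: int, origin: date):
    """Hypothetical helper: theme strength per fixed-width time interval.

    labeled_words: iterable of (date, theme_label) pairs, one per word occurrence.
    Returns (absolute, normalized), each mapping theme -> {interval index: strength}.
    """
    per_interval = defaultdict(Counter)
    for day, theme in labeled_words:
        # Interval 0 starts at `origin`; each interval spans `interval_days` days.
        index = (day - origin).days // interval_days
        per_interval[index][theme] += 1

    absolute, normalized = defaultdict(dict), defaultdict(dict)
    for index in sorted(per_interval):
        counts = per_interval[index]
        total = sum(counts.values())
        for theme, count in counts.items():
            absolute[theme][index] = count            # words labeled with this theme
            normalized[theme][index] = count / total  # share of all labeled words
    return dict(absolute), dict(normalized)
```

This sketch counts background-labeled words in the denominator. Excluding them before normalizing is an equally plausible choice and changes only the `total` line.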