MCM: Team 1235, Problem C, English mathematical modeling paper

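The 4th "Certification Cup" (认证杯) Mathematics China International Mathematical Modeling Contest: Letter of Commitment

We have carefully read the competition rules of the 4th "Certification Cup" Mathematics China International Mathematical Modeling Contest.

We fully understand that once the contest has begun, team members may not research or discuss the contest problems with anyone outside the team (including advisors) by any means (including telephone, email, online consultation, etc.).

We know that plagiarizing the work of others violates the competition rules. If we cite the work of others or other publicly available material (including material found online), it must be listed explicitly at the point of citation in the text and in the references, following the prescribed reference format.

We solemnly promise to abide strictly by the competition rules in order to ensure the fairness and impartiality of the contest. If we violate the competition rules, we will be dealt with seriously.

We permit the Mathematics China website () to publish the paper for study and exchange among users. The Mathematics China website does not need to obtain our consent in advance for non-commercial exchange of the paper.

Our team number is: 1235
The problem we chose is: Problem C
Team members (signatures): Member 1: Wang Dongquan (王东全); Member 2: Wu Zhuoqi (吴卓其); Member 3: Zhou Yang (周洋)
Team coach (signature): Yang Jianbo (杨剑波)

Team # 1235 Page 6 of 21

The 4th "Certification Cup" Mathematics China International Mathematical Modeling Contest: Numbering Page

Team number (to be filled in by each team in advance): 1235
Unified contest number (assigned by the organizing committee before submission to the judging panel):
Contest review number (assigned by the judging panel before review):

Using Data Mining Techniques for Detecting Terror-Related Activities on the Web

Abstract:

The number of terror attacks is increasing year by year. On November 13, 2015, the terrorist attack that took place in Paris caused hundreds of casualties. The hazards of cyber terrorism have become more and more serious. The USA has enacted a number of laws aimed at the prevention of cyber terrorism,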
such as the "USA PATRIOT Act". It is necessary to establish a model for preventing the spread of terrorist networks and for monitoring and identifying people with a tendency toward terrorism. The Internet behavior analysis and risk assessment model (IBARA) was established to assess the Internet behaviors of the people who are monitored. In this paper, based on IBARA, we not only study the relationship between people's Internet behavior and their possible terrorist tendency, but also analyze and discuss the relative quantitative risk index of individual terrorism tendency and the relevant
strategies to prevent terrorist attacks.

Firstly, Internet behavior was divided into two parts: Web text and images. A word frequency analysis algorithm based on the complex vector space was adopted to establish the personal tendency of terrorism risk index sub module (PTTRISM), which can predict people's tendency to terrorism. In PTTRISM, this paper analyzes the behavior of individual Web text using the keyword extraction technique and the frequency analysis technique. According to the analysis results, the value of the risk index of individual terrorism is given in this paper. Using the PTTRISM to analyze the data sample, we drew the conclusion that most people who have accessed terrorism-related information are not likely to become potential terrorists. The PTTRISM can calculate people's risk index for the tendency to terrorism by analyzing their Internet behavior.

Secondly, in fact,
the object of network monitoring is not one person but a large number of people, which makes the monitoring data too large and complex. In order to facilitate the rapid and efficient classification and analysis of big data, a big data clustering statistics sub module (MDCSSM) is established based on the technique of density-based clustering. At the same time, in order to shorten the computing time of the MDCSSM, this paper adopts the standard particle swarm optimization (PSO) with the weight-shrink factor. It realizes effective, fast and automatic clustering analysis of datasets. The sub model is validated using data, and the model can be used to analyze a large amount of data. Due to the scarcity of monitoring data, we use some frequently tested public datasets, "Iris", "Glass", "Wine" and "Aggregation", to replace the monitoring data and verify the clustering algorithm. The clustering results demonstrate that the clustering algorithm can categorize the monitoring datasets in an effective, fast and automatic manner.

Finally, based on IBARA, we propose the following suggestions to President Obama about fighting terrorism:

1. Put more resources into the network fight
against terrorism. You could build a User Online Monitoring System of Behavior and Psychology to monitor and assess the behavior of the public.
2. Establish an information security evaluation system to weaken and even prevent terrorist propaganda through the network.
3. Strengthen public anti-terrorism education and raise public awareness of anti-terrorism.

Due to time constraints, the model still has some defects which need to be improved. In the PTTRI sub module, factors of voice and image files are not considered. In the MDCS sub module, the selection of the adaptive function in clustering analysis could be further improved. With further improvement of the model, we will get more accurate results.

Key words: PSO, word frequency analysis algorithm, density-based clustering, terrorism, Internet behavior

Contents

I. Introduction
II. The Description of the Problem
2.1 Our Approach to the Whole Course of Data Mining for Terrorists on Websites
2.2 The Differences in Weights and Sizes of Available Data
III. IBARA
3.1 PTTRISM
3.1.1 Terms, Definitions and Symbols in PTTRISM
3.1.2 Assumptions in PTTRISM
3.1.3 The Model of Terrorism-Related Website Browsing and Vector Space Models of Lexical Meaning
3.1.4 The Model of Risk Index
3.1.5 Solutions and Results for PTTRISM
3.1.6 Strength and Weakness in PTTRISM
3.2 MDCSSM
3.2.1 Extra Symbols
3.2.2 Additional Assumptions
3.2.3 The Foundation of MDCSSM to Categorize Big Data
3.2.4 The Results of MDCSSM
3.2.5 Strength and Weakness
IV. Conclusions
4.1 Conclusions of the Problems
4.2 Methods Used in our Models
4.3 Applications of our Models
V. Proposal for Fighting Terrorism

I. Introduction

In order to indicate the origin of web-related terrorism problems, the following background is worth mentioning.

Terrorist cells are using the Internet infrastructure to exchange information and recruit new members and supporters (Lemos 2002; Kelley 2002). For example, high-speed Internet connections were used intensively by members of the infamous "Hamburg Cell" that was largely responsible for the preparation of the September 11 attacks against the United States (Corbin 2002). This is one reason for the major effort made by law enforcement agencies around the world in gathering information from the Web about terror-related activities. It is believed that the detection of terrorists on the Web might prevent further terrorist attacks (Kelley 2002). One way to detect terrorist activity on the Web is to eavesdrop on all traffic of Web sites associated with terrorist organizations in order to detect the accessing users based on their IP addresses. Unfortunately, it is difficult to monitor terrorist sites (such as "Azzam Publications" (Corbin 2002)) since they do not use fixed IP addresses and URLs. The geographical locations of Web servers hosting those sites also change frequently in order to prevent successful eavesdropping. To overcome this problem, law enforcement agencies are trying to detect terrorists by monitoring all ISPs' traffic (Ingram 2001), though the privacy issues raised still prevent relevant laws from being enforced.

Figure 1: The annual number of terrorist attacks from 1968 to 2009 (horizontal axis: year)

II. The Description of the Problem

2.1 Our Approach to the Whole Course of Data Mining for Terrorists on Websites

We consider how often a monitored Internet user visits websites that contain terrorist information and propaganda, and the lexical meaning of the contents of their emails, chats, post views and text files
being downloaded. As for other file formats, such as videos, images and audio, image description and voice recognition techniques are used as tools to detect terrorists. For categorizing the monitoring data, clustering techniques are adopted to partition the data in an effective, fast and automatic manner. Finally, we present some useful suggestions to President Obama for fighting terrorism.

2.2 The Differences in Weights and Sizes of Available Data

Due to differences between the collected datasets, it is necessary to preprocess the available data, such as text datasets, numerical datasets, image datasets and even voice datasets.

1) The Preprocess of Text Data: remove non-alphabetical characters from the text dataset and store the cleaned text in MATLAB cell structures.
2) The Preprocess of Image Data: remove non-imagery information from the image datasets and convert the RGB images into gray-value images. If the image datasets are polluted by noise, it is necessary to denoise the images before analyzing the relevant information.
3) The Preprocess of Voice: if the audio datasets are polluted by noise, audio denoising steps are needed before digging out the auditory information.
4) The Preprocess of Numerical Dataset: due to differences between data samples in units and magnitudes, the numerical dataset needs to be normalized and standardized.
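
The text, image and numerical preprocessing steps listed above can be sketched as follows. This is only an illustration written in Python rather than the MATLAB mentioned in the paper. The function names `clean_text`, `rgb_to_gray` and `zscore` are hypothetical. The paper does not state which gray-value conversion or which normalization it uses, so the ITU-R BT.601 luma weights and z-score standardization are assumptions made here.

```python
import re
import statistics


def clean_text(text):
    """Replace runs of non-alphabetical characters with a space,
    then return the remaining words in lowercase."""
    return re.sub(r"[^A-Za-z]+", " ", text).lower().split()


def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to a gray value.
    The BT.601 luma weights are an assumption, not taken from the paper."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def zscore(column):
    """Standardize one numerical feature to zero mean and unit variance.
    The choice of z-score scaling is an assumption, not taken from the paper."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant feature carries no scale information.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Made-up sample inputs for illustration only.
words = clean_text("Visit #3: www.example.org, 2015-11-13!")
gray = rgb_to_gray((255, 255, 255))
scaled = zscore([1.0, 2.0, 3.0])
```

After standardization, features measured in different units and magnitudes become comparable, which matters for distance-based methods such as the density-based clustering used later in the paper.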
