References

[1] Kwak H, Lee C, Park H, et al. What is Twitter, a social network or a news media?[C]//Proceedings of the 19th International Conference on World Wide Web. ACM, 2010: 591-600.
[2] Pang B, Lee L. Opinion mining and sentiment analysis. Now Publishers Inc, 2008.
[3] Pang B, Lee L, Vaithyanathan S. Thumbs up?: sentiment classification using machine learning techniques[C]//Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA, 2002, 10: 79-86.
[4] Bautin M, Vijayarenu L, Skiena S. International sentiment analysis for news and blogs[C]//Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2008.
[5] Turney P D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2002: 417-424.
[6] Turney P. Measuring praise and criticism: Inference of semantic orientation from association[J]. ACM Transactions on Information Systems, 2003, 21(4): 315-346.
[7] Ghose A, Ipeirotis P G, Sundararajan A. Opinion mining using econometrics: A case study on reputation systems[C]//Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL). Morristown, NJ, USA: Association for Computational Linguistics, 2007: 416-423.
[8] Yao Tianf[…] for topics in Chinese sentence[J]. Journal of Chinese Information Processing, 2007, 21(5): 73-79.
[9] Yu H, Hatzivassiloglou V. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences[C]//Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2003: 129-136.
[10] Riloff E, Wiebe J, Wilson T. Learning subjective nouns using extraction pattern bootstrap[C]//Proceedings of the 7th Conference on Natural Language Learning, 2003: 25-32.
[11] Sindhwani V, Melville P. Document-word co-regularization for semi-supervised sentiment analysis[C]//Eighth IEEE International Conference on Data Mining, 2008.
[12] Pang B, Lee L. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts[C]//Proceedings of the ACL, 2004: 271-278.
[13] […] design: a semantic similarity matching approach[C]//Planning to Learn Workshop (PlanLearn'10) at ECAI, 2010: 27-34.
[14] Guo Z, Li Z, Tu H. Sina Microblog: An information driven online social network[C]//Cyberworlds (CW), 2011 International Conference on. IEEE, 2011: 160-167.
[15] Yu L, Asur S, Huberman B A. What trends in Chinese social media[J]. arXiv preprint arXiv:1107.3522, 2011.
[16] […][C]//Wavelet Active Media Technology and Information Processing (ICWAMTIP), 2012 International Conference on. IEEE, 2012: 385-389.
[17] Gao Q, Abel F, Houben G J, et al. A comparative study of users' microblogging behavior on Sina Weibo and Twitter[J]. User Modeling, Adaptation, and Personalization, 2012: 88-101.
[18] […] characteristics of microblog users: Take "Sina Weibo" for example[J]. Library and Information Service, 2010, 54(14): 66-70. (in Chinese)
[19] Zi-qiong Z, Yi-jun Li, Qiang Ye, et al. Sentiment classification for Chinese product reviews using an unsupervised internet-based method[C]//Management Science and Engineering, 2008. ICMSE 2008. 15th Annual Conference Proceedings, International Conference on. IEEE, 2008: 3-9.
[20] Potena D, Diamantini C. Mining opinions on the basis of their affectivity[C]//Collaborative Technologies and Systems (CTS), 2010 International Symposium on. IEEE, 2010: 245-254.
[21] Dasgupta S, Ng V. Mine the easy, classify the hard: A semi-supervised approach to automatic sentiment classification[C]//Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the […]
Keerthi S S, Shevade S K, Bhattacharyya C, et al. Improvements to Platt's SMO algorithm for SVM classifier design[J]. Neural Computation, 2001, 13(3): 637-649.
Platt J. Sequential minimal optimization: A fast algorithm for training support vector machines[J], 1998.
Esuli A, Sebastiani F. SentiWordNet: A publicly available lexical resource for opinion mining[C]//Proceedings of LREC, 2006, 6: 417-422.
Platt J C. Fast training of support vector machines using sequential minimal optimization[J]. 1999.
Barbosa L, Feng J. Robust sentiment detection on Twitter from biased and noisy data[C]//Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics, 2010: 36-44.
Kaplan A, Haenlein M. Users of the world, unite! The challenges and opportunities of social media[J]. Business Horizons, 2010, 53(1): 59-68.
Cook T, Hopkins L. Social media or "How we stopped worrying and learnt to love communication": Your organization and web 2.0 (3rd ed), e-book. Retrieved March 28, 2008, from […] build your business. Hoboken, NJ: John Wiley & Sons, 2007.

2013 International Conference on Management Science & Engineering (20th), July 17-19, 2013, Harbin

Sentiment Analysis of Sina Weibo Based on Semantic Sentiment Space

HUANG […]

Abstract: With the rapid development of Web 2.0, more and more people publish information and personal opinions on the Internet. Micro-blog applications meet this need by providing a public platform where people post and interact in real time. As the number of micro-blog updates grows rapidly, a large volume of information and emotionally complex data is released on these platforms, and research on micro-blogs has attracted increasing attention, especially the persistently popular topic of sentiment analysis of short messages. Research on Chinese micro-blogs still needs much further work. Focusing on sentiment analysis of Sina Weibo, this paper puts forward three methods of micro-blog orientation classification and compares the accuracy and performance of each.

Keywords: Sina Weibo, sentiment analysis, machine learning, feature extraction

With the proliferation of Web 2.0 applications, a social media revolution has arrived. Besides reading news online, people also need to share their thoughts, present themselves, and express their voices in online social life [1-3]. In response to these common demands, various forums and social media websites have seized the opportunity and risen rapidly in China. According to the latest statistics from the China Internet data platform ([…]), the total frequency and duration of visits to micro-blogs far exceed those of SNS websites. There is therefore no denying that the micro-blog is a unique platform that combines information publishing and social networking.

As a new information resource on the Internet, the "microblog" is a space where everyone can share information. Generally, these posts are full of personal emotion and express their authors' positive and negative opinions. Such information may seem complex and useless, but it actually holds considerable potential commercial value, for example in blog studies [4], in forecasting box-office sales and updating products with […] transaction data [7], and in distinguishing attribute and sentiment structure in the Chinese car industry […]

[…] contributes to text […], question and answer systems […].

Sentiment analysis, sometimes called opinion mining, is one of many areas of computational study that deal with opinion-oriented natural language processing. We perform sentiment analysis on micro-blogs, where a single message typically consists of one or two sentences and fewer than 140 characters. Given this observation, the granularity we study is the sentence and word [9] or phrase [10] level. Another possible granularity is the document level [11-12]. The level of detail typically extends to determining the polarity of a message, which is what this article investigates as well. A more detailed approach could also determine the emotion expressed in addition to the polarity [13].

However, past studies have examined only Western micro-blogs such as Twitter, and little research focuses on sentiment analysis of Chinese micro-blogs. To our knowledge, micro-blog research began with basic introduction and prediction [14-15], and in recent years has turned to analysis [16] and users' micro-blog behaviors [17-…].

In this paper, we aim to explore and compare the performance of sentiment classification for Sina Weibo (Weibo). In particular, we are interested in the sentiment of Weibo posts that users write about people's status. By analyzing the sentiment of Weibo posts, this paper makes a comprehensive comparison of the Naïve Bayes, LibSVM, and SMO models.

This article takes ordinary Weibo messages along the timeline as its object of study and proceeds in the following respects:

(1) By recognizing the characteristics of Weibo messages, this paper improves the feature selection approach, finds the feature set suited to micro-blogs, and builds a micro-blog sentiment space model.
(2) On the basis of machine learning for text classification, this paper adds positive and negative sentiment dictionary material and updates the text classification model to obtain a better solution for micro-blog sentiment analysis.
(3) It develops a sentiment analysis prototype for Chinese micro-blogs that obtains data through the Sina API and examines the accuracy and viability of the test sample under different sentiment classifiers.

English words sentiment analysis

As in the introduction, the granularity studied here is the sentence, word [9], or phrase [10] level rather than the document level [11-12], and the task is to determine the polarity of a message; a more detailed approach could also determine the emotion expressed [20].

Although sentiment analysis of Chinese text is still immature, sentiment analysis of English words is a well-developed technique. Analysis without a central topic means using just one document or sentence to judge its sentiment polarity. There are three classes of methods here: dictionary-based methods, supervised machine learning methods, and unsupervised machine learning methods.

Unsupervised machine learning methods start from designated basic sentiment words, score the sentiment phrases extracted from the text, and compare those scores with the scores of ordinary sentiment words to determine the sentiment. Turney's 2002 paper [5] contributed to this line of research. It used reviews of mobile phones, banks, movies, and travel destinations from […] as experimental data. The experiment is a three-step process: extract the sentiment phrases, estimate the orientation of the extracted two-word phrases, and compute the average semantic orientation of each review.

Supervised machine learning methods deserve a more detailed discussion. They mainly comprise Naïve Bayes, SVM, and the optimized SMO [12,21-23].

(1) The Naïve Bayes method is often used in text classification because of its speed and simplicity. It assumes that words (or k-grams) are generated independently of word position. For a given set of classes, it estimates the probability of a class c given a document d with terms t_k, where n_d is the number of terms in d:

$$P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$

The classifier then returns the class with the highest probability given the document. In practice, the log probability is estimated, given by:

$$c_{map} = \arg\max_{c} \Big[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k \mid c) \Big]$$

The prior class probability is given by the fraction of appearances of that class in the training set.

(2) Like Naïve Bayes, SVM approaches often show very promising results in traditional text categorization [24], and more specifically in subjectivity detection [3,25]; this approach is hence a strong competitor. In our experiments we use the same parameter settings as [3,25], who used the default settings. We use the same feature spaces as for Naïve Bayes, with tokens, tags, a combination of both, or patterns as features.

(3) Sequential Minimal Optimization, or SMO, is an optimized algorithm for training SVMs [26]. SMO is conceptually simple, easy to implement, often faster, and has better scaling properties than a standard "chunking" algorithm that uses projected conjugate gradient. SMO chooses to solve the smallest possible optimization problem at every step rather than previous […]. In the next chapter, we apply these models in depth to the […].

English micro-blog sentiment analysis

Many studies address tweets posted on Twitter. This research falls into two groups: micro-blog sentiment analysis without a topic, and English micro-blog sentiment analysis with a specific topic. For micro-blog sentiment analysis without a topic, scholars have used the hashtags and smileys of tweets as labels to train a supervised KNN classifier […].

Another article in this area uses the sentiment analysis applications of three websites, Twendz, Twitter Sentiment, and TweetFeel, to obtain initial sentiment labels, then preprocesses the tweets according to established rules, and finally uses the preprocessed, sentiment-labelled tweets as training data [27]. The first step classifies tweets as objective or subjective with a classifier trained on extracted features. The authors extract two classes of features: meta-information about the words, and features of tweet syntax. Ranked by extent of influence, the result is that positive sentiment polarity matters more than links, followed by strong subjectivity, upper case, and verbs. The second step classifies sentiment polarity, using different words within the same features. In this step, the authors use formulas 2-2 and 2-3 to correct the polarity of sentiment words:

$$pol_{pos}(w) = \frac{count(w, pos)}{count(w)} \qquad (2\text{-}2)$$

$$pol_{neg}(w) = 1 - pol_{pos}(w) \qquad (2\text{-}3)$$

They again use the same features as in the first step to train the classifier. In their experiments, they showed that "since our features are able to capture a more abstract representation of tweets, our solution is more effective than previous ones and also more robust regarding biased and noisy data, which is the kind of data provided by these sources." In order of influence, the features rank as follows: negative sentiment polarity > positive sentiment polarity > verbs > positive emoticons > upper case.

So far, much work remains to be done in Chinese micro-blog sentiment analysis.
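The Naïve Bayes decision rule in item (1) above can be sketched in a few lines. This is a minimal illustration and not the implementation used in the paper: the function names, the toy training data, and the add-one smoothing are assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns log priors and log likelihoods."""
    class_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = len(docs)
    # Prior: fraction of training documents that belong to each class.
    log_prior = {c: math.log(n / total_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        denom = sum(term_counts[c].values()) + len(vocab)
        # Add-one smoothing (an assumption) keeps unseen terms from giving log(0).
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / denom) for t in vocab
        }
    return log_prior, log_likelihood


def classify(tokens, log_prior, log_likelihood):
    """Return argmax_c [log P(c) + sum_k log P(t_k | c)], skipping unknown terms."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
training = [
    (["good", "happy"], "positive"),
    (["great", "good"], "positive"),
    (["bad", "sad"], "negative"),
    (["awful", "bad"], "negative"),
]
prior, likelihood = train_naive_bayes(training)
print(classify(["good", "great"], prior, likelihood))
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow when a message contains many terms, which is why the log form is the one used in practice.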

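The polarity correction of formulas 2-2 and 2-3 amounts to a ratio of counts. The sketch below assumes that count(w) is the number of labelled messages containing w and that count(w, pos) is the number of those labelled positive; the function name and the sample data are hypothetical.

```python
from collections import Counter


def polarity_scores(labelled_messages):
    """labelled_messages: iterable of (tokens, label), label is 'pos' or 'neg'.

    Returns {word: (pol_pos, pol_neg)} following formulas 2-2 and 2-3.
    """
    total = Counter()
    positive = Counter()
    for tokens, label in labelled_messages:
        for w in set(tokens):  # count each word once per message (an assumption)
            total[w] += 1
            if label == "pos":
                positive[w] += 1
    scores = {}
    for w, n in total.items():
        pol_pos = positive[w] / n          # formula 2-2
        scores[w] = (pol_pos, 1.0 - pol_pos)  # formula 2-3
    return scores


# Hypothetical labelled messages, for illustration only.
sample = [
    (["love", "it"], "pos"),
    (["love", "this"], "pos"),
    (["hate", "it"], "neg"),
    (["love", "not"], "neg"),
]
scores = polarity_scores(sample)
print(scores["it"])
```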
Chinese micro-blog sentiment analysis differs considerably from both English micro-blog analysis and traditional blog analysis, so much research remains […]:

(1) Weibo posts are limited to 140 characters, yet they carry far more information than tweets. Do the results differ if a post is treated as one message rather than as several separate sentences?
(2) In Chinese Weibo messages, which features are more useful for messages without a theme?

Research method

Because of China's culture and characteristics, Sina Weibo includes several functions that Twitter does not. Some features should be taken into account: the word limit, convenient social feedback, rich media, micro topics, verified accounts, and self-censorship. We therefore discard some Weibo posts that contain links, to guarantee wordkit precision.

Data collection for this research is not as simple as it may seem at first. Assumptions and decisions have to be made. There are three separately collected datasets: test data, subjective training data, and objective (neutral) training data.

Sina Weibo lets users register a profile and offers functions such as repost, @usernames, hashtags (#), private instant chat, URL shortening, and direct insertion of graphics. Users can post their own updates, follow their favorite Weibo accounts, create events with hashtags, repost messages they care about, and interact with others through comments.

The Weibo data in this paper was collected from 17 September to 3 November 2012 through the Application Programming Interfaces (APIs) provided by Weibo. I captured all Weibo posts on the public timeline during this period, together with the profile of each poster, e.g. the number of followers and followings, gender, and province. Each IP is limited to 150 requests, and 3600/150 = 24, so to keep some requests in reserve for unexpected situations I sent one request every 25 seconds to fetch the newest public timeline list. Each request returns the 20 most recently posted Weibo. Because of network and Sina server problems, some responses were null. In total, 634,359 Weibo posts with their detail information and user information were collected.

As mentioned above, even though a very large number of Weibo posts were collected, they can be used only after each one is manually labelled as positive, neutral, or negative. This is a demanding task, and labelling all the posts is impossible. After preprocessing, this paper uses 2,071 randomly selected and manually classified Sina Weibo messages: 603 negative, 287 neutral, 624 positive, and 557 with no significant […]. Tab. 1 shows the number of positive, neutral, and negative Weibo collected. All polarity labels were assigned manually.

In essence, sentiment analysis of Sina Weibo is a classification problem. Inspired by the research of Jiang et al. (2011), this article designs the algorithm prototype shown in Fig. 1. Fig. 1 gives the general line of thought. First, the training sample is preprocessed; this part is mainly manual work to label the sentiment polarity, and it also includes some data reduction. Then the topic-independent features are extracted and the SVM classifier is trained to classify the sentiment polarity of the test sample. The output is the sentiment label.

[Fig. 1: Algorithm prototype design flow (training and testing branches)]

As the flow chart (Fig. 1) shows, the core of the algorithm is the SVM training method; how the features are extracted is equally important. After preprocessing, the texts have already been segmented and the words tagged with their parts of speech. The SVM-based feature calculation prototype computes the feature vector of each text through feature extraction, and the output of this part is the input vector of the classifier, as in Fig. 2.
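The request pacing described in the data collection paragraph above can be checked with a short calculation. The sketch assumes the 150-request limit applies per hour (which the 3600/150 arithmetic implies); `fetch` stands in for a hypothetical API call and is not a real Weibo SDK function.

```python
import math
import time


def polling_interval(limit_per_hour, safety_margin_s=1):
    """Smallest whole-second interval within the hourly limit, plus a margin."""
    return math.ceil(3600 / limit_per_hour) + safety_margin_s


def poll(fetch, interval_s, max_requests, sleep=time.sleep):
    """Call fetch() max_requests times, pausing interval_s between calls."""
    collected = []
    for i in range(max_requests):
        batch = fetch()
        if batch:  # network or server problems may yield empty responses
            collected.extend(batch)
        if i < max_requests - 1:
            sleep(interval_s)
    return collected


interval = polling_interval(150)          # 3600 / 150 = 24, plus 1 s of slack
requests_per_hour = 3600 // interval      # requests actually sent each hour
spare_requests = 150 - requests_per_hour  # budget left for unexpected situations
print(interval, requests_per_hour, spare_requests)
```

With a 25-second interval the collector sends 144 requests per hour, which leaves 6 of the 150 in reserve.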

[Fig. 2: Feature extraction flow]

In Fig. 2, Index denotes a feature and Value denotes its weight. That is, with reference to the feature set, the SVM model converts the set of Weibo texts into a set of text vectors and calculates the weight of each Weibo. SVM classification is the key method of the classification prototype; in this paper I use an open-source SVM classifier. The SVM feature format is "label index:value", as in Tab. 2.

[Tab. 2: SVM feature format]

After the training process, each Weibo has a feature vector representation, like the sample shown in Fig. 3.

Weibo: 土狗老师你好我又[…]
310:0.359810 351:0.141443 476:0.359810 477:0.282574

Fig. 3 SVM input data

Experiments and results

The purpose of the experiment is to test and verify the capability of SVM classification in dealing with […]. The indexes commonly used to evaluate classification performance are accuracy, precision, recall, and the F1 measure. Based on Tab. 3, accuracy, precision, recall, and the F1 measure are computed as in formulas (5)-(8).
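The "label index:value" layout described above is the sparse text format read by LibSVM-style tools. The helper below is an illustrative sketch: the function names are hypothetical, and the label value 1 is an assumption, since the paper does not state how the three classes are encoded.

```python
def to_svm_line(label, features):
    """features: {index: weight}. Writes ascending indices and omits zero weights."""
    pairs = " ".join(
        f"{index}:{weight:.6f}"
        for index, weight in sorted(features.items())
        if weight != 0
    )
    return f"{label} {pairs}"


def parse_svm_line(line):
    """Inverse of to_svm_line: returns (label, {index: weight})."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, weight = pair.split(":")
        features[int(index)] = float(weight)
    return int(label), features


# Feature weights taken from Fig. 3; the label is hypothetical.
line = to_svm_line(1, {477: 0.282574, 351: 0.141443, 476: 0.35981})
print(line)
```

Only non-zero features are written, so a short message with a handful of active features stays compact even when the feature set has hundreds of entries.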

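The four measures named above follow directly from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) for one class. The sketch below uses made-up counts purely for illustration; they are not results from the paper.

```python
def evaluate(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for one class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 80 of 100 predictions are correct.
metrics = evaluate(tp=40, fp=10, fn=10, tn=40)
error = 1 - metrics["accuracy"]
print(metrics, error)
```

For a three-class problem such as negative, neutral, and positive, the counts are taken per class by treating that class as "positive" and the other two as "negative".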
$$accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5)$$

$$precision = \frac{TP}{TP + FP} \qquad (6)$$

$$recall = \frac{TP}{TP + FN} \qquad (7)$$

$$F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall} \qquad (8)$$

[Tab. 3: predicted (1, 0) versus actual (1, 0) classes]

Accuracy is the measure by which the results of all the above algorithms are compared. Error is the complementary way of expressing it: an algorithm with 80% accuracy has 20% error. Precision and recall, however, tell us about particular aspects of a classifier. Precision tells us how exact the classifier is with respect to each class, and recall tells us how complete it is with respect to each class. Recall makes it possible to identify the class the classifier has difficulty predicting, and to use this information to tip the classifier in favor of that class.

Below are the results of several classifiers on the training data and test data. The 1,514 useful messages are cut into four folds; in other words, three quarters of them are training data and the rest are test data. Accuracy is measured on the test data, and the reported classifier is the one that achieved the highest accuracy. Based on the processed feature values of the whole text set, we next analyze the results of each […].

First, we test the accuracy of the three methods on the same data set. The result is shown in Fig. 4.

[Fig. 4: Experiment […]]

On the whole, Naïve Bayes is good at polar words but unable to discern neutral texts, and the same holds for LibSVM. The performance of SMO is relatively stable; in other words, it classifies all three classes well. In average accuracy, SMO is better than both LibSVM and Naïve Bayes, as Tab. 4 also shows. Tab. 4 gives the results of the three classifiers; almost every parameter achieves a higher score than the baseline, which means that the classification models are effective for Weibo.

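The four-fold division of the 1,514 labelled messages can be sketched as follows. Whether the paper rotates the held-out fold (cross-validation) or uses a single three-quarters/one-quarter split is not stated; the sketch assumes rotation, and the function name and seed are hypothetical.

```python
import random


def four_fold_splits(items, folds=4, seed=0):
    """Shuffle once, then yield (train, test) pairs with one fold held out each time."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::folds] for i in range(folds)]
    for held_out in range(folds):
        test = parts[held_out]
        train = [x for i, part in enumerate(parts) if i != held_out for x in part]
        yield train, test


# Message ids 0..1513 stand in for the 1,514 labelled Weibo posts.
splits = list(four_fold_splits(range(1514)))
print([(len(train), len(test)) for train, test in splits])
```

Taking only the first pair from `splits` gives the single train/test split, if that is what the experiment intends.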
Tab. 4 compares the effectiveness of the three methods, Naïve Bayes, LibSVM, and SMO, in further detail. In our sample, the Naïve Bayes results indicate that this method classifies polar texts effectively, but its performance on neutral texts is worse than that of LibSVM and SMO. Although the accuracy of LibSVM and SMO is almost the same, SMO is the better of the two: in the experimental results it is fast, fairly accurate, and much more stable across polar and non-polar texts.

Conclusions and future work

This paper set out to solve a practical problem: sentiment analysis of Sina Weibo posts sorted by timeline. In contrast to sentiment classification of traditional Chinese reviews, this study explores the feasibility of classifying short messages, namely Weibo posts. The article has shown that texts on the Chinese Weibo platform can be collected automatically and analyzed successfully for their sentiment. The SMO classifier gave the highest accuracy on the topic-free Weibo sample. The highest accuracy achieved for a three-class (negative, positive, neutral) classifier in our experiment is 90.03%. Although this score comes from SMO, the results clearly show that Naïve Bayes and LibSVM do contribute on the polar classes. Based on our experiment, SMO can be applied in practical applications for sentiment analysis of Weibo in general.

This thesis confirms some previous findings and makes three main novel contributions. The confirmations are that, for Chinese Weibo texts, machine learning techniques outperform keyword-based techniques, and that the SMO classifier gives better results than the other models. The contributions are: the collection and preprocessing of Sina Weibo data; an empirical study of the role of Chinese-word context in sentiment analysis of topic-free Weibo posts sorted by timeline; and a comparison of sentiment classification models.

All training data (negative, positive, neutral) and test data were collected from Sina Weibo through different APIs. The data fetched from the Weibo website are all XML documents containing large amounts of Weibo and user information; I wrote a piece of Perl code to extract the experimental data and generate a large number of text files automatically.

Building on the theoretical framework, the main part of this thesis is an empirical experiment that uses Sina Weibo Chinese-word data to evaluate the performance of several sentiment analysis methods. This empirical study indicates that sentiment analysis of Weibo in general can be done independently, without regard to context. This is the main theoretical contribution of the thesis, since there was no […] specific study of Weibo sentiment before.

Another important contribution lies in evaluating the performance of sentiment analysis models on the Weibo classification problem. A careful review of the literature on sentiment analysis shows that no single feature vector is best suited to the task. Some sentiment analysis studies have achieved good results with tweets posted on Twitter. These observations show that Naïve Bayes, SVM, and SMO can all be used for sentiment analysis. However, there is little research on Weibo […] with these models, and even less that compares their classification results on Weibo […]. The comparison of model performance is therefore another theoretical contribution of this thesis.

[…] classification should be considered. We will explore other approaches to word segmentation and phrase extraction to improve accuracy, and look for the best match for analyzing the sentiment of short Weibo texts. Furthermore, a comparison between Chinese reviews and sentiment analysis can be […].
