Text Mining, Institute of Computer Science and Technology

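Chapter 11: Intelligent Question Answering (QA) Technology
杨建, Institute of Computer Science and Technology

Query-Driven vs. Answer-Driven Information Access
• Example questions: "What does LASER stand for?", "When did Hitler attack the Soviet Union?"
• Using a search engine we find documents containing the question itself, no matter whether or not the answer is actually provided.
• Current information access is query driven.
• Question Answering proposes an answer-driven approach to information access.

Alternatives
Document retrieval:
• users submit queries corresponding to their information need
• the system returns a (voluminous) list of full-length documents
• it is the responsibility of the users to find their original information need within the returned documents
Question Answering:
• users ask fact-based, natural language questions, e.g. "What is the highest volcano in Europe?"
• the system returns a list of short answers, e.g. "Under Mount Etna, the highest volcano in Europe, ..."
• more appropriate for specific information needs

Why Question Answering?
From the Caledonian Star in the Mediterranean, September 23, 1990:
"On a beautiful early morning the Caledonian Star approaches Naxos, situated on the east coast of Sicily. As we anchored and put the Zodiacs into the sea we enjoyed the great scenery. Under Mount Etna, the highest volcano in Europe, perches the fabulous town of Taormina. This is the goal for our morning. After a short Zodiac ride we embarked our buses with local guides and went up into the hills to reach the town of Taormina. Naxos was the first Greek settlement at Sicily. Soon a harbor was established but the town was destroyed by ..."
• Question: What continent is Taormina in?
• Input and output are precise: questions (in place of keyword-based queries) and answers (in place of documents).

History: chatbots
• 1950: A. M. Turing proposed the Turing test.
• 1990: Hugh Loebner established the Loebner Prize; the winner of the annual Loebner Prize competition receives $2,000.
• To date, no program has passed the test.
• Early chatbots use pattern and keyword matching with substitution. A rule consists of a keyword, a sentence pattern and a substitution rule such as "what makes you think I ** ...". For the input "Yesterday you hurt me." the output is "What makes you think I hurt you?"
• ALICE (Loebner Prize) has more than 40,000 templates and uses pattern matching.

History: knowledge-based question answering and expert systems
• Unlike chatbots, these systems are good at knowledge ...
• 1968: Feigenbaum et al. built ... at Stanford University; further systems followed at MIT and CMU.
• In 1988 Feigenbaum carried out a survey: about 2,000 systems were in operation at the time, distributed across Europe, America and ...

FAQ-based question answering
• Compute the similarity between the user query and the questions in the FAQ knowledge base.
• Find the question in the FAQ knowledge base that is most similar to the user query.
• Techniques: (keyword) matching, methods based on the vector space model, ...

Dialogue system example (fragment; the developer's name is lost)
• Computer: "..."
• Computer: "It must have been thrown in by some uncivilized tourist."
• Computer: "How did the keeper ..."

Agent- and ontology-based question answering
• Knowledge is expressed with agents and ontologies.
• Agents emphasize encapsulation and inheritance and focus on the vertical relations between things ...
• The domain ontology is acquired manually; domain knowledge ...
• Led by Professor ..., the language information processing group of the Computer Science Department of ... Language and Culture University worked on the entries of the "Encyclopedia" (CD-ROM edition), extracting the information that is relatively easy to formalize.
• Performance is good: many user questions are answered accurately, and a certain degree of reasoning and calculation is even possible.

QA: Information Sources
• Structured data ...
• Semi-structured data (comment field in databases, XML)
• Free text
To search:
• The Web
• Fixed set of text collection (e.g. ...)
• A single text (reading comprehension ...)

QA: Questions
Classification according to the answer type:
• Factual questions (What is the larger city ...)
• Opinions (What is the author attitude ...)
• Summaries (What are the arguments for and ...)
Classification according to the question speech act:
• Yes/No questions (Is it true that ...)
• WH questions (Who was the ...)
• Indirect requests (I would like you to list ...)
• Commands (Name all ...)

QA: Answers
• Long answers, with ...
• Short answers (e.g. ...)
• Exact answers (named entities)
• Answer extraction: cut and paste of snippets from the ...
• Answer generation: from multiple sentences or ...
• QA and summarization (e.g. What is this story ...)

QA Research (diagram residue: question, free text, fixed set ...)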
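The keyword-pattern substitution used by early chatbots (see the "Yesterday you hurt me." example in the history slide) can be illustrated with a few lines of Python. This is only a minimal sketch: the slide's rule is truncated in the source, so the regular expression, the `RULES` table and the `DEFAULT_REPLY` fallback below are assumptions chosen to reproduce the one input/output pair the slide gives.

```python
import re

# Hypothetical rule table: (keyword pattern, reply template).
# The pattern "you <something> me" is an assumption reconstructed from the
# slide's single example; real chatbots hold many such rules.
RULES = [
    (re.compile(r"\byou (.+?) me\b", re.IGNORECASE),
     "What makes you think I {0} you?"),
]

# Assumed fallback when no keyword pattern matches.
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Substitute the captured fragment into the reply template.
            return template.format(match.group(1))
    return DEFAULT_REPLY


print(respond("Yesterday you hurt me."))  # What makes you think I hurt you?
```

The program never interprets the sentence; it only moves the captured words into a canned frame, which is why such systems cannot answer factual questions.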

• Growing interest in QA (TREC, CLEF, NT...).
• Answers must be faithful w.r.t. questions (correctness) and compact (exactness): as faithful and as compact as possible.

Question Answering at TREC
The problem simplified: the Text Retrieval Conference (TREC)
• Running since ...
• Goal: encourage research in information retrieval based on large-scale collections
• Sponsors: NIST (National Institute of Standards and Technology), ARDA (Advanced Research and Development Activity), DARPA (Defense Advanced Research Projects Agency)
• Participants are research institutes, universities, ...

History of QA at TREC
TREC 8 (Voorhees, 1999), the first QA track:
• 200 fact-based short-answer questions
• Questions mainly back-formulated ...
• Answers could be 50-byte or 250-byte ...
• 5 answers could be returned for each question
• Best systems could answer over 2/3 of the questions (Moldovan et al., 1999; Srihari and Li, 1999).
TREC 10 (Voorhees, 2001):
• List questions such as "Name 20 countries that produce ..."
• Questions which don't have an answer in the ...
TREC 11 (Voorhees, ...):
• Answers had to be exact
• Only one answer could be returned per question
TREC 12 (Voorhees, 2003), introduced definition questions:
• Define a target such as "aspirin" or "Aaron ..."
• A definition should contain a number of important facts (vital nuggets) and may also include other associated information (non-vital nuggets)
• Evaluated using a length-based precision metric that penalizes long answers containing few nuggets
TREC 13 (Voorhees, 2004), combines the three question types into scenarios around targets. For example:
• Target: Hale Bopp Comet
• Factoid: When was the comet ...?
• Factoid: How often does it approach the ...?
• List: In what countries was the comet visible on its last return?
• Other: Tell me anything else not covered by the above ...
TREC 14:
• Questions were based around 75 targets (19 ..., 19 ..., 19 ..., 18 ...)
• The series of targets contained a total of 362 factoid, 93 list and 75 (one per target) other questions
• All answers had to be with reference to a document in the AQUAINT collection of newswire text

The AQUAINT collection
• News articles from the following sources: AP newswire, 1998-...; New York Times newswire, 1998-...; Xinhua News Agency newswire, 1996-...
• In total there are 1,033,461 documents in the collection, 3 GB of text

TREC questions (examples)
• Q-1391: How many feet in ...?
• Q-1057: Where is the volcano Mauna Loa?
• Q-1071: When was the first stamp issued?
• Q-1079: Who is the Prime Minister of Canada?
• Q-1268: Name a food high in zinc.
• Q-896: Who was Galileo?
• Q-897: What is an atom?
• Q-711: What tourist attractions are there in ...?
• Q-712: What do most tourists visit in Reims?
• Q-713: What attracts tourists in Reims?
• Q-714: What are tourist attractions in ...?

Short-answer questions at TREC (table residue; only fragments of the example questions survive)
• Yes/No: Is ... capital of ...?
• What is ... largest ... in Germany?
• Who ... Galileo?
• Name 9 ... that import Cuban sugar
• What are the ... for and ... prayer in school?

Answer Evaluation
Criteria for judging an answer:
• Relevance: it should be responsive to the question
• Conciseness: it should not contain extraneous or irrelevant information
• Completeness: it should be complete, i.e. a partial answer should not get full credit
• Simplicity: it should be simple, so that the questioner can read it easily
• Justification: it should be supplied with sufficient context to allow a reader to determine why this was chosen as an answer to the question

Exact Answers
• Basic unit of a response: [answer-string, docid]
• An answer string must contain a complete, exact answer and nothing else.
• Example: What is the longest river in the United States?
  Correct, exact answers include "the Mississippi" and "Mississippi River".
  Not a correct exact answer: "At 2,348 miles the Mississippi River is the longest river in the US."

Assessments
Four possible judgments for a [docid, answer string] pair:
• Right: the answer is appropriate for the question
• Inexact: used for non-complete answers
• Unsupported: answers without justification
• Wrong: the answer is not appropriate for the question

Sample assessed answers (R = Right, X = ineXact, U = Unsupported, W = Wrong; columns: judgment, question id, document id fragment, question, returned answer where it survives)
R 1530 .0298  What is the capital city of New Zealand?
R 1490 .0267  What is the Boston Strangler's name?  Albert ...
R 1503 .0249  What is the world's second largest island?  New ...
U 1402 .0283  What year did Wilt Chamberlain score 100 points?  1962
R 1426 .0149  Who is ... governor of ...?
U 1506 .0245  What's the name of King Arthur's ...?
R 1601 .0374  When did Einstein ...?  April 18, ...
X 1848 .0143  What was the name of the plane that dropped the Atomic Bomb on Hiroshima?
R 1838 .0164  What was the name of FDR's ...?
R 1674 .0042  What day did Neil Armstrong land on the ...?  July 20, ...
X 1716 .0423  Who was the first Triple Crown ...?
R 1473 .0055  When was Lyndon B. Johnson ...?
R 1622 .0086  Who was Woodrow Wilson's First ...?
W 1510 .0338  Where is Anne Frank's ...?  Young ...

Assessment examples (system: DIOGENE)
• 1848: What was the name of the plane that dropped the Atomic Bomb on Hiroshima?
  PARAGRAPH: NYT ...  ASSESSMENT: INEXACT
  "Tibbets piloted the Boeing B-29 Superfortress Enola which dropped the atomic bomb on Hiroshima on Aug. 6, 1945, causing an estimated 66,000 to 240,000 deaths. He named the plane after his mother, Enola Gay Tibbets."
• 1402: What year did Wilt Chamberlain score 100 points?
  DIOGENE: 1962  PARAGRAPH: ...
  "Petty's 200 victories, 172 of which came during a 13-year span between 1962-75, may be as unapproachable as Joe DiMaggio's 56-game hitting streak or Wilt Chamberlain's 100-point game."
• 1510: Where is Anne Frank's ...?
  DIOGENE: Young Girl  PARAGRAPH: NYT ...
  "Otto Frank released a heavily edited version of "B" for its first publication as "Anne Frank: Diary of a Young Girl" in 1947."

TREC Evaluation Metric: Mean Reciprocal Rank (MRR)
• Reciprocal Rank = inverse of the rank at which the first correct answer was found: [1, 0.5, 0.33, 0.25, 0.2, 0]
• MRR: average over all questions

TREC Evaluation Metric: Confidence-Weighted Score (CWS)
• Answers are ordered by the system's confidence, and the score is

$$\mathrm{CWS} = \frac{1}{500}\sum_{i=1}^{500}\frac{\#\,\text{correct up to question } i}{i}$$

• Example with 5 questions (1 = correct, 0 = wrong):
  System A, judgments 1, 0, 1, 1, 0:
  ((1/1) + ((1+0)/2) + ((1+0+1)/3) + ((1+0+1+1)/4) + ((1+0+1+1+0)/5)) / 5
  System B, judgments 0, 0, 1, 1, ...:
  (0 + ((0+0)/2) + ((0+0+1)/3) + ((0+0+1+1)/4) + ...) / 5
• Total: ...  Best: ...  Average over 67 runs: ...
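Both evaluation metrics are simple enough to check by hand, and a short sketch makes the definitions concrete. The function names below are illustrative, not part of any TREC tooling; `None` is used here, as an assumption, to mark a question for which no correct answer was returned.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """MRR: average of 1/rank of the first correct answer per question.

    A question with no correct answer (rank None) contributes 0.
    """
    if not first_correct_ranks:
        raise ValueError("at least one question is required")
    total = sum(0.0 if rank is None else 1.0 / rank
                for rank in first_correct_ranks)
    return total / len(first_correct_ranks)


def confidence_weighted_score(judgments: Sequence[int]) -> float:
    """CWS over answers sorted by decreasing system confidence.

    judgments[i] is 1 if the answer to the (i+1)-th question is correct,
    0 otherwise. Each position contributes (#correct so far) / position.
    """
    if not judgments:
        raise ValueError("at least one question is required")
    correct_so_far = 0
    total = 0.0
    for position, judgment in enumerate(judgments, start=1):
        correct_so_far += judgment
        total += correct_so_far / position
    return total / len(judgments)


# System A from the example: judgments 1, 0, 1, 1, 0.
print(round(confidence_weighted_score([1, 0, 1, 1, 0]), 3))  # 0.703
```

CWS rewards systems that place their correct answers early in the confidence ordering: the same three correct answers placed last, as in `[0, 0, 1, 1, 1]`, score much lower.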

Top Performing Systems
• Currently the best performing systems at TREC can answer approximately 60-80% of the questions, which is pretty amazing.
• Approaches and successes have varied a fair deal:
  • Pattern-based: the middle ground is to use a large collection of surface matching patterns (ISI)
  • Knowledge-based: knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000 and 2001 (notably Harabagiu, Moldovan et al.)
  • Web-based: the AskMSR system stressed how much could be achieved by very simple methods with enough text (it now has various copycats)

Pattern-Based Approach (ISI)
• Search for predefined patterns of textual expressions that may be interpreted as answers to certain question types.
• The presence of such patterns in answer string candidates may provide evidence of the right answer.
• Detailed categorization of question types: up to 9 types of the "Who" question; 35 categories in total
• Significant number of patterns corresponding to each question type: up to 23 patterns for the "Who-Author" type, average of 15
• Find multiple candidate snippets and check for the presence of patterns (emphasis on recall)

QA Typology (excerpt; several node labels are lost in the source)
(AGENT
  (NAME (FEMALE-FIRST-NAME (EVE MARY ...)) (MALE-FIRST-NAME (LAWRENCE SAM ...)))
  ... JESUS ROMANOFF ...
  (ANIMAL-HUMAN (ANIMAL (WOODCHUCK YAK ...)) ...)
  (ORGANIZATION (SQUADRON DICTATORSHIP ...))
  (GROUP-OF-PEOPLE (POSSE CHOIR ...))
  (STATE-DISTRICT (TIROL MISSISSIPPI ...))
  (CITY (ULAN-BATOR VIENNA ...))
  (COUNTRY (SULTANATE ZIMBABWE ...)))
(... (STATE-DISTRICT CITY COUNTRY ...) AIRPORT COLLEGE CAPITOL ...)
(... (LANGUAGE (LETTER-CHARACTER (A B ...))))
(QUANTITY ... INFORMATION-QUANTITY MONETARY-QUANTITY ENERGY-QUANTITY ... AREA-QUANTITY DISTANCE-QUANTITY ... PERCENTAGE)
(...
  (INFORMATION-UNIT (BIT BYTE ... EXABYTE))
  (MASS-UNIT (OUNCE ...))
  (ENERGY-UNIT (BTU ...))
  (CURRENCY-UNIT (ZLOTY PESO ...))
  (TEMPORAL-UNIT (ATTOSECOND ... MILLENIUM))
  (ILLUMINATION-UNIT (LUX CANDELA))
  (SPATIAL-... (AREA-UNIT (ACRE)) ...)
  ... PERCENT)
(... (FOOD (HUMAN-FOOD (FISH CHEESE ...))))
(... (LIQUID (LEMONADE GASOLINE BLOOD ...)) (SOLID-SUBSTANCE (MARBLE PAPER ...)) (GAS-FORM-SUBSTANCE (GAS AIR)) ...)
(INSTRUMENT (DRUM DRILL (WEAPON (ARM GUN)) ...))
(BODY-PART (ARM HEART ...))
(MUSICAL-INSTRUMENT ...) ... GARMENT ... PLANT ...

Example: patterns for definition questions
• Question: What is <A>?
• Patterns that survive in the source: <A; [comma]; or; X; ...>, <A; dash; X; ...>, <A; comma; [also] called; X; ...>, <A; is called; ...>
• Correct answers per pattern: 23, 26, 12, 9, 8, 7 and 3; total: 88 correct

Use of answer patterns
• For generating queries to the search engine. Example: "How did Mahatma Gandhi die?" yields
  "Mahatma Gandhi die <HOW>", "Mahatma Gandhi die of <HOW>", "Mahatma Gandhi lost his life in ..."
  The TEXTMAP system (ISI) uses 550 patterns, grouped in 105 equivalence blocks. On TREC-2003 questions, the system produced, on average, 5 reformulations for each question.
• For answer extraction. Example: "When was Mozart born?" is matched by
  "<PERSON> (<BIRTHDATE> - ..." and "<PERSON> was born on <BIRTHDATE>"

Acquisition of Answer Patterns
Relevant approaches:
• Manually developed surface pattern library (Soubbotin, Soubbotin, ...)
• Automatically extracted surface patterns (Ravichandran, Hovy, ...)
Pattern learning:
• Start with a seed, e.g. (Mozart, 1756)
• Download documents using a search engine
• Retain sentences that contain both question and answer terms
• Construct a suffix tree for extracting the longest matching substring that spans <Question> and <Answer>
• Calculate the precision of patterns: precision = # of patterns with correct answer / # of total patterns

Pattern Learning: example
• Question type: "When was <PERSON> born?"
• Typical answers: "Mozart was born in ...", "Gandhi (1869-..."
• These suggest phrases (regular expressions) such as "<NAME> was born in ..."
• Retained sentences:
  "The great composer Mozart (1756-1791) achieved fame at a young age"
  "Mozart (1756-1791) was a ..."
  "The whole world would always be indebted to the great music of Mozart (1756-1791)"
• The longest matching substring for all 3 sentences is "Mozart (1756-1791)".
• Repeat with different examples of the same question type: "Gandhi 1869", "Newton 1642", ...
• Some patterns learned for BIRTHDATE:
  a. born in <ANSWER>, ...
  b. <NAME> was born on <ANSWER>
  c. <NAME> ( <ANSWER> ...
  d. <NAME> ( <ANSWER> - ...

Experiments
• 6 different question types from the Webclopedia QA Typology (Hovy et al., 2002a)
• Learned patterns (precision figures are lost in the source), grouped by question type:
  BIRTHDATE: <NAME> ( <ANSWER> - ...; <NAME> was born on ...; <NAME> was born in ...; <NAME> was born ...; <ANSWER> <NAME> was ...; - <NAME> ( ...; <NAME> ( <ANSWER> ...
  ...: <ANSWER> invents ...; the <NAME> was invented by ...
  ...: when <ANSWER> discovered ...; <ANSWER>'s discovery of ...; <NAME> was discovered by <ANSWER>
  ...: <NAME> and related ...; form of <ANSWER>, ...; as <NAME>, <ANSWER> ...
  WHY-...: <ANSWER> <NAME> ...; laureate <ANSWER> ...; <NAME> is the <ANSWER> ...
  ...: <ANSWER>'s ...; regional : <ANSWER> : ...; near <NAME> in ...
• Depending on the question type, MRR is high (0.6 to 0.9), with higher results from use of the Web than from the TREC QA collection.

Shortcomings
• Long distance dependencies: for "Where is London?", the sentence "London, which has one of the busiest airports in the world, lies on the banks of the river Thames" would require the pattern <QUESTION>, (<any_word>)*, lies on <ANSWER>.
• The abundance and variety of Web data helps the system find an instance of a pattern without losing answers to long distance dependencies.

Capturing variability
• Pattern-based QA is more effective when supported by variable typing obtained using NLP techniques and resources. Example for "When was <A> born?": "<A> ( <ANSWER:DATE> - ..." and "<A> was born in <ANSWER:DATE>"
• Surface patterns cannot deal with word reordering and apposition phrases: "Galileo, the famous astronomer, was born in ..."
• The fact that most QA systems use syntactic parsing demonstrates that a successful solution of the answer extraction problem goes beyond the surface form analysis.

Syntactic answer patterns
• Answer patterns that capture the syntactic relations of a sentence (the parse-tree figure for "When was <A> born?" is not recoverable).
• The matching phase turns out to be a problem of partial match among syntactic trees.

Knowledge-Based Approach
• Linguistic-oriented methodology:
  • Determine the answer type from the question form
  • Retrieve small portions of documents
  • Find entities matching the answer type category in the text
• The majority of systems use a lexicon (usually ...): in question processing to find the answer type, to verify that a candidate answer is of the correct type, and to get ...
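Returning to the surface-pattern learning procedure above (seed pair, retained sentences, longest matching substring, precision), the following sketch walks through it on the Mozart example. Two things are assumptions rather than the slides' method: the longest common substring is found by brute force instead of a suffix tree, and the second sentence, which is truncated in the source, is completed with the word "genius" purely for illustration. The helper names are hypothetical.

```python
import re
from typing import List


def longest_common_substring(sentences: List[str]) -> str:
    """Longest substring shared by all sentences (brute force).

    A suffix tree does this far more efficiently; the brute-force search
    is used here only to keep the sketch short.
    """
    shortest = min(sentences, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in sentences):
                return candidate
    return ""


def generalize(phrase: str, name: str, answer: str) -> str:
    """Turn a shared phrase into a pattern with <NAME>/<ANSWER> slots."""
    return phrase.replace(name, "<NAME>").replace(answer, "<ANSWER>")


def pattern_precision(pattern: str, name: str, answer: str,
                      sentences: List[str]) -> float:
    """Share of pattern matches whose <ANSWER> slot holds the right answer.

    The pattern must contain an <ANSWER> slot.
    """
    regex = re.escape(pattern)
    regex = regex.replace(re.escape("<NAME>"), re.escape(name))
    regex = regex.replace(re.escape("<ANSWER>"), r"(\w+)")
    extracted = [m.group(1) for s in sentences for m in re.finditer(regex, s)]
    if not extracted:
        return 0.0
    return sum(1 for e in extracted if e == answer) / len(extracted)


SENTENCES = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",  # ending assumed; truncated in source
    "The whole world would always be indebted to the great music of "
    "Mozart (1756-1791)",
]

shared = longest_common_substring(SENTENCES)
print(shared)                                # Mozart (1756-1791)
print(generalize(shared, "Mozart", "1756"))  # <NAME> (<ANSWER>-1791)
```

Precision is then estimated on fresh sentences retrieved for the name alone: a pattern such as "<NAME> was born in <ANSWER>" also matches "Mozart was born in Salzburg", which is why its precision for BIRTHDATE is below 1.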

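Knowledge-based QA: pipeline (diagram residue)
• Question analysis (POS tagging, word sense disambiguation, answer type), then search, then answer extraction

Question Analysis
• Input: NL question
• Output: a query for the search engine (i.e. a composition of weighted keywords), the answer type, and additional constraints: question focus, syntactic or semantic relations that should hold for a candidate answer entity and other entities
• Steps: tokenization and POS tagging, multi-words recognition, syntactic parsing, answer type and focus identification, keyword extraction, word sense disambiguation, expansions

Tokenization and POS tagging
• NL-QUESTION: Who was the inventor of the electric light?

Syntactic parsing
• Identify the syntactic structure of a sentence: noun phrases (NP), verb phrases (VP), prepositional phrases (PP) etc.
• Example: "Why did David Koresh ask the FBI for a word processor?"

Answer Type and Focus
• The focus is the word that expresses the relevant entity in the question. It is used to ... set of ... Example: "Where was Mozart born?"
• The answer type is the category of the entity to be searched as answer: PERSON, MEASURE, TIME PERIOD, DATE, ORGANIZATION, DEFINITION. Example: "Where was Mozart born?"
• Example question: "What famous communist leader died in Mexico ...?"
  RULENAME: WHAT-WHO
  TEST: ["what" [¬NOUN]* [NOUN: PERSON-p]J ...]
  OUTPUT: ["PERSON" J]
  Answer type: PERSON. Focus: leader.
• This rule matches any question starting with what, whose first noun, if any, is a person (i.e. satisfies the PERSON-p predicate).

Word Sense Disambiguation
• Example: "What is the brightest star visible from Earth?"
  star#1: celestial body; star#2: an actor who play...
  bright#1: bright brilliant shining; bright#2: popular glorious; bright#3: promising auspicious
  visible#1: conspicuous obvious; visible#2: visible seeable
  earth#1: Earth world ...; earth#2: estate land landed_estate acres; earth#3: clay; earth#4: dry_land earth solid_ground; earth#5: land ground soil; earth#6: earth ...

Keyword extraction and expansion
• NL-QUESTION: Who was the inventor of the electric light?
• Expansions: inventor: discoverer, ...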
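The WHAT-WHO rule from the answer type slide can be sketched as follows. This is a toy illustration under loud assumptions: the type name PERSON is reconstructed from a damaged slide, and a real system would take part-of-speech tags from a tagger and the person predicate from a lexicon, whereas the two word sets below (`NOUNS`, `PERSON_NOUNS`) are hypothetical stand-ins.

```python
from typing import Optional, Tuple

# Hypothetical mini-lexicon standing in for a POS tagger (NOUNS) and for
# the PERSON-p predicate (PERSON_NOUNS).
NOUNS = {"leader", "inventor", "river", "volcano", "city"}
PERSON_NOUNS = {"leader", "inventor"}


def what_who_rule(question: str) -> Optional[Tuple[str, str]]:
    """Apply the WHAT-WHO rule.

    Matches a question that starts with "what" and whose first noun denotes
    a person. Returns (answer_type, focus) on a match, otherwise None.
    """
    tokens = question.lower().rstrip("?").split()
    if not tokens or tokens[0] != "what":
        return None
    for token in tokens[1:]:          # skip the leading non-nouns
        if token in NOUNS:            # first noun found: this is J
            if token in PERSON_NOUNS:
                return ("PERSON", token)
            return None               # first noun is not a person
    return None                       # no noun at all: treated as no match


print(what_who_rule("What famous communist leader died in Mexico?"))
```

Only the first noun is tested, so "What river ..." does not fire this rule and would be left to other rules in the set.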

Keyword composition
• Keywords and expansions are composed in a boolean expression with AND/OR operators.
• Several possibilities: AND composition, Cartesian composition:
  (inventor AND electric_light) OR (inventor AND incandescent_lamp) OR (discoverer AND electric_light) OR ... OR inventor OR ...

Collection Pre-processing
• For real-time QA applications, off-line pre-processing of the text is necessary: terms, POS tagging, named entities.

Candidate answer selection
• Passage selection: identify relevant, small text portions.
• Given a document and a list of keywords:
  • fix a paragraph length (e.g. 200 ...)
  • consider the percentage of keywords present in the passage
  • consider if some keyword is obligatory (e.g. the focus of the question)

Candidate answer extraction
• Passage text tagging with named entity recognition. For "Who is the author of the "Star Spangled Banner"?":
  <PERSON>Francis Scott Key</PERSON> wrote the "Star Spangled Banner" in <DATE>1814</DATE>
• Some systems go further: passage parsing (Harabagiu, ...), logical form (Zajac, ...)
• Answer type = PERSON, candidate answer = Francis Scott Key
• Ranking candidate answers: keyword density in the passage, additional constraints (e.g. syntax, semantics), ranking candidates using the Web

Summary: knowledge-based approach
• Same pipeline as before: question analysis (POS tagging, word sense disambiguation, answer type), search, answer extraction.

LCC block architecture (diagram residue)
• Captures the semantics of the question (question processing: recognition of ..., keyword ...)
• Extracts and ranks passages using surface-text techniques
• Extracts and ranks answers using NL techniques
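Coming back to the keyword composition step: the Cartesian composition of keywords and their expansions is a product over the alternative lists, which a few lines of Python can show. The function name and the plain-string query syntax are assumptions for illustration; an actual search engine would need its own query format, and the slide's trailing single-keyword back-off terms are left out.

```python
from itertools import product
from typing import List


def cartesian_query(alternatives: List[List[str]]) -> str:
    """Compose a boolean query from keyword alternatives.

    alternatives[i] holds the i-th keyword followed by its expansions.
    One AND-conjunction is built per combination, and the conjunctions
    are joined with OR.
    """
    conjunctions = ["(" + " AND ".join(combo) + ")"
                    for combo in product(*alternatives)]
    return " OR ".join(conjunctions)


query = cartesian_query([
    ["inventor", "discoverer"],
    ["electric_light", "incandescent_lamp"],
])
print(query)
```

With two alternatives per keyword the query has four conjunctions; the number grows multiplicatively with each expanded keyword, which is one reason systems also keep the simpler AND composition available.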

(LCC diagram, continued: answer processing, theorem ..., answer ...; NER: Named Entity Recognition)

Web-Based Approach
AskMSR: "Web Question Answering: Is More Always Better?" (Dumais, Banko, Brill, Lin, Ng; ..., MIT, ...)
• Q: "Where is the Louvre located?"
• Want "Paris" or "France" or "75058 Paris Cedex 01" or a map
• Don't just want URLs

AskMSR: overview (diagram residue; the system runs the five steps below)

Step 1: Rewrite queries
• Intuition: the user's question is often syntactically quite close to sentences that contain the answer.
  "Where is the Louvre Museum located?" and "The Louvre Museum is located in ..."
  "Who created the character of ...?" and "Charles Dickens created the character of ..."

Query rewriting
• Classify the question into 1 of 7 categories, each mapped to a set of rewrites. Rewrite sets range in size from 1 to 5.
• The output of a rewrite module is a 3-tuple [string, L/R/-, weight].
• No parser or part-of-speech tagger is used for query reformulation, but a lexicon is used to determine part-of-speech and morphological variants of a word.
• The rewrites are simple string-based manipulations.
• A final rewrite, which is a back-off to a simple ANDing of the non-stop words, is created.
• Categories could be something like: Who is/was/are/were ...?, When is/did/will/are/were ...?, Where is/are/were ...?
• Category-specific transformation rules, e.g. "For Where questions, move 'is' to all possible locations":
  "Where is the Louvre Museum located"
  "is the Louvre Museum located"
  "the is Louvre Museum located"
  "the Louvre is Museum located"
  "the Louvre Museum is located"
  "the Louvre Museum located is"
  Some of these are nonsense, but who cares? It's only a few more queries.
• Expected answer type (e.g., Date, Person, Location, ...): "When was the French Revolution?"
• Could the rules be automatically learned?

Query rewriting: weights
• Some query rewrites are more reliable than others. For "Where is the Louvre Museum located?":
  +"the Louvre Museum is located": high weight; if we get a match it's probably right
  +Louvre +Museum +located: low weight; lots of non-answers could come back too

Step 2: Query search engine
• Throw all rewrites to a Web-wide search engine.
• Retrieve the top N answers.
• For speed, rely just on the search engine's "snippets", not the full text of the actual document.
• Truncation might occur, since the summary contains the query terms with a few surrounding words.

Step 3: Mining N-grams
• Unigram, bigram, trigram, ... N-gram: list of N adjacent terms in a sequence
• E.g., "Web Question Answering: Is More Always Better"
  Unigrams: Web, Question, Answering, Is, More, Always, Better
  Bigrams: Web Question, Question Answering, Answering Is, Is More, More Always, Always Better
  Trigrams: Web Question Answering, Question Answering Is, Answering Is More, Is More Always, More Always Better
• Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets.
• An N-gram is scored according to the weight of the query rewrite that retrieved it.
• The scores are summed across the summaries that contain the N-gram (the opposite of the usual idf ranking).
• Frequency of occurrence within the summary isn't counted (the tf component in ranking schemes).
• Example: "Who created the character of ...?" Candidate N-grams with scores (several scores are lost): Dickens, Charles Dickens, Carl Banks, Uncle, Christmas Carol (78), Disney (72), A Christmas

Step 4: Filtering N-grams
• The query is analyzed and assigned to one of seven question types: ..., etc.
• Each question type is associated with one or more "data-type filters" (regular expressions), e.g. for What and Who questions.
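Steps 1 and 3 of AskMSR are simple string operations, so a compact sketch may help. It shows the "move 'is' to all possible locations" rule for Where questions and the N-gram scoring rule (sum of rewrite weights over the snippets that contain the N-gram, counting each snippet once). The function names, the whitespace tokenization and the example weights are assumptions for illustration; no search engine is called, the snippets are passed in by hand.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def where_rewrites(question: str) -> List[str]:
    """Rewrite 'Where is X ...?' by moving 'is' to every position."""
    tokens = question.rstrip("?").split()
    if len(tokens) < 3 or tokens[0].lower() != "where" or tokens[1] != "is":
        raise ValueError("expected a question of the form 'Where is ...'")
    rest = tokens[2:]
    return [" ".join(rest[:i] + ["is"] + rest[i:])
            for i in range(len(rest) + 1)]


def ngrams(tokens: List[str], n: int) -> List[str]:
    """All runs of n adjacent tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mine_ngrams(snippets: List[Tuple[str, float]]) -> Dict[str, float]:
    """Score unigrams, bigrams and trigrams found in the snippets.

    Each snippet comes with the weight of the rewrite that retrieved it.
    An N-gram earns that weight once per snippet, however often it occurs
    inside the snippet (no tf component).
    """
    scores: Dict[str, float] = defaultdict(float)
    for text, rewrite_weight in snippets:
        tokens = text.split()
        seen = set()
        for n in (1, 2, 3):
            seen.update(ngrams(tokens, n))
        for gram in seen:
            scores[gram] += rewrite_weight
    return dict(scores)


for rewrite in where_rewrites("Where is the Louvre Museum located?"):
    print(rewrite)
```

Summing over snippets rewards answers that many independent pages agree on, which is exactly where the redundancy of the Web helps.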

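Step 4: Filtering N-grams (continued)
• A collection of about 15 handwritten filters was created based on human knowledge.
• The determined filters are applied to each candidate string in order to:
  • boost the score of N-grams that do match the regular expressions
  • lower the score of N-grams that don't match the regular expressions

Step 5: Tiling the answers
• Tile the highest-scoring n-gram with overlapping n-grams (in the slide's figure, fragments such as "Mr Charles" are merged into a longer answer), and discard the old n-grams.
• Repeat, until no more overlap.

Experiments
• Standard TREC-9 contest test-bed: ~1M documents; 900 questions
• Systems receive a query and generate the "top 5 candidate answers".
• Standard performance metric: MRR (Mean Reciprocal Rank). Score = 1/R, where R is the rank of the correct answer: 1: 1; 2: 0.5; 3: 0.33; 4: 0.25; 5: 0.2; 6+: 0

Results
• MRR = 0.507 (right answer ranked about ...)
• About 61% of the questions are answered correctly.
• The average answer length is 12 bytes.
• The system returns short answers, and not ...

Is more always better?
• What is the influence of the number of snippets the search engine returns on the quality of the answers?
• Performance improves sharply as the number of snippets increases to 50, increases slowly after that (peaking at 200 snippets), and descends as more snippets are included for N-gram ...

Answer ...
• The problem: given a question q and a cand...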
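The tiling step of AskMSR (Step 5 above) can be sketched as a greedy merge of overlapping n-grams. The scores in the usage example are hypothetical, since the slide's figure did not survive extraction, and summing the scores of merged n-grams is an assumption about how the merged candidate is scored. The sketch also stops as soon as the best candidate can no longer be tiled, which is a simplification.

```python
from typing import Dict, Optional


def tile(left: str, right: str) -> Optional[str]:
    """Merge two n-grams if a suffix of `left` equals a prefix of `right`."""
    a, b = left.split(), right.split()
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return " ".join(a + b[k:])
    return None


def tile_answers(scores: Dict[str, float]) -> Dict[str, float]:
    """Greedily tile the highest-scoring n-gram with overlapping ones.

    The merged n-gram receives the sum of both scores and the two old
    n-grams are discarded; this repeats until the best candidate overlaps
    with nothing else.
    """
    scores = dict(scores)
    while scores:
        best = max(scores, key=scores.get)
        merged, partner = None, None
        for other in sorted(scores, key=scores.get, reverse=True):
            if other == best:
                continue
            merged = tile(best, other) or tile(other, best)
            if merged:
                partner = other
                break
        if merged is None:
            break
        combined = scores.pop(best) + scores.pop(partner)
        scores[merged] = scores.get(merged, 0.0) + combined
    return scores


# Hypothetical scores for three overlapping candidates.
print(tile_answers({"Charles Dickens": 20.0, "Dickens": 15.0, "Mr Charles": 10.0}))
# {'Mr Charles Dickens': 45.0}
```

Each merge removes two entries and adds one, so the loop always terminates; candidates that overlap with nothing keep their own scores.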
