Investigating the Querying and Browsing Behavior of Advanced Search Engine Users

Ryen White, Dan Morris

ABSTRACT

One way to help all users of commercial Web search engines be more successful in their searches is to better understand what those users with greater search expertise are doing, and use this knowledge to benefit everyone. In this paper we study the interaction logs of advanced search engine users (and those not so advanced) to better understand how these user groups search. The results show that there are marked differences in the queries, result clicks, post-query browsing, and search success of users we classify as advanced (based on their use of query operators), relative to those classified as non-advanced. Our findings have implications for how advanced users should be supported during their searches, and how their interactions could be used to help searchers of all experience levels find more relevant information and learn improved searching strategies.

Keywords: Query syntax, advanced search features, expert searching.

INTRODUCTION

The formulation of query statements that both capture the salient aspects of information needs and are meaningful to Information Retrieval (IR) systems poses a challenge for many searchers [3]. Commercial Web search engines such as Google, Yahoo!, and Windows Live Search offer users the ability to improve the quality of their queries using query operators such as quotation marks, plus and minus signs, and modifiers that restrict the search to a particular site or type of file. These techniques can be useful in improving result precision yet, other than via log analyses (e.g., [15][27]), they have generally been overlooked by the research community in attempts to improve the quality of search results.

IR research has generally focused on alternative ways for users to specify their needs rather than increasing the uptake of advanced syntax. Research on practical techniques to supplement existing search technology and support users has been intensifying in recent years (e.g., [18][34]). However, it is challenging to implement such techniques at large scale with tolerable latencies.

Typical queries submitted to Web search engines take the form of a series of tokens separated by spaces. There is generally an implied Boolean AND operator between tokens that restricts search results to documents containing all query terms. De Lima and Pedersen [7] investigated the effect of parsing, phrase recognition, and expansion on Web search queries. They showed that the automatic recognition of phrases in queries can improve result precision in Web search. However, the value of advanced syntax for typical searchers has generally been limited, since most users do not know about advanced syntax or do not understand how to use it [15]. Since it appears operators can help retrieve relevant documents, further investigation of their use is warranted.

In this paper we explore the use of query operators in more detail and propose alternative applications that do not require all users to use advanced syntax explicitly. We hypothesize that searchers who use advanced query syntax demonstrate a degree of search expertise that the majority of the user population does not, an assertion supported by previous research [13]. Studying the behavior of these advanced search engine users may yield important insights about searching and result browsing from which others may benefit.

Through an experimental study and analysis, we offer potential answers for each of these questions. A relationship between the use of advanced syntax and any of these features could support the design of systems tailored to advanced search engine users, or use advanced users’ interactions to help non-advanced users be more successful in their searches.

RELATED WORK

Factors such as lack of domain knowledge, poor understanding of the document collection being searched, and a poorly developed information need can all influence the quality of the queries that users submit to IR systems ([24], [28]). There has been a variety of research into different ways of helping users specify their information needs more effectively. Belkin et al. [4] experimented with providing additional space for users to type a more verbose description of their information needs. A similar approach was attempted by Kelly et al. [18], who used clarification forms to elicit additional information about the search context from users. These approaches have been shown to be effective in best-match retrieval systems, where longer queries generally lead to more relevant search results [4]. However, in Web search, where many of the systems are based on an extended Boolean retrieval model, longer queries may actually hurt retrieval performance, leading to a small number of potentially irrelevant results being retrieved. It is not sufficient simply to request more information from users; this information must be of better quality.

Relevance Feedback (RF) and interactive query expansion are popular techniques that have been used to improve the quality of information that users provide to IR systems regarding their information needs. In the case of RF, the user presents the system with examples of relevant information that are then used to formulate an improved query or retrieve a new set of documents. It has proven difficult to get users to use RF in the Web domain due to difficulty in conveying the meaning and the benefit of RF to typical users. Query suggestions offered based on query logs have the potential to improve retrieval performance with limited user burden. This approach is limited to re-executing popular queries, and searchers often ignore the suggestions presented to them. In addition, neither of these techniques helps users learn to produce more effective queries.

Log-based analysis of users’ interactions with the Excite and AltaVista search engines has shown that only 10-20% of queries contained any advanced syntax. This analysis can be a useful way of capturing characteristics of users interacting with IR systems. Research in user modeling and personalization has shown that gathering more information about users can improve the effectiveness of searches, but this requires more information about users than is typically available from interaction logs alone. Unless coupled with a qualitative technique, such as a post-session questionnaire [23], it can be difficult to associate interactions with user characteristics.

In our study we conjecture that, given the difficulty in locating advanced search features within the typical search interface and the potential problems in understanding the syntax, those users that do use advanced syntax regularly represent a distinct class of searchers who will exhibit other common search behaviors. In this paper we study other search characteristics of users of advanced syntax in an attempt to determine whether there is anything different about how these search engine users search, and whether their searches can be used to benefit those who do not make use of the advanced features of search engines. To do this we use interaction logs gathered from a large set of consenting users over a prolonged period. In the next section we describe the data we use to study the behavior of the users who use advanced syntax, relative to those that do not use this syntax.

DATA

To perform this study we required a description of the querying and browsing behavior of many searchers, preferably over a period of time to allow patterns in user behavior to be analyzed. To obtain these data we mined the interaction logs of consenting Web users over a period of 13 weeks, from January to April 2006. When downloading a partner client-side application, the users were invited to consent to their interaction with Web pages being anonymously recorded (with a unique identifier assigned to each user) and used to improve the performance of future systems.

The information contained in these log entries included a unique identifier for the user, a timestamp for each page view, a unique browser window identifier (to resolve ambiguities in determining in which browser window a page was viewed), and the URL of the Web page visited. This provided us with sufficient data on querying behavior (from interaction with search engines) and browsing behavior (from interaction with the pages that follow a search) to more broadly investigate search behavior.

In addition to the data gathered during the course of this study, we also had relevance judgments of documents that users examined for 10,680 unique query statements present in the interaction logs. These judgments were assigned on a six-point scale by trained human judges at the time the data were collected. We use these judgments in this analysis to assess the relevance of sites users visited on their browse trail away from search result pages.

The privacy of our volunteers was maintained throughout the entire course of the study: no personal information was elicited about them, participants were assigned a unique anonymous identifier that could not be traced back to them, and we made no attempt to identify a particular user or study individual behavior in any way. All findings were aggregated over multiple users, and no information other than consent for logging was elicited.

DISCUSSION AND IMPLICATIONS

Our findings indicate significant differences in the querying, result-click, post-query navigation, and search success of those that use advanced syntax versus those that do not. Many of these findings mirror those already found in previous studies with groups of self-identified novices and experts.

There are several ways in which a commercial search engine system might benefit from a quantitative indication of searcher expertise. This might be yet another feature available to a ranking engine; i.e., it may be the case that expert searchers in some cases prefer different pages than novice searchers. The user interface to a search engine might be tailored to a user’s expertise level; perhaps even more advanced features such as term weighting and query expansion suggestions could be presented to more experienced searchers while preserving the simplicity of the basic interface for novices. Result presentation might also be customized based on search skill level; future work might re-evaluate the benefits of content snippets, thumbnails, etc. in a manner that allows different outcomes for different expertise levels. Additionally, if browsing histories are available, the destinations of advanced searchers could be used as suggested results for queries, bypassing and potentially improving upon the traditional search process.

The use of the interaction of advanced search engine users to guide others with less expertise is an attractive proposition for the designers of search systems. In part, these searchers may have more post-query browsing expertise that allows them to overcome the shortcomings of search systems. Their interactions can be used to point users to places that advanced search engine users visit, or simply to train less experienced searchers how to search more effectively. However, if expert users are going to be used in this way, issues of data sparsity will need to be overcome. Our advanced users only accounted for 20.1% of the users whose interactions we studied. Whilst these may be amongst the most active users, it is unlikely that they will view documents that cover a large number of subject areas. However, rather than focusing on where they go (which is perhaps more appropriate for those with domain knowledge), advanced search engine users may use moves, tactics and strategies [2] that inexperienced users can learn from. Encouraging users to use advanced syntax helps them learn how to formulate better search queries; leveraging the searching style of expert searchers could help them learn more successful post-query interactions.

One potential limitation to the results we report is that prior research has shown that query operators do not significantly improve the effectiveness of Web search results [8], and that searchers may be able to perform just as well without them [27]. It could therefore be argued that the users who do not use query operators are in fact more advanced, since they do not waste time using potentially redundant syntax in their query statements. However, this seems unlikely given that those who use advanced syntax exhibited search behaviors typical of users with expertise [13], and are more successful in their searching. In future work we will expand our definition of “advanced user” beyond attributes of the query to also include other interaction behaviors, some of which we have defined in this study, and other avenues of research such as eye-tracking [12].
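
The paper classifies searchers as advanced based on their use of query operators (quotation marks, plus and minus signs, and modifiers that restrict a search to a site or file type). The following is a minimal sketch of how such a classification might be run over a query log. The regular expressions, the function names `has_advanced_syntax` and `classify_users`, and the `min_fraction` threshold are illustrative assumptions, not the authors' actual procedure.

```python
import re
from collections import defaultdict

# Patterns for the operators named in the paper. The exact expressions
# are assumptions made for this sketch.
OPERATOR_PATTERNS = [
    re.compile(r'"[^"]+"'),                                      # quoted phrase
    re.compile(r'(?:^|\s)[+-]\w'),                               # +term or -term at a token start
    re.compile(r'(?:^|\s)(?:site|filetype):\S', re.IGNORECASE),  # site: / filetype: modifiers
]


def has_advanced_syntax(query):
    """Return True if the query contains at least one recognized operator."""
    return any(pattern.search(query) for pattern in OPERATOR_PATTERNS)


def classify_users(log, min_fraction=0.0):
    """Label each user from an iterable of (user_id, query) pairs.

    A user is labeled "advanced" when the share of their queries that
    contain an operator exceeds min_fraction (a hypothetical threshold).
    """
    total = defaultdict(int)
    with_operators = defaultdict(int)
    for user_id, query in log:
        total[user_id] += 1
        if has_advanced_syntax(query):
            with_operators[user_id] += 1
    return {
        user_id: "advanced"
        if with_operators[user_id] / total[user_id] > min_fraction
        else "non-advanced"
        for user_id in total
    }


if __name__ == "__main__":
    sample_log = [
        ("u1", '"query syntax" site:example.com'),
        ("u1", "search engines"),
        ("u2", "cheap flights"),
        ("u2", "well-known authors"),  # hyphen inside a word is not an operator
    ]
    print(classify_users(sample_log))
```

With the default threshold, a single operator-bearing query is enough to label a user as advanced; raising `min_fraction` would require more consistent operator use.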
