Specifics of Medical Data Mining for Diagnosis Aid: A Survey

Sarah Itani a,b,*, Fabian Lecron c, Philippe Fortemps c

a Fund for Scientific Research - FNRS (F.R.S.-FNRS), Brussels, Belgium

b Faculty of Engineering, University of Mons, Department of Mathematics and Operations Research, Mons, Belgium

c Faculty of Engineering, University of Mons, Department of Engineering Innovation Management, Mons, Belgium

Abstract

Data mining continues to play an important role in medicine, specifically for the development of diagnosis aid models used in expert and intelligent systems. Although abundant research exists on this topic, clinicians remain reluctant to use decision support tools. Social pressure partly explains this lukewarm position, but concerns about reliability and credibility are also put forward. To address this reticence, we emphasize the importance of collaboration between data miners and clinicians. This survey lays the foundation for such an interaction by focusing on the specifics of diagnosis aid and the related data modeling goals. In this regard, we propose an overview of the requirements expected by the clinicians, who are both the experts and the final users. Indeed, we believe that the interaction with clinicians should take place from the very first steps of the process and throughout the development of the predictive models, and thus not only at the final validation stage. In contrast to a current research approach that is quite blindly driven by data, we advocate the need for a new expert-aware approach. This survey paper provides guidelines to contribute to the design of diagnosis aid systems that are helpful in daily practice.

Keywords: Data Mining; Medicine; Diagnosis Aid; Explainable Artificial Intelligence

1. Introduction

As one of the trendiest research topics of our century, Data Mining (DM) makes key contributions to the scientific and technological advance in a considerable number of fields (Gupta, 2014; PhridviRaj and GuruRao, 2014). Coined during the nineties, the discipline is subject to tough competition for the development of ever more powerful algorithms, which aim at processing data to infer some knowledge in the form of patterns and/or relationships (Bellazzi and Zupan, 2008). The associated techniques are derived from the fields of both statistics and Machine Learning (ML), the latter aiming at developing computational methods able to extract generalizations from a set of data (Giudici, 2005).

* Corresponding author. University of Mons, Department of Mathematics and Operations Research, Rue de Houdain, 9, 7000 Mons, Belgium.

Email addresses: sarah.itani@umons.ac.be (Sarah Itani), fabian.lecron@umons.ac.be (Fabian Lecron), philippe.fortemps@umons.ac.be (Philippe Fortemps)

Figure 1: Evolution of the annual number of publications related to medical data mining in the Scopus database (Scopus) over a quarter of a century, from 1990 to 2015. (Plot not reproduced; horizontal axis: Year, 1990 to 2015; vertical axis: Number of publications, 0 to 1200.)

Medical applications feature among the concerns of the DM community, with a significant increase in research interest over the last years (see Figure 1). This interaction comes in different disciplines (Bellazzi et al., 2011): at the cellular and molecular level (bioinformatics); at the tissue and organ level (imaging informatics); at the single patient level (clinical informatics); at the population and society level (public health informatics).

For half a century now, diagnosis prediction has been a very active research area of clinical informatics (Wagholikar et al., 2012). In this regard, with the advent of DM, research has progressively shifted away from the statistical approach long considered standard practice. Under a hypothetico-deductive process, statistical analyses are conducted to check a hypothesis stated beforehand, and data samples are collected for this specific purpose (Yoo et al., 2012). This statistical approach is well suited to revealing differences between pathological and control groups, but not to making an individual assessment, i.e. a clinical examination per subject. In contrast, enriched by ML techniques, DM inductively processes a voluminous amount of data, both to extract knowledge and to develop predictive models able to help in diagnosing pathologies (Vieira et al., 2017; Yoo et al., 2012; Bellazzi and Zupan, 2008). In such a process, statistics may find its place in feature engineering, before the stage of model building, which is mainly based on ML methods of classification or regression (Esfandiari et al., 2014).

In that respect, recent works have relied on data mining for the early detection of cancer, e.g. see Lyu and Haque (2018); Aličković and Subasi (2017); Cichosz et al. (2016); Nahar et al. (2016); Esfandiari et al. (2014); Krishnaiah et al. (2013); Parvin et al. (2013); Gupta et al. (2011). Other pathologies, such as cardiac and pulmonary diseases, diabetes, hypertension and meningitis, also account for a significant part of the research toward more precise diagnoses (Esfandiari et al., 2014). Several psychiatric disorders, such as Attention Deficit Hyperactivity Disorder (ADHD) (Itani et al., 2018a; Abraham et al., 2017; Milham et al., 2012), Alzheimer's disease (Papakostas et al., 2015), autism (Kosmicki et al., 2015), schizophrenia, depression and Parkinson's disease (Woo et al., 2017) are also the object of extensive investigation.

As probably perceived by most researchers, and certainly by the authors of the present paper, the diagnostic decision support systems proposed so far are not unanimously approved by clinicians (Wagholikar et al., 2012). Such systems, and the underlying predictive models, are notably found to be far from the reality of the field. It is thus most likely that data miners are not attentive enough to the specifics of medical diagnostic decision support. In particular, though the DM community has been made aware of the distinctive nature of medical applications (Cios and Moore, 2002), predictive performance remains practically the only parameter within the scope of data miners, which encourages competition. This trend has been accentuated by the greater availability of open medical databases, shared by different medical and research centers worldwide (Di Martino et al., 2017; Woo et al., 2017; Di Martino et al., 2014; Esfandiari et al., 2014; Mennes et al., 2013; Ihle et al., 2012; Kerr et al., 2012; Milham et al., 2012; Poline et al., 2012). Some of these datasets were launched on the occasion of official contests, e.g. the ADHD-200 collection (Milham et al., 2012). By focusing almost exclusively on performance, these research works (1) miss the challenge of better perceiving and understanding the issues specific to the medical field, and (2) are exposed to the risk of yielding inconsistent models since, notably, recent studies showed that there may be no logic behind the predictions of accurate models (Ribeiro et al., 2016).

It is our strong conviction that clinicians have to be involved in the whole development process of their decision support systems. Indeed, they bring expertise and knowledge that contribute to intelligent and expert systems. That is why, in the present paper, we shed light upon the specifics of medical data mining for diagnosis aid and raise the related data modeling goals. For this purpose, we address the following questions.

(1) How can decision support models be more attractive to clinicians? What are the expressed requirements in this regard?

(2) What are the objectives corresponding to such requirements in terms of mathematical modeling?

(3) In what way do medical data, particularly in this era of open medical data proliferation, make data mining more challenging?

(4) To what extent are the current data mining techniques able to satisfy the clinicians' needs and to handle the particular nature of medical data simultaneously?

In answering these questions, we are led to describe a comprehensive expert-aware approach which stands out from the existing literature through the three main contributions described below.

· Because of the limited effectiveness of some models, Karpatne et al. (2017) push for a theory-guided data science. Such DM models are grounded in theoretical bases, mainly in the domains of Physics and Chemistry. In the context of medical diagnosis, we can adopt a similar approach, guided not by theory but rather by the experts' domain knowledge. Our paper lays the bases for such an approach by building a kind of bridge between the medical and data mining domains.

· We not only state that the issue of diagnosis aid is of a particular nature, we also propose the translation of the associated specifics into modeling goals. Indeed, most of the papers that take an interest in the specifics of the medical domain have a wide scope: they are not specifically focused on diagnosis, but also cover prognosis and monitoring notably, which implies that modeling is not discussed in enough depth (Bellazzi and Zupan, 2008; Cios and Moore, 2002; Lavrač, 1999). Besides, we bring a more recent point of view compared to the papers that specifically addressed aided medical diagnosis (Wagholikar et al., 2012; Kononenko, 2001).

· We do not provide an overview of DM techniques and the related works; this was widely proposed in previous surveys (Kalantari et al., 2018; Kourou et al., 2015; Esfandiari et al., 2014; Wagholikar et al., 2012; Yoo et al., 2012). We rather question the existing DM techniques, given the modeling goals raised following the understanding of the problem and data. This allows us to raise some solid future research directions.

                 Predicted as N    Predicted as P
Negative (N)     TN                FP
Positive (P)     FN                TP

Figure 2: Confusion matrix

The paper is organized as follows. In section 2, we present the materials we considered to structure and conduct our survey. The results are presented in section 3 and discussed in section 4. Finally, we conclude this report in section 5.

2. Materials

2.1. Terminology

Medical diagnosis is the result of a challenging task which consists of collecting and reconciling different pieces of information (Donner-Banzhoff et al., 2017; Hommersom and Lucas, 2016; Miller, 2016). The latter include the symptoms (subjective data) and the signs (objective data) of the disorder provided by clinical examinations and laboratory tests. In search of explanations for these symptoms and signs, clinicians come to a conclusion on the existence or absence of a disorder, i.e. the diagnosis.

A test is one element among others that motivate a diagnosis (Gordis, 2014; Cios and Moore, 2002). The predictions of a clinical test are of several types. A patient with (respectively without) the disease D predicted as such is designated as a true positive (resp. true negative). In case of wrong predictions, a patient without the disease who is predicted as having it is a false positive, and a patient with the disease who is predicted as free from it is a false negative. Let TP (resp. TN) denote the number of True Positives (resp. True Negatives) and FP (resp. FN) the number of False Positives (resp. False Negatives); these quantities are usually presented in a confusion matrix (see Figure 2) (Witten et al., 2005). Different scalar metrics are computed from TP, TN, FP and FN to assess the performance of clinical tests; they are presented in Table 1 (Lalkhen and McCluskey, 2008; Akobeng, 2007a,b). Let us note that positive and negative predictive values depend on the prevalence of the disease (Akobeng, 2007a): given the prevalence, they are easily deduced from the knowledge of sensitivity and specificity, which are themselves free from such an influence.
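The definitions above translate directly into a few lines of code. The following is a minimal Python sketch, not taken from the paper: the function names (`confusion_counts`, `screening_metrics`, `predictive_values`) and the 0/1 label encoding are assumptions made for illustration. The last function applies Bayes' rule to obtain the predictive values from sensitivity, specificity and an assumed prevalence, which makes the dependence on prevalence explicit.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels (assumed: 1 = disease, 0 = no disease)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def screening_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Scalar metrics computed from the confusion-matrix counts.

    Assumes every denominator is non-zero (both classes and both
    predicted outcomes occur at least once).
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Derive (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)
```

For a test with sensitivity 0.9 and specificity 0.8, `predictive_values` gives a PPV of about 0.82 at a prevalence of 50% but only about 0.04 at a prevalence of 1%, while sensitivity and specificity are unchanged.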

Table 1: Performance metrics of screening tests

Metric | Definition | Formula
Accuracy (A) | Rate of successful predictions. | $A = \frac{TP + TN}{TP + FP + TN + FN}$
Sensitivity or true positive rate (tp) | Ability to detect patients with a given disease. Probability that a patient with disease tests positive. | $tp = \frac{TP}{TP + FN}$
Specificity or true negative rate (tn) | Ability to detect patients without a given disease. Probability that a patient without disease tests negative. | $tn = \frac{TN}{TN + FP}$
Positive Predictive Value (PPV) | Chance that a patient, predicted as having a given disease, is truly so. | $PPV = \frac{TP}{TP + FP}$
Negative Predictive Value (NPV) | Chance that a patient, predicted as free from a given disease, is truly so. | $NPV = \frac{TN}{TN + FN}$

When several tests are required to check the presence of a medical condition, these tests may be assessed globally in terms of net sensitivity and net specificity. The values of these indicators depend on the way in which the tests were administered, i.e. sequentially or simultaneously (Gordis, 2014). Figures 3 and 4 present the mechanisms of sequential and parallel testing.

Figure 3: Sequential testing. (Diagram not reproduced; Test 1 ($tp_1$, $tn_1$) is followed by Test 2 ($tp_2$, $tn_2$) only after a positive result, and a negative result at either test yields a negative outcome.)

Figure 4: Parallel testing. (Diagram not reproduced; Test 1 ($tp_1$, $tn_1$) and Test 2 ($tp_2$, $tn_2$) are both applied, and the outcome is negative only if both results are negative.)

For illustration purposes, the example presents the case of two tests; the associated reasoning may be generalized to situations involving more tests. In case of sequential testing, a patient is submitted to another round of examination if he/she tested positive, in order to settle his/her medical condition definitively. If the patient tests positive following the second round of examination, the subject is diagnosed with the disease in question. Thus, if either test presents a negative result, the patient is considered disease-free. The associated net sensitivity and specificity are expressed as:

$tp = tp_1 \cdot tp_2$ and $tn = tn_1 + tn_2 - tn_1 \cdot tn_2$.

In contrast, in case of parallel testing, a patient is considered negative only if all tests confirm this condition simultaneously. In this case, the associated net specificity and sensitivity are given by:

$tn = tn_1 \cdot tn_2$ and $tp = tp_1 + tp_2 - tp_1 \cdot tp_2$.
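As an illustration, the two combination rules can be written as small functions. This is a hedged sketch rather than the authors' code: the function names are hypothetical, and the formulas implicitly assume that the two tests err independently of each other given the patient's true status.

```python
from typing import Tuple


def net_sequential(tp1: float, tn1: float, tp2: float, tn2: float) -> Tuple[float, float]:
    """Net (sensitivity, specificity) for sequential testing.

    Test 2 is applied only after a positive test 1; the patient is
    declared positive only if both tests are positive.
    """
    net_tp = tp1 * tp2
    net_tn = tn1 + tn2 - tn1 * tn2
    return net_tp, net_tn


def net_parallel(tp1: float, tn1: float, tp2: float, tn2: float) -> Tuple[float, float]:
    """Net (sensitivity, specificity) for parallel testing.

    Both tests are applied; the patient is declared negative only if
    both tests are negative.
    """
    net_tp = tp1 + tp2 - tp1 * tp2
    net_tn = tn1 * tn2
    return net_tp, net_tn
```

With tp1 = 0.8, tn1 = 0.6, tp2 = 0.9 and tn2 = 0.9, sequential testing gives a net sensitivity of 0.72 and a net specificity of 0.96, whereas parallel testing gives 0.98 and 0.54: sequential testing trades sensitivity for specificity, and parallel testing does the opposite.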

In the same way that a clinician can ask for the opinion of an expert colleague, he/she can resort to models for diagnosis aid. The only difference between both scenarios lies in the external nature of the diagnostic support, either human or computerized. The data of one or several test(s) are potential inputs for diagnosis aid models. It should be noted that the non-interpreted outcomes of testing (e.g. a cholesterol level, a scan) constitute the model inputs, and not the value of the test(s), i.e. positive or negative. Indeed, it is the role of the predictive model to determine a patient's medical condition as output.

In light of the foregoing, in the present survey, what we refer to as a model is different from a test, the latter being a potential input of the former. A model provides a recommendation of diagnosis; a test provides a result that, among other potential information, makes it possible to reach a diagnosis.

2.2. The knowledge discovery process

The extraction of knowledge for the purpose of diagnosis aid fits into a Knowledge Discovery Process (KDP). Since its pioneering formalization by Fayyad et al. (1996), alternative models have been proposed, either academically or industrially minded (Kurgan and Musilek, 2006). In particular, the KDP was adapted for medical applications and illustrated for the issue of diagnosis aid by Cios et al. (2007, 2000). The associated steps are summarized below.

Understanding of the problem. The process is initiated by the problem statement, the definition of the objectives, and the sufficient appropriation of a domain-specific vocabulary. Obviously, interactions with domain experts are essential. At this level, the choice of data mining techniques is partially foreseen given the expressed requirements.

Understanding of the data. This step consists of collecting and exploring data, i.e. observing and analyzing the information.

Preparation of the data. The creation of target datasets (Fayyad et al., 1996) notably involves noise removal as well as checking the completeness and consistency of data. Then, data are processed through engineering, selection and possible reduction of pertinent features.

Data mining. This process receives the prepared datasets and extracts knowledge, i.e. patterns and relationships (Bellazzi and Zupan, 2008).

Evaluation of the discovered knowledge. The results are closely considered: they are expected to bring new and interesting elements, to be understood and to make sense. Here, domain experts have an important role to play through their ability to interpret and assess the results.

Use of the discovered knowledge. It can lead to action taking, decision making or systems deployment (Fayyad et al., 1996).

The KDP is not strictly a one-way process, as revisiting the work of previous stages is not excluded: this helps reinforce the consistency of the results (Cios et al., 2007). For example, the final evaluation may call for refining the results. Or, to better understand the data, revisiting the understanding of the problem may strengthen the domain-specific knowledge.

2.3. Acceptance criteria

One difficulty related to medical DM is that it may target different audiences, with the resulting necessity to address different expectations.

A DM approach may be requested in the medical field by researchers and specialists in order to study a given pathology through the identification of explanatory factors. In that case, the extracted knowledge is validated if it carries a certain level of credibility, measured notably by means of criteria related to statistical power. If endorsed by the scientific community, such results may be taken into consideration (directly or indirectly) by clinicians faced with a diagnosis task.

As suggested in section 2.2, the extracted knowledge may also be deployed in the form of a computerized diagnosis aid. Although they are the only users of such technologies, clinicians are influenced in their expectations, e.g. by the patients who place a lot of hope in a fair diagnosis.

Different models were developed in an effort to explain how a clinician may accept a technology and integrate it into his/her working practices (Andargoli et al., 2017; Ketikidis et al., 2012; Holden and Karsh, 2010; Yarbrough and Smith, 2007). The most popular is the Technology Acceptance Model (TAM), introduced by Davis et al. (1989) and revised by Venkatesh and Davis (2000) (TAM2). Appreciated for its concise structure, the model depicts the psychological process which, influenced by material and social factors, leads to the intention of using a computerized application in different contexts (Yarbrough and Smith, 2007).

Venkatesh and Davis (2000) report that the acceptance of technology is acquired in practice once its usefulness and ease of use are both perceived by the user. Moreover, the ease of use is one of the factors influencing the user's perception of the usefulness of the application. The perception of usefulness also rests on social factors: the subjective norm, i.e. the opinion of the user's (professional or private) surroundings regarding his/her decision to adopt (or not) the application, and the image, i.e. the social status the application provides to the user (Mun et al., 2006; Chismar and Wiley-Patton, 2002).

The subjective norm directly impacts the intention of use. This influence is exerted on the clinician by his/her patients but also by the professional environment. Indeed, the physician is sensitive to the opinion of colleagues, particularly of reference people in the domain, even though this opinion may be contrary to the physician's beliefs (Mun et al., 2006). As for the influence of the patients, the study of Shaffer et al. (2013) shows that they often tend to demonize computerized diagnostic support. Conversely, non-computer-assisted practices are perceived as a pledge of professionalism; a clinician resorting to the opinion of an expert colleague is even perceived as acting intelligently. Yet in both of these cases, the clinicians might base their decision on elements provided in the literature and extracted from a DM approach. Thus, the involvement of computing in the diagnostic process, if only to obtain advice, would in itself cause the physician's image to take a hit in the eyes of colleagues and/or patients (Mun et al., 2006).

In the present survey, we will highlight the specifics of DM to develop diagnostic decision support models which meet the requirements of the clinicians. We will deal with how to make computerized diagnosis aid fulfill the criteria of output quality and results demonstrability advocated by TAM. Nevertheless, it must be recognized that adopting a suitable modeling approach does not by itself guarantee the acceptance of the models, since some related factors (e.g. subjective norm) do not fall within DM concerns.

(Table truncated in the source; the check marks assigning topics to papers are not recoverable.)

Column headers: Knowledge Discovery Process; Nature of Medical Data; Overview of DM Techniques; Performance Evaluation; Specifics of Medical DM

Papers listed:

· Selected techniques for data mining in medicine

· Machine learning for medical diagnosis: history, state of the art and perspective

· The uniqueness of medical data mining

· Predictive data mining in clinical medicine: current issues and guidelines

· Introduction to the mining of clinical data

· Clinical data mining: a
