Bioinformatics, 2024, 40(12), btae688

/10.1093/bioinformatics/btae688

Advance Access Publication Date: 18 November 2024

Applications Note

Data and text mining

Downloaded from /bioinformatics/article/40/12/btae688/7903283 by guest on 26 December 2024

DeepDR: a deep learning library for drug response prediction

Zhengxiang Jiang 1,2 and Pengyong Li 1,*

1 School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
2 School of Electronic Engineering, Xidian University, Xi’an, Shaanxi 710126, China

*Corresponding author. School of Computer Science and Technology, Xidian University, 266 Xinglong Section of Xifeng Road, Xi’an, Shaanxi 710126, China. E-mail: lipengyong@

Associate Editor: Jonathan Wren

Abstract

Summary: Accurate drug response prediction is critical to advancing precision medicine and drug discovery. Recent advances in deep learning (DL) have shown promise in predicting drug response; however, the lack of convenient tools to support such modeling limits their widespread application. To address this, we introduce DeepDR, the first DL library specifically developed for drug response prediction. DeepDR simplifies the process by automating drug and cell featurization, model construction, training, and inference, all achievable with brief programming. The library incorporates three types of drug features along with nine drug encoders, four types of cell features along with nine cell encoders, and two fusion modules, enabling the implementation of up to 135 DL models for drug response prediction. We also explored benchmarking performance with DeepDR, and the optimal models are available on a user-friendly visual interface.

Availability and implementation: DeepDR can be installed from PyPI (/project/deepdr). The source code and experimental data are available on GitHub (/user15632/DeepDR).

1 Introduction

Precision medicine aims to deliver tailored therapies for individual tumors at the molecular level. Predicting drug response (DR) (Baptista et al. 2021) remains a complex challenge within this field, reflecting the intricate relationship between cancer multi-omics information and treatment efficacy. Accurate DR prediction could significantly contribute to the design of personalized treatments and the improvement of therapeutic outcomes. Deep learning (DL) (LeCun et al. 2015), a machine learning approach, has demonstrated considerable promise in identifying complex patterns within biological information, including cancer multi-omics and drug molecules. This potential has spurred its growing application in DR modeling, where it is considered a valuable tool for enhancing understanding and predictive capabilities (Li et al. 2021a). However, despite the development of numerous models in this domain, there is still a lack of a unified and generalized framework for model construction and training.

Current DL approaches to DR prediction typically use a structured methodology, consisting of key components such as drug modeling, cell modeling, and fusion modules for prediction generation. Drug modeling aims to effectively represent the chemical properties and potential biological effects of drugs. This is usually achieved by representing the molecular structure in formats conducive to computational processing, such as molecular fingerprints (Li et al. 2021a), SMILES (Simplified Molecular Input Line Entry System) (Liu et al. 2019), and molecular graphs (Liu et al. 2020), followed by learning structural information through models like Deep Neural Networks (DNNs) (Chawla et al. 2022), Convolutional Neural Networks (CNNs) (Manica et al. 2019), and Graph Neural Networks (GNNs) (Zhang et al. 2019). Cell modeling involves processing biological data from cells, including transcriptomics (Chawla et al. 2022), genomics (Liu et al. 2019), and proteomics (Matlock et al. 2018). DL techniques, particularly DNNs (Chawla et al. 2022) and CNNs (Manica et al. 2019), are leveraged to learn intricate patterns within these features. The fusion module integrates the insights from drug and cell modeling, using DNNs (Chawla et al. 2022) or attention mechanisms (Sakellaropoulos et al. 2019), to predict drug responses.

DR prediction models have a broad spectrum of applications beyond their primary function. These models can be utilized to predict the pharmacological properties or biological activity of molecules for virtual screening and to analyze omics data for cell classification. The versatility of DL models renders them highly applicable in a range of contexts. For example, clinical researchers investigating the impact of genetic variations on drug responses might use these methodologies to analyze genomic data from patients with specific diseases. Similarly, computational biologists aiming to develop advanced predictive models can leverage diverse datasets to explore various modeling architectures, thereby improving the accuracy of DR predictions. However, implementing these models requires substantial expertise in DL and significant coding effort. The time and complexity involved in adapting to the unique programming interfaces of various open-source tools present a nonnegligible challenge that remains to be resolved.

Received: 9 September 2024; Revised: 29 October 2024; Editorial Decision: 11 November 2024; Accepted: 13 November 2024

© The Author(s) 2024. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.


To address the challenges above, we introduce DeepDR (Deep Drug Response), a Python-based DL library designed for DR prediction. DeepDR incorporates three types of drug features along with nine drug encoders, four types of cell features along with nine cell encoders, as well as two fusion modules. This comprehensive framework supports the implementation of 135 models, catering to clinical researchers and computational biologists with limited programming backgrounds. In addition, we demonstrate the use of DeepDR by implementing and validating multiple models on the integrated datasets, which helps to identify the most effective modeling choices. To further support researchers, we developed a visual interface that enables users without programming expertise to use the optimal models.

2 DeepDR library

2.1 Dataset framework

2.1.1 Featurization

Drug featurization. DeepDR offers three modalities of drug features: FP (Molecular Fingerprints) (Li et al. 2021a), SMILES (Simplified Molecular Input Line Entry System) (Liu et al. 2019), and molecular graphs (Liu et al. 2020) (see Fig. 1B). FP are binary vector representations of molecules (Rogers and Hahn 2010). SMILES provides a specification for encoding molecules as strings (Weininger 1988). Graphs represent molecules by abstracting atoms as nodes and chemical bonds as edges (Kearnes et al. 2016). Details are available in Supplementary Text S1.
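To make the SMILES modality concrete, the following minimal sketch turns a SMILES string into a fixed-length list of integer indices of the kind an embedding layer could consume. It is an illustration under stated assumptions, not DeepDR's featurization code: the tokenizer pattern, the helper names (`tokenize_smiles`, `build_vocab`, `encode_smiles`), the padding index 0, and `max_len` are all hypothetical choices. The example string is the SMILES shown in Fig. 1.

```python
import re

# Hypothetical tokenizer (an assumption, not DeepDR's): two-letter halogens and
# bracket atoms are kept whole, every other character becomes its own token.
SMILES_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map each distinct token to an integer index; 0 is reserved for padding."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize_smiles(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode_smiles(smiles, vocab, max_len):
    """Convert a SMILES string to exactly max_len indices (truncate or pad with 0)."""
    ids = [vocab[tok] for tok in tokenize_smiles(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


example = "C1=C(C(=O)NC(=O)N1)F"  # SMILES string shown in Fig. 1
vocab = build_vocab([example])
encoded = encode_smiles(example, vocab, max_len=24)
```

In a real pipeline the vocabulary would be built once over all drugs in the dataset, so that every molecule shares the same token indices.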

Cell featurization. DeepDR integrates four modalities of cell features: expression profile (EXP) (Manica et al. 2019), pathway enrichment score (PES) (Chawla et al. 2022), mutation status (MUT) (Liu et al. 2019), and copy number variation (CNV) (Liu et al. 2019) (see Fig. 1B). EXP reflects the quantitative expression levels of genes (Heller 2002). PES illuminates the combinatorial implications among genes within specific pathways (Hänzelmann et al. 2013). MUT refers to the genetic alterations or variations within specific genes (Stenson et al. 2017). CNV represents genomic deletions and duplications observable at the submicroscopic scale (Freeman et al. 2006). Given the complexity of processing high-dimensional data, DeepDR provides features screened on gene subsets in addition to genome-wide features (Jia et al. 2021). Details are provided in Supplementary Text S2.

2.1.2 Dataset and splitting


DeepDR integrates the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al. 2019) and Genomics of Drug Sensitivity in Cancer (GDSC) (Yang et al. 2016), and allows users to use their own datasets (see Supplementary Texts S3 and S4). Drug response is quantified using several parameters: the natural logarithm-transformed IC50 (Half Maximal Inhibitory Concentration), AUC (Area Under the Dose-response Curve), and ActArea (Activity Area). To support validation, DeepDR incorporates four dataset splitting strategies: common random, leave-cell-out, leave-drug-out, and strict split (Manica et al. 2019) (see Fig. 1C). The leave-cell-out split is designed to eliminate any overlap of cells between the training, validation, and test sets. This approach aims to replicate the scenario where the drug response of new cells to existing drugs is evaluated. Similarly, the leave-drug-out split seeks to emulate the response of known cells to novel drugs, while the strict split is designed to simulate the response of novel cells to novel drugs.
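The four splitting strategies can be sketched in plain Python as follows. This is an illustrative reading of the descriptions above, not DeepDR's implementation: the helper names (`split_ids`, `split_pairs`) are hypothetical, the mode strings other than `'cell_out'` (which appears in Fig. 1D) are assumed, and the handling of the strict split (discarding pairs whose cell and drug fall into different subsets) is one plausible interpretation.

```python
import random


def split_ids(ids, ratio, seed):
    """Shuffle unique IDs and cut them into train/validation/test groups."""
    ids = sorted(set(ids))
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * ratio[0])
    n_val = round(len(ids) * ratio[1])
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_val]),
            set(ids[n_train + n_val:]))


def split_pairs(pairs, mode, ratio=(0.8, 0.1, 0.1), seed=1):
    """Split (cell, drug, response) records; returns [train, val, test].

    mode is one of 'random', 'cell_out', 'drug_out', 'strict' (assumed names).
    """
    if mode == "random":
        groups = split_ids(range(len(pairs)), ratio, seed)
        return [[pairs[i] for i in sorted(g)] for g in groups]
    cell_groups = split_ids([p[0] for p in pairs], ratio, seed)
    drug_groups = split_ids([p[1] for p in pairs], ratio, seed + 1)
    subsets = []
    for cells, drugs in zip(cell_groups, drug_groups):
        if mode == "cell_out":
            # No cell appears in more than one subset.
            subsets.append([p for p in pairs if p[0] in cells])
        elif mode == "drug_out":
            # No drug appears in more than one subset.
            subsets.append([p for p in pairs if p[1] in drugs])
        elif mode == "strict":
            # Keep a pair only if its cell and drug belong to the same subset;
            # pairs that straddle subsets are discarded (an assumption).
            subsets.append([p for p in pairs if p[0] in cells and p[1] in drugs])
        else:
            raise ValueError(f"unknown split mode: {mode}")
    return subsets
```

Under this sketch the strict split uses fewer pairs than the other modes, because every pair whose cell and drug land in different subsets is dropped.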

2.2 Model for DR prediction

A deep learning DR prediction model can be formulated as the encoding of drugs and cells followed by the fusion of drug and cell information. In line with this framework, DeepDR provides three integral modules: the drug encoder, the cell encoder, and the fusion module. These components are designed to provide the foundation for the flexible construction of predictive models of drug response. The features of drugs and cells are fed into the encoders. Subsequently, the encoded information is integrated within the fusion module to generate the predicted drug response (see Fig. 1A).

2.2.1 Drug encoder

DeepDR integrates nine encoders tailored to process drug molecular data (see Fig. 1B). These encoders include the DNN (Deep Neural Network) leveraging molecular fingerprints, and architectures such as CNN (Convolutional Neural Network) (Liu et al. 2019), GRU (Gated Recurrent Unit) (Dey and Salem 2017), and LSTM (Long Short-Term Memory) (Graves and Graves 2012) that are based on SMILES representations. In addition, it features GCN (Graph Convolutional Network) (Zhang et al. 2019), GAT (Graph Attention Network) (Velickovic et al. 2017), MPG (Li et al. 2021c), AttentiveFP (Xiong et al. 2020), and TrimNet (Li et al. 2021b) for analyzing molecular graphs. The DNN module encodes the drug as a single vector, while the other architectures produce a sequence of vectors, with each vector corresponding to a SMILES character or an atom within the molecular graph. The encoders based on SMILES and molecular graphs are integrated with an embedding layer, which generates dense vectors.

[Figure 1, panels A to C and E (graphics not reproduced). Panel labels: (1) drug featurization: FP, SMILES (e.g. C1=C(C(=O)NC(=O)N1)F), Graph; (2) cell featurization: EXP, PES, MUT/CNV; (3) drug encoder: DNN, CNN, GRU/LSTM, GNNs; (4) cell encoder: DNN, CNN, DAE; (5) fusion module: DNN, MHA; output: IC50/AUC/ActArea; splits: random, leave cell out, leave drug out, strict, each dividing pairs into train, valid, and test.]

(D)

    from DeepDR import Data, Model, CellEncoder, DrugEncoder, FusionModule

    # Load CCLE pairs with ActArea responses, EXP cell features, and graph drug features, then clean
    data = Data.DrData(Data.DrRead.PairDef('CCLE', 'ActArea'), 'EXP', 'Graph').clean()
    # Leave-cell-out split with ratio 0.8 / 0.2 / 0.0 for train / validation / test
    train_data, val_data, _ = data.split('cell_out', fold=1, ratio=[0.8, 0.2, 0.0], seed=1)
    train_loader = Data.DrDataLoader(Data.DrDataset(train_data[0]), batch_size=64, shuffle=True)
    val_loader = Data.DrDataLoader(Data.DrDataset(val_data[0]), batch_size=64, shuffle=False)
    # DNN cell encoder, MPG drug encoder, and DNN fusion module
    model = Model.DrModel(CellEncoder.DNN(6163, 100), DrugEncoder.MPG(), FusionModule.DNN(100, 768))
    # Train for 100 epochs, evaluating on the validation loader
    result = Model.Train(model, epochs=100, lr=1e-4, train_loader=train_loader, val_loader=val_loader)
    # Predict responses for two cell-drug pairs
    data.pair_ls = [['CAL120', '5-Fluorouracil'], ['CAL51', 'Afuresertib']]
    result = Model.Predict(model=result[0], data=data)

Figure 1. Overview of DeepDR library. (A) The drug and cell are processed through featurization and encoder, and then the drug response is decoded using the fusion module. (B) DeepDR provides drug and cell featurization, encoder, and fusion module. (C) DeepDR provides splitting methods, including random split, leave-cell-out split, leave-drug-out split, and strict split. (D) Programming framework of DeepDR for dataset loading, model implementation, training, and inference. (E) Leave-cell-out performance on the CCLE dataset. "Using subset" means using features screened on the gene subset, rather than genome-wide features. The values in parentheses are standard deviations.

2.2.2 Cell encoder

For cell modeling, DeepDR integrates nine encoders: DNN based on EXP, PES, MUT, or CNV (Li et al. 2021a); CNN based on EXP, PES, MUT, or CNV (Manica et al. 2019); and DAE (Denoising Autoencoder) based on EXP (Chen et al. 2022) (see Fig. 1B). The DNN and CNN modules are designed to compress the features of cells into low-dimensional vectors, thus facilitating a more compact and efficient representation of the data. The DAE, on the other hand, is pre-trained to minimize the reconstruction loss of cell features, and its hidden vectors are used as the encoding vectors for the cells.

2.2.3 Fusion module

In terms of integrating drug and cell information, DeepDR provides two methods: a DNN-based and an MHA (Multi-head Attention)-based framework (see Fig. 1B) (Vaswani et al. 2017, Manica et al. 2019). The cell encoder is designed to encode the cell as a single vector, while the drug encoder encodes the drug as a single vector or a series of vectors. Within the DNN-based framework, a series of vectors can be condensed into a single vector through techniques such as global average or maximum pooling. In contrast, the MHA-based approach calculates as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}$$

where the cell vector acts as Q, the vectors representing the drug form the matrices K and V, and d_k is the dimension of those drug vectors. This leverages the attention mechanism to effectively extract the information on cell-drug interactions into one vector. Both architectures share a common process where the vectors for the drug and cell are either added or concatenated and then passed through a succession of linear layers to predict the drug response.
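The two ways of condensing a sequence of drug vectors can be sketched in plain Python. This is a simplified illustration rather than DeepDR's fusion code: it uses a single attention head with no learned projections, assumes the cell vector and the drug vectors share the same dimension, and all function names (`mean_pool`, `max_pool`, `attention_pool`, `fuse`) are hypothetical.

```python
import math


def mean_pool(vectors):
    """Global average pooling: element-wise mean over a sequence of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def max_pool(vectors):
    """Global max pooling: element-wise maximum over a sequence of vectors."""
    return [max(col) for col in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(cell_vec, drug_vecs):
    """Single-query scaled dot-product attention, following Equation (1).

    The cell vector is the query Q; the drug vectors serve as both K and V.
    Learned projections and multiple heads are omitted (a simplification).
    """
    d_k = len(drug_vecs[0])
    scores = [sum(q * k for q, k in zip(cell_vec, vec)) / math.sqrt(d_k)
              for vec in drug_vecs]
    weights = softmax(scores)
    pooled = [sum(w * vec[i] for w, vec in zip(weights, drug_vecs))
              for i in range(d_k)]
    return pooled, weights


def fuse(cell_vec, drug_vec, how="concat"):
    """Combine cell and drug vectors by concatenation or element-wise addition."""
    if how == "concat":
        return cell_vec + drug_vec
    return [c + d for c, d in zip(cell_vec, drug_vec)]


cell = [1.0, 0.0]
drug = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per atom or SMILES token
pooled, weights = attention_pool(cell, drug)
fused = fuse(cell, pooled)  # would then be passed to the linear layers
```

With a zero query vector all scores are equal, so attention pooling reduces to global average pooling; the cell vector is what makes the weighting specific to each cell-drug pair.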

3 Programming framework of DeepDR

DeepDR streamlines the DR prediction workflow into seven modular components, each structured as a class or function (see Fig. 1D): (i) Use Data.DrData to construct drug response data, including cell-drug pairs, corresponding drug responses, and cell and drug features. (ii) Use .clean() and .split() to clean and split drug response data. (iii) Instantiate the dataset using Data.DrDataset. (iv) Use Data.DrDataLoader to load the dataset for model training or validation. (v) Use Model.DrModel to construct the DR prediction model. (vi) Train the model using Model.Train, which concurrently evaluates performance. (vii) Finally, use Model.Predict to predict drug responses with the trained model. DeepDR offers three key metrics: Mean Squared Error (MSE), R-squared (R2), and Pearson Correlation Coefficient (PCC).
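For reference, the three metrics can be computed from their standard definitions as in the sketch below. The function names are hypothetical and this is not DeepDR's own metric code, which may differ in details such as input types.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted responses."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]  # constant offset of 0.5
```

The example shows why the metrics are complementary: a constant offset leaves PCC at 1 while MSE and R2 still register the error.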


4 Establishing benchmarks via DeepDR

To benchmark drug response prediction, we implemented and evaluated 16 models, including tCNNS (Liu et al. 2019), Precily (Chawla et al. 2022), and DeepDSC (Li et al. 2021a), along with 13 other novel models, on the CCLE and GDSC2 datasets. We used leave-cell-out and leave-drug-out splitting strategies to split the datasets into training, validation, and test sets (8:1:1) using three random seeds. Each model was trained for 100 epochs using the MSE loss function, with the learning rate tuned from {0.001, 0.0001, 0.00001}. We report the mean and standard deviation of model performance across the three seeds. Our findings (Fig. 1E and Supplementary Tables S1–S3) highlight three key observations: (i) the optimal representations are graphs for drugs and expression profiles for cells; (ii) predicting the response of novel drugs is a more significant challenge; (iii) pre-training techniques facilitate accurate prediction of drug response. Further analysis and implementation details can be found in Supplementary Texts S5 and S6 and Supplementary Tables S4–S7. The optimal models developed with DeepDR are available on a visual interface at https://huggingface.co/spaces/user15632/DeepDR.
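The reporting protocol (one score per seed, summarized as mean and standard deviation, with the learning rate chosen from a small grid) can be sketched as follows. The scores are made-up placeholders for illustration only and are not results from the paper; the text does not state whether a sample or population standard deviation was used, nor the exact selection rule, so the sample standard deviation and selection by lowest mean validation MSE are assumptions.

```python
from statistics import mean, stdev

# Made-up validation MSE values, NOT results from the paper.
# Keys are candidate learning rates; each list holds one score per random seed.
val_mse = {
    1e-3: [1.30, 1.25, 1.35],
    1e-4: [1.10, 1.05, 1.15],
    1e-5: [1.40, 1.45, 1.50],
}


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return mean(scores), stdev(scores)


summary = {lr: summarize(scores) for lr, scores in val_mse.items()}

# Assumed selection rule: pick the learning rate with the lowest mean validation MSE.
best_lr = min(summary, key=lambda lr: summary[lr][0])
best_mean, best_std = summary[best_lr]
report = f"{best_mean:.3f} ({best_std:.3f})"  # mean with std in parentheses, as in Fig. 1E
```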

Author contributions

Zhengxiang Jiang (Methodology, Data curation, Visualization, Writing – original draft, Writing – review & editing), Pengyong Li (Conceptualization, Supervision, Investigation, Methodology, Writing – review & editing).

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest: None declared.

Funding

This work was supported in part by the National Natural Science Foundation of China [62202353 and U22A2037] and the Fundamental Research Funds for the Central Universities.

Data availability

The source code and experimental data are available on GitHub: /user15632/DeepDR. Installation of DeepDR involves simply typing "pip install deepdr".

References

Baptista D, Ferreira PG, Rocha M. Deep learning for drug response prediction in cancer. Brief Bioinform 2021;22:360–79.

Barretina J, Caponigro G, Stransky N et al. Addendum: the cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2019;565:E5–6.

Chawla S, Rockstroh A, Lehman M et al. Gene expression based inference of cancer drug sensitivity. Nat Commun 2022;13:5680.

Chen J, Wang X, Ma A et al. Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data. Nat Commun 2022;13:6494.

Dey R, Salem FM. Gate-variants of gated recurrent unit (GRU) neural networks. In: 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE, 2017, 1597–600.

Freeman JL, Perry GH, Feuk L et al. Copy number variation: new insights in genome diversity. Genome Res 2006;16:949–61.

Graves A, Graves A. Long Short-Term Memory. Supervised Sequence Labelling with Recurrent Neural Networks. New York, USA: Springer, 2012, 37–45.

Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7–15.

Heller MJ. DNA microarray technology: devices, systems, and applications. Annu Rev Biomed Eng 2002;4:129–53.

Jia P, Hu R, Pei G et al. Deep generative neural network for accurate drug response imputation. Nat Commun 2021;12:1740.

Kearnes S, McCloskey K, Berndl M et al. Molecular graph convolutions: moving beyond fingerprints. J Comput Aided Mol Des 2016;30:595–608.

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.

Li M, Wang Y, Zheng R et al. Deepdsc: a deep learning method to predict drug sensitivity of cancer cell lines. IEEE/ACM Trans Comput Biol Bioinform 2021a;18:575–82.

Li P, Li Y, Hsieh C-Y et al. Trimnet: learning molecular representation from triplet messages for biomedicine. Brief Bioinform 2021b;22:bbaa266.

Li P, Wang J, Qiao Y et al. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief Bioinform 2021c;22:bbab109.

Liu P, Li H, Li S et al. Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinformatics 2019;20:408.

Liu Q, Hu Z, Jiang
