Bioinformatics, 2024, 40(12), btae688
/10.1093/bioinformatics/btae688
Advance Access Publication Date: 18 November 2024
Applications Note
Data and text mining
Downloaded from /bioinformatics/article/40/12/btae688/7903283 by guest on 26 December 2024
DeepDR: a deep learning library for drug response prediction
Zhengxiang Jiang 1,2 and Pengyong Li 1,*
1 School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
2 School of Electronic Engineering, Xidian University, Xi’an, Shaanxi 710126, China
* Corresponding author. School of Computer Science and Technology, Xidian University, 266 Xinglong Section of Xifeng Road, Xi’an, Shaanxi 710126, China. E-mail: lipengyong@
Associate Editor: Jonathan Wren
Abstract
Summary: Accurate drug response prediction is critical to advancing precision medicine and drug discovery. Recent advances in deep learning (DL) have shown promise in predicting drug response; however, the lack of convenient tools to support such modeling limits their widespread application. To address this, we introduce DeepDR, the first DL library specifically developed for drug response prediction. DeepDR simplifies the process by automating drug and cell featurization, model construction, training, and inference, all achievable with brief programming. The library incorporates three types of drug features along with nine drug encoders, four types of cell features along with nine cell encoders, and two fusion modules, enabling the implementation of up to 135 DL models for drug response prediction. We also explored benchmarking performance with DeepDR, and the optimal models are available on a user-friendly visual interface.
Availability and implementation: DeepDR can be installed from PyPI (/project/deepdr). The source code and experimental data are available on GitHub (/user15632/DeepDR).
1 Introduction
Precision medicine aims to deliver tailored therapies for individual tumors at the molecular level. Predicting drug response (DR) (Baptista et al. 2021) remains a complex challenge within this field, reflecting the intricate relationship between cancer multi-omics information and treatment efficacy. Accurate DR prediction could significantly contribute to the design of personalized treatments and the improvement of therapeutic outcomes. Deep learning (DL) (LeCun et al. 2015), a machine learning approach, has demonstrated considerable promise in identifying complex patterns within biological information, including cancer multi-omics and drug molecules. This potential has spurred its growing application in DR modeling, where it is considered a valuable tool for enhancing understanding and predictive capabilities (Li et al. 2021a). However, despite the development of numerous models in this domain, there is still a lack of a unified and generalized framework for model construction and training.
Current DL approaches to DR prediction typically use a structured methodology, consisting of key components such as drug modeling, cell modeling, and fusion modules for prediction generation. Drug modeling aims to effectively represent the chemical properties and potential biological effects of drugs. This is usually achieved by representing the molecular structure in formats conducive to computational processing, such as molecular fingerprints (Li et al. 2021a), SMILES (Simplified Molecular Input Line Entry System) (Liu et al. 2019), and molecular graphs (Liu et al. 2020), followed by learning structural information through models like Deep Neural Networks (DNNs) (Chawla et al. 2022), Convolutional Neural Networks (CNNs) (Manica et al. 2019), and Graph Neural Networks (GNNs) (Zhang et al. 2019). Cell modeling involves processing biological data from cells, including transcriptomics (Chawla et al. 2022), genomics (Liu et al. 2019), and proteomics (Matlock et al. 2018). DL techniques, particularly DNNs (Chawla et al. 2022) and CNNs (Manica et al. 2019), are leveraged to learn intricate patterns within these features. The fusion module integrates the insights from drug and cell modeling, using DNNs (Chawla et al. 2022) or attention mechanisms (Sakellaropoulos et al. 2019), to predict drug responses.
DR prediction models have a broad spectrum of applications beyond their primary function. These models can be utilized to predict the pharmacological properties or biological activity of molecules for virtual screening and to analyze omics data for cell classification. The versatility of DL models renders them highly applicable in a range of contexts. For example, clinical researchers investigating the impact of genetic variations on drug responses might use these methodologies to analyze genomic data from patients with specific diseases. Similarly, computational biologists aiming to develop advanced predictive models can leverage diverse datasets to explore various modeling architectures, thereby improving the accuracy of DR predictions. However, implementing these models requires substantial expertise in DL and significant coding effort. The time and complexity involved in adapting to the unique programming interfaces of various open-source tools present a nonnegligible challenge that remains to be resolved.
Received: 9 September 2024; Revised: 29 October 2024; Editorial Decision: 11 November 2024; Accepted: 13 November 2024
© The Author(s) 2024. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
To address the challenges above, we introduce DeepDR (Deep Drug Response), a Python-based DL library designed for DR prediction. DeepDR incorporates three types of drug features along with nine drug encoders, four types of cell features along with nine cell encoders, as well as two fusion modules. This comprehensive framework supports the implementation of 135 models, catering to clinical researchers and computational biologists with limited programming backgrounds. In addition, we demonstrate the utilization of DeepDR by implementing and validating multiple models on the integrated datasets, which helps to identify the most effective modeling approaches. To further support researchers, we develop a visual interface that enables users without programming expertise to utilize the optimal models.
2 DeepDR library
2.1 Dataset framework
2.1.1 Featurization
Drug featurization. DeepDR offers three modalities of drug features: FP (Molecular Fingerprints) (Li et al. 2021a), SMILES (Simplified Molecular Input Line Entry System) (Liu et al. 2019), and molecular graphs (Liu et al. 2020) (see Fig. 1B). FPs are binary vector representations of molecules (Rogers and Hahn 2010). SMILES provides a specification for encoding molecules as strings (Weininger 1988). Graphs represent molecules by abstracting atoms as nodes and chemical bonds as edges (Kearnes et al. 2016). Details are available in Supplementary Text S1.
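As a rough illustration of how a SMILES string can become model input, the sketch below maps each character to an integer index and pads to a fixed length. It is not DeepDR's featurizer: the function names `build_vocab` and `encode_smiles`, the padding index 0, and the purely character-level tokenization (which would split two-letter atoms such as Cl) are assumptions made for this example.

```python
def build_vocab(smiles_list):
    """Map each distinct character to an integer index; 0 is reserved for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_smiles(smiles, vocab, max_len):
    """Convert a SMILES string into a fixed-length list of indices, right-padded with 0."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


# SMILES of 5-fluorouracil, the string that appears in Fig. 1
smiles = "C1=C(C(=O)NC(=O)N1)F"
vocab = build_vocab([smiles])
encoded = encode_smiles(smiles, vocab, max_len=24)
print(encoded)
```

Each index would typically be passed through an embedding layer so that a sequence encoder receives one dense vector per character.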
Cell featurization. DeepDR integrates four modalities of cell features: expression profile (EXP) (Manica et al. 2019), pathway enrichment score (PES) (Chawla et al. 2022), mutation status (MUT) (Liu et al. 2019), and copy number variation (CNV) (Liu et al. 2019) (see Fig. 1B). EXP reflects the quantitative expression levels of genes (Heller 2002). PES illuminates the combinatorial implications among genes within specific pathways (Hänzelmann et al. 2013). MUT refers to the genetic alterations or variations within specific genes (Stenson et al. 2017). CNV represents genomic deletions and duplications observable at the submicroscopic scale (Freeman et al. 2006). Given the complexity of processing high-dimensional data, DeepDR provides features screened on gene subsets in addition to genome-wide features (Jia et al. 2021). Details are provided in Supplementary Text S2.
2.1.2 Dataset and splitting
DeepDR integrates the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al. 2019) and Genomics of Drug Sensitivity in Cancer (GDSC) (Yang et al. 2016), and allows users to use their own datasets (see Supplementary Texts S3 and S4). Drug response is quantified using several parameters: the natural logarithm-transformed IC50 (Half Maximal Inhibitory Concentration), AUC (Area Under the Dose-response Curve), and ActArea (Activity Area). To support validation, DeepDR incorporates four dataset splitting strategies: common random, leave-cell-out, leave-drug-out, and strict split (Manica et al. 2019) (see Fig. 1C). The leave-cell-out split is designed to eliminate any overlap of cells between the training, validation, and test sets. This approach aims to replicate the scenario where the drug response of new cells to existing drugs is evaluated. Similarly, the leave-drug-out split seeks to emulate the response of known cells to novel drugs, while the strict split is designed to simulate the response of novel cells to novel drugs.
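The four splitting strategies can be sketched in plain Python as follows. This is an illustrative reading of the descriptions above, not DeepDR's `.split()` implementation: the helper names `split_ids` and `split_pairs`, the mode strings other than `cell_out`, and the choice to drop pairs whose cell and drug fall into different groups under the strict split are all assumptions.

```python
import random


def split_ids(ids, ratio, seed):
    """Shuffle the unique ids and cut them into train/val/test groups by ratio."""
    ids = sorted(set(ids))
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * ratio[0])
    n_val = round(len(ids) * ratio[1])
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_val]),
            set(ids[n_train + n_val:]))


def split_pairs(pairs, mode, ratio=(0.8, 0.1, 0.1), seed=0):
    """Split (cell, drug, response) records into train/val/test lists.

    mode is one of 'random', 'cell_out', 'drug_out', or 'strict'.
    """
    if mode not in ("random", "cell_out", "drug_out", "strict"):
        raise ValueError(f"unknown split mode: {mode}")
    if mode == "random":
        groups = split_ids(range(len(pairs)), ratio, seed)
        return tuple([pairs[i] for i in sorted(g)] for g in groups)
    cell_groups = split_ids([p[0] for p in pairs], ratio, seed)
    drug_groups = split_ids([p[1] for p in pairs], ratio, seed + 1)
    out = ([], [], [])
    for cell, drug, response in pairs:
        for k in range(3):
            in_cell = cell in cell_groups[k]
            in_drug = drug in drug_groups[k]
            keep = ((mode == "cell_out" and in_cell)
                    or (mode == "drug_out" and in_drug)
                    # Assumption: strict keeps a pair only if cell and drug share a group
                    or (mode == "strict" and in_cell and in_drug))
            if keep:
                out[k].append((cell, drug, response))
    return out


# Toy data: 10 hypothetical cells x 10 hypothetical drugs with dummy responses
pairs = [(f"cell{c}", f"drug{d}", 0.0) for c in range(10) for d in range(10)]
train, val, test = split_pairs(pairs, "strict")
print(len(train), len(val), len(test))
```

Under this reading the strict split discards every pair that mixes a training cell with a held-out drug (or the reverse), so it yields fewer usable pairs than the other three strategies.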
2.2 Model for DR prediction
A deep learning DR prediction model can be formulated as the encoding of drugs and cells followed by the fusion of drug and cell information. In line with this framework, DeepDR provides three integral modules: the drug encoder, cell encoder, and fusion module. These components are designed to provide the foundation for the flexible construction of predictive models of drug response. The features of drugs and cells are fed into the encoders. Subsequently, the encoded information is integrated within the fusion module to generate the predicted drug response (see Fig. 1A).
2.2.1 Drug encoder
DeepDR integrates nine encoders tailored to process drug molecular data (see Fig. 1B). These encoders include the DNN (Deep Neural Network) leveraging molecular fingerprints, and architectures such as CNN (Convolutional Neural Network) (Liu et al. 2019), GRU (Gated Recurrent Unit) (Dey and Salem 2017), and LSTM (Long Short-Term Memory) (Graves and Graves 2012) that are based on SMILES representations. In addition, it features GCN (Graph Convolutional Network) (Zhang et al. 2019), GAT (Graph Attention Network) (Velickovic et al. 2017), MPG (Li et al. 2021c), AttentiveFP (Xiong et al. 2020), and TrimNet (Li et al. 2021b) for analyzing molecular graphs. The DNN module encodes the drug as a single vector, while the other architectures produce a sequence of vectors, with each vector corresponding to a SMILES character or an atom within the molecular graph. The encoders based on SMILES and molecular graphs are integrated with an embedding layer, which is instrumental in generating dense vectors.

[Figure 1, panels A–C and E: images not reproduced. Panel D:]

    from DeepDR import Data, Model, CellEncoder, DrugEncoder, FusionModule

    # Load CCLE pairs with ActArea labels, EXP cell features, and graph drug features
    data = Data.DrData(Data.DrRead.PairDef('CCLE', 'ActArea'), 'EXP', 'Graph').clean()
    # Leave-cell-out split into training and validation data
    train_data, val_data, _ = data.split('cell_out', fold=1, ratio=[0.8, 0.2, 0.0], seed=1)
    train_loader = Data.DrDataLoader(Data.DrDataset(train_data[0]), batch_size=64, shuffle=True)
    val_loader = Data.DrDataLoader(Data.DrDataset(val_data[0]), batch_size=64, shuffle=False)
    # DNN cell encoder, MPG drug encoder, and DNN fusion module
    model = Model.DrModel(CellEncoder.DNN(6163, 100), DrugEncoder.MPG(), FusionModule.DNN(100, 768))
    result = Model.Train(model, epochs=100, lr=1e-4, train_loader=train_loader, val_loader=val_loader)
    # Predict the response of two cell-drug pairs
    data.pair_ls = [['CAL120', '5-Fluorouracil'], ['CAL51', 'Afuresertib']]
    result = Model.Predict(model=result[0], data=data)

Figure 1. Overview of DeepDR library. (A) The drug and cell are processed through featurization and encoder, and then the drug response is decoded using the fusion module. (B) DeepDR provides drug and cell featurization, encoder, and fusion module. (C) DeepDR provides splitting methods, including random split, leave-cell-out split, leave-drug-out split, and strict split. (D) Programming framework of DeepDR for dataset loading, model implementation, training, and inference. (E) Leave-cell-out performance on the CCLE dataset. Using subset means using features screened on the gene subset, rather than genome-wide features. The values in parentheses are standard deviations.
2.2.2 Cell encoder
For cell modeling, DeepDR integrates nine encoders: DNN based on EXP, PES, MUT, or CNV (Li et al. 2021a); CNN based on EXP, PES, MUT, or CNV (Manica et al. 2019); and DAE (Denoising Autoencoder) based on EXP (Chen et al. 2022) (see Fig. 1B). The DNN and CNN modules are designed to compress the features of cells into low-dimensional vectors, thus facilitating a more compact and efficient representation of the data. The DAE, on the other hand, is pre-trained to minimize the reconstruction loss of cell features, and its hidden vectors are used as the encoding vectors for the cells.
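To make the denoising-autoencoder objective concrete, the sketch below runs one forward pass of a single-hidden-layer autoencoder in plain Python. It is a conceptual illustration only, not DeepDR's DAE: the masking noise, the ReLU activation, the layer sizes, and the names `corrupt`, `linear`, and `dae_forward` are assumptions, and no training loop is shown.

```python
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: set each feature to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def linear(x, weights, bias):
    """Dense layer; weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def dae_forward(x, w_enc, b_enc, w_dec, b_dec, drop_prob, rng):
    """Return (reconstruction loss, hidden vector) for one cell profile."""
    noisy = corrupt(x, drop_prob, rng)
    hidden = [max(0.0, h) for h in linear(noisy, w_enc, b_enc)]  # ReLU
    recon = linear(hidden, w_dec, b_dec)
    # The loss compares the reconstruction with the clean input, not the noisy one
    loss = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    return loss, hidden


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [0.0, 0.0, 0.0]
profile = [0.5, 1.0, 2.0]  # hypothetical expression values for three genes
rng = random.Random(0)
loss_clean, hidden = dae_forward(profile, identity, zeros, identity, zeros, 0.0, rng)
loss_masked, _ = dae_forward(profile, identity, zeros, identity, zeros, 1.0, rng)
print(loss_clean, loss_masked)
```

After pre-training, only the encoder half would be kept, and `hidden` would serve as the cell encoding passed to the fusion module.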
2.2.3 Fusion module
To integrate drug and cell information, DeepDR provides two methods: a DNN-based and an MHA (Multi-head Attention)-based framework (see Fig. 1B) (Vaswani et al. 2017, Manica et al. 2019). The cell encoder is designed to encode the cell as a single vector, while the drug encoder encodes the drug as a single vector or a series of vectors. Within the DNN-based framework, a series of vectors can be condensed into a single vector through techniques such as global average or maximum pooling. In contrast, the MHA-based approach calculates as follows:
\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1} \]
where the cell vector acts as Q, and d_k is the dimension of the vectors representing the drug, which are considered as the matrices K and V. This leverages the attention mechanism to effectively extract the information on cell-drug interactions into one vector. Both architectures share a common process where the vectors for the drug and cell are either added or concatenated, followed by their introduction into a succession of linear layers for the prediction of drug responses.
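The two ways of condensing a sequence of drug vectors can be sketched in plain Python. The code below is a simplified single-head version of Equation (1) without the learned projection matrices that a multi-head attention layer would apply, so it should be read as an illustration of the arithmetic rather than DeepDR's fusion module; the function names and the toy vectors are hypothetical.

```python
import math


def mean_pool(vectors):
    """Global average pooling: collapse a sequence of vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def max_pool(vectors):
    """Global maximum pooling over the sequence dimension."""
    return [max(col) for col in zip(*vectors)]


def attention_pool(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d_k = len(keys[0])
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(len(values[0]))]
    return pooled, weights


cell_vec = [1.0, 0.0]                              # hypothetical cell encoding, acts as Q
drug_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical per-atom encodings, K and V
pooled, weights = attention_pool(cell_vec, drug_vecs, drug_vecs)
fused = cell_vec + pooled                          # concatenation of cell and drug vectors
print(weights, fused)
```

Unlike average or maximum pooling, the attention weights depend on the cell vector, so the same drug can be summarized differently for different cells.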
3 Programming framework of DeepDR
DeepDR streamlines the DR prediction workflow into seven modular components, each structured as a class or function (see Fig. 1D): (i) Use Data.DrData to construct drug response data, including cell-drug pairs, corresponding drug responses, and cell and drug features. (ii) Use .clean() and .split() to clean and split drug response data. (iii) Instantiate the dataset using Data.DrDataset. (iv) Use Data.DrDataLoader to load the dataset for model training or validation. (v) Use Model.DrModel to construct the DR prediction model. (vi) Train the model using Model.Train, which concurrently evaluates performance. (vii) Finally, use Model.Predict to predict drug responses with the trained model. DeepDR offers three key metrics: Mean Squared Error (MSE), R-squared (R2), and Pearson Correlation Coefficient (PCC).
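For reference, the three metrics can be written out in a few lines of plain Python. These are the textbook definitions (R2 as the coefficient of determination); whether DeepDR computes them in exactly this way is an assumption, and the function names and toy values are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pcc(y_true, y_pred):
    """Pearson correlation coefficient."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy values: predictions are offset from the targets by a constant 0.5
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
print(mse(y_true, y_pred), r2(y_true, y_pred), pcc(y_true, y_pred))
```

In this toy case the predictions are perfectly correlated with the targets even though MSE is nonzero and R2 is below 1, which is one reason to report the three metrics together.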
4 Establishing benchmarks via DeepDR
To benchmark drug response prediction, we implemented and evaluated 16 models, including tCNNS (Liu et al. 2019), Precily (Chawla et al. 2022), and DeepDSC (Li et al. 2021a), along with 13 other novel models, on the CCLE and GDSC2 datasets. We used leave-cell-out and leave-drug-out splitting strategies to split the datasets into training, validation, and test sets (8:1:1) using three random seeds. Each model was trained for 100 epochs using the MSE loss function, with the learning rate tuned from {0.001, 0.0001, 0.00001}. We report the mean and standard deviation of model performance across the three seeds. Our findings (Fig. 1E and Supplementary Tables S1–S3) highlight three key observations: (i) optimal representations are graphs for drugs and expression profiles for cells. (ii) Predicting the response of novel drugs is a more significant challenge. (iii) Pre-training techniques facilitate accurate prediction of drug response. Further analysis and implementation details can be found in Supplementary Texts S5 and S6 and Supplementary Tables S4–S7. The optimal models developed with DeepDR are available on a visual interface at https://huggingface.co/spaces/user15632/DeepDR.
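The reporting protocol (tune the learning rate, then report mean and standard deviation over three seeds) might look like the sketch below. The numbers are placeholders and not results from the paper, and two details are assumptions: that the learning rate is chosen by mean validation MSE across seeds, and that the sample standard deviation is reported.

```python
from statistics import mean, stdev


def select_and_summarize(val_mse, test_mse):
    """Pick the learning rate with the lowest mean validation MSE, then
    summarize its test MSE across seeds as (best_lr, mean, standard deviation).

    Both arguments map learning rate -> list of scores, one per seed.
    """
    best_lr = min(val_mse, key=lambda lr: mean(val_mse[lr]))
    scores = test_mse[best_lr]
    return best_lr, mean(scores), stdev(scores)


# Placeholder scores for three seeds per learning rate (not from the paper)
val_mse = {1e-3: [0.9, 1.1, 1.0], 1e-4: [0.8, 0.7, 0.9], 1e-5: [1.2, 1.3, 1.1]}
test_mse = {1e-3: [1.0, 1.2, 1.1], 1e-4: [0.9, 0.8, 1.0], 1e-5: [1.3, 1.4, 1.2]}

best_lr, avg, sd = select_and_summarize(val_mse, test_mse)
print(f"lr={best_lr}: {avg:.3f} ({sd:.3f})")
```

Selecting on validation scores and reporting only test scores keeps the tuning step from leaking into the reported numbers.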
Author contributions
Zhengxiang Jiang (Methodology, Data curation, Visualization, Writing – original draft, Writing – review & editing), Pengyong Li (Conceptualization, Supervision, Investigation, Methodology, Writing – review & editing)
Supplementary data
Supplementary data are available at Bioinformatics online.
Conflict of interest: None declared.
Funding
This work was supported in part by the National Natural Science Foundation of China [62202353 and U22A2037] and the Fundamental Research Funds for the Central Universities.
Data availability
The source code and experimental data are available on GitHub: /user15632/DeepDR. Installation of DeepDR involves simply typing “pip install deepdr.”
References
Baptista D, Ferreira PG, Rocha M. Deep learning for drug response prediction in cancer. Brief Bioinform 2021;22:360–79.
Barretina J, Caponigro G, Stransky N et al. Addendum: the cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2019;565:E5–6.
Chawla S, Rockstroh A, Lehman M et al. Gene expression based inference of cancer drug sensitivity. Nat Commun 2022;13:5680.
Chen J, Wang X, Ma A et al. Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data. Nat Commun 2022;13:6494.
Dey R, Salem FM. Gate-variants of gated recurrent unit (GRU) neural networks. In: 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE, 2017, 1597–600.
Freeman JL, Perry GH, Feuk L et al. Copy number variation: new insights in genome diversity. Genome Res 2006;16:949–61.
Graves A, Graves A. Long Short-Term Memory. Supervised Sequence Labelling with Recurrent Neural Networks. New York, USA: Springer, 2012, 37–45.
Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7–15.
Heller MJ. DNA microarray technology: devices, systems, and applications. Annu Rev Biomed Eng 2002;4:129–53.
Jia P, Hu R, Pei G et al. Deep generative neural network for accurate drug response imputation. Nat Commun 2021;12:1740.
Kearnes S, McCloskey K, Berndl M et al. Molecular graph convolutions: moving beyond fingerprints. J Comput Aided Mol Des 2016;30:595–608.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.
Li M, Wang Y, Zheng R et al. Deepdsc: a deep learning method to predict drug sensitivity of cancer cell lines. IEEE/ACM Trans Comput Biol Bioinform 2021a;18:575–82.
Li P, Li Y, Hsieh C-Y et al. Trimnet: learning molecular representation from triplet messages for biomedicine. Brief Bioinform 2021b;22:bbaa266.
Li P, Wang J, Qiao Y et al. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief Bioinform 2021c;22:bbab109.
Liu P, Li H, Li S et al. Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinformatics 2019;20:408.
Liu Q, Hu Z, Jiang