
Journal Pre-proofs

Artificial Intelligence in Pharmaceutical Sciences

Mingkun Lu, Jiayi Yin, Qi Zhu, Gaole Lin, Minjie Mou, Fuyao Liu, Ziqi Pan, Nanxin You, Xichen Lian, Fengcheng Li, Hongning Zhang, Lingyan Zheng, Wei Zhang, Hanyu Zhang, Zihao Shen, Zhen Gu, Honglin Li, Feng Zhu

PII: S2095-8099(23)00164-9
DOI: 10.1016/j.eng.2023.01.014
Reference: ENG1255
To appear in: Engineering
Received Date: 30 September 2022
Revised Date: 11 December 2022
Accepted Date: 6 January 2023

Please cite this article as: M. Lu, J. Yin, Q. Zhu, G. Lin, M. Mou, F. Liu, Z. Pan, N. You, X. Lian, F. Li, H. Zhang, L. Zheng, W. Zhang, H. Zhang, Z. Shen, Z. Gu, H. Li, F. Zhu, Artificial Intelligence in Pharmaceutical Sciences, Engineering (2023), doi: 10.1016/j.eng.2023.01.014

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2023 Published by Elsevier Ltd. on behalf of Chinese Academy of Engineering.


Research

Smart Process Manufacturing: Review

Artificial Intelligence in Pharmaceutical Sciences

Mingkun Lu (a,c), Jiayi Yin (a), Qi Zhu (a), Gaole Lin (a), Minjie Mou (a), Fuyao Liu (a), Ziqi Pan (a), Nanxin You (a), Xichen Lian (a), Fengcheng Li (a), Hongning Zhang (a), Lingyan Zheng (a,c), Wei Zhang (a), Hanyu Zhang (a), Zihao Shen (b,d), Zhen Gu (a), Honglin Li (b,d,e,*), Feng Zhu (a,c,*)

(a) The Second Affiliated Hospital, Zhejiang University School of Medicine & College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
(b) Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
(c) Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba–Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
(d) Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China
(e) Lingang Laboratory, Shanghai 200031, China

* Corresponding authors.

E-mail addresses: hlli@ (H. Li), zhufeng@ (F. Zhu).

ARTICLE INFO

Article history:
Received
Revised
Accepted
Available online

Keywords:
Artificial intelligence
Machine learning
Deep learning
Target identification
Target discovery
Drug design
Drug discovery


ABSTRACT

Drug discovery and development affects various aspects of human health and dramatically impacts the pharmaceutical market. However, investments in a new drug often go unrewarded because of the long and complex process of drug research and development (R&D). With advances in experimental technology and computer hardware, artificial intelligence (AI) has recently emerged as a leading tool for analyzing abundant, high-dimensional data. The explosive growth of biomedical data favors the application of AI in all stages of drug R&D. Driven by big data in biomedicine, AI has revolutionized drug R&D owing to its ability to discover new drugs more efficiently and at lower cost. This review begins with a brief overview of common AI models in the field of drug discovery; it then summarizes and discusses in depth their specific applications in various stages of drug R&D, such as target discovery, drug discovery and design, preclinical research, automated drug synthesis, and its influence on the pharmaceutical market. Finally, the major limitations of AI in drug R&D are fully discussed and possible solutions are proposed.

1. Introduction

In the past few decades, the pharmaceutical industry has been limited by the extent of cutting-edge research in the pharmaceutical sciences, because the development of new drugs is a long and complex process accompanied by high risks and high costs [1,2]. In other words, the current field of drug research and development (R&D) requires significant productivity improvements to shorten the cycle time and reduce the cost of drug development [3]. Technologies such as network pharmacology, RNA sequencing (RNA-seq), high-throughput screening (HTS), and virtual screening (VS) have all accelerated the discovery of new targets, as well as new drugs, to some extent [4–9]. Nevertheless, these technologies have rarely been significant contributors to the current process of new drug discovery. Thus, there is an urgent need for new technology to drive the development of new drugs.

As the computing power of devices grows, artificial intelligence (AI) has been applied in many real-world tasks, such as image classification and speech recognition, owing to its ability to learn from, process, and make predictions on massive amounts of information [10–12]. At present, after a long period of data accumulation combined with the development of high-throughput RNA-seq technology, massive amounts of biomedical data have been collected [13–18]. Biomedical data, which are highly heterogeneous and complex, come from a variety of sources, including omics data from different platforms, experimental data from biological or chemical laboratories, data generated by pharmaceutical companies, publicly disclosed textual information, and manually collated data from publicly available databases [19–22]. AI can be used to learn the underlying patterns in these vast amounts of biomedical data, thereby bringing new opportunities and challenges to the pharmaceutical sciences and industries.

In the 14th Critical Assessment of protein Structure Prediction (CASP14) competition, the AI-based AlphaFold2 system outperformed the other entries in accurately predicting the three-dimensional (3D) structures of proteins [23]. Similarly, in the Open Graph Benchmark Large-Scale Challenge (OGB-LSC), a graph neural network (GNN) combined with a transformer model won the top rank in predicting molecular properties calculated by density functional theory (DFT), which are difficult and highly time-consuming to obtain using traditional methods [24]. These competitions demonstrated the strong ability of AI to analyze biological and chemical data. Owing to its powerful capability to use biomedical data to understand complex biological systems and chemical reaction spaces [25,26], AI has had a revolutionary impact on all stages of drug R&D, including not only research on proteins and small molecules but also the assisted design of clinical trials and post-market surveillance [27]. Furthermore, many pharmaceutical companies have adopted state-of-the-art (SOTA) AI models in diverse pipelines to shorten the R&D cycle time and decrease costs [28–30].

AI techniques in this context mainly involve machine learning (ML) and deep learning (DL). Both ML and DL algorithms are used in target discovery and validation [31], drug discovery and design [32], and preclinical drug research [33], where they analyze data with different characteristics and in different formats. Once a drug candidate enters clinical trials [34], DL plays a pivotal role in assisting the design of clinical trials and in monitoring and analyzing data from phase IV clinical studies [33]. Approved drugs have a strong impact on manufacturing [35] and the market economy, and DL can play a part in these areas as well. Therefore, in this review, we present a comprehensive overview of most aspects of the use of AI in the pharmaceutical sciences. We focus on how AI can be used to promote target discovery and drug discovery (as shown in Fig. 1) and reflect on how to further accelerate the development of this field.


Fig. 1. Summary of AI applications in the pharmaceutical sciences. ADMET: absorption, distribution, metabolism, excretion, and toxicity.

2. Basic concepts of AI and its scope of application

AI was first proposed at the Dartmouth Conference in 1956 and was defined as an algorithm that gives machines the ability to reason and perform functions [36]. From perceptrons to support vector machines (SVMs) and artificial neural networks (ANNs), the development of AI has gone through several ups and downs, and the field is currently flourishing thanks to the hardware support that is now available. Both ML and DL fall under the category of AI; strictly speaking, DL is a subset of ML. However, our discussion of ML in this review concentrates only on traditional ML methods, such as random forest (RF) and SVMs.

2.1. The big data era

In the current big data era, gigantic amounts of biological and clinical data have laid a foundation for the application of AI in medical and pharmaceutical research. Although AI has been successfully and effectively applied in multiple aspects of the drug R&D process, the quantity and quality of medical data have become one of the main obstacles to the development of AI in the pharmaceutical sciences. To date, pharmaceutical databases containing detailed, structured big data, built by medicinal researchers worldwide, play a key role in promoting AI applications in medical and pharmaceutical research.

For example, the therapeutic target database (TTD) includes the most comprehensive information about known and explored therapeutic protein and nucleic acid targets, the targeted diseases, pathway information, and the corresponding drugs directed at each of these targets. It provides detailed knowledge of the functions of targets, as well as their sequences, 3D structures, ligand-binding properties, relevant enzymes, and corresponding drug information [37]. PubChem [17] provides collective information on chemical molecules and their activities in response to biological assays, including molecular structures, identifiers, physicochemical properties, patent information, and molecular toxicity. Several other popular databases addressing various pharmaceutical questions have been established and are frequently used; these play significant roles in promoting the application of AI in medical and pharmaceutical research [38–42]. Table 1 [17,18,37,43–62] summarizes popular pharmaceutical databases, categorized into protein-related, gene-related, drug-related, and disease-related databases.

Table 1

Pharmaceutical databases focusing on proteins, genes, drugs/drug targets, and diseases.

| Focus | Database | Description | Refs. |
|---|---|---|---|
| Proteins | RCSB PDB | PDB contains 3D structural data of large biological molecules, such as proteins and nucleic acids | [43] |
| | PRIDE | PRIDE is a public data repository for proteomics, including protein and peptide identifications, post-translational modifications, and supporting spectral evidence | [44] |
| | UniProt | UniProt is a protein database containing protein sequences, functional information, and an index of research papers | [18] |
| | InterPro | InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites | [45] |
| | VARIDT | VARIDT provides comprehensive data on all aspects of drug transporters' variability | [46,47] |
| Genes | Ensembl | Ensembl provides centralized genomic data and powerful functionalities such as gene annotation and regulatory function predictions | [48] |
| | UCSC Genome | The UCSC Genome Browser offers access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms | [49] |
| | GEO | GEO is a database repository of high-throughput gene expression data and hybridization arrays, chips, and microarrays | [50] |
| | GenBank | GenBank is an annotated collection of all publicly available DNA sequences | [51] |
| | RefSeq | RefSeq provides separate and linked records for the genomic DNA, gene transcripts, and corresponding proteins of multiple organisms | [52] |
| | EA | EA collects baseline gene expression data for different species and contexts, and contains differential studies reporting expression changes between two different conditions | [53] |
| Drugs/drug targets | TTD | TTD includes the most comprehensive information about known and explored therapeutic protein and nucleic acid targets | [37] |
| | ChEMBL | ChEMBL is a manually curated library of bioactive compounds with drug-like properties | [54] |
| | PubChem | PubChem covers collective information on chemical molecules and their activities in response to biological assays | [17] |
| | DrugBank | DrugBank combines comprehensive drug target information with specific drug data | [55] |
| | DrugMAP | DrugMAP provides a comprehensive list of interacting molecules for drugs/drug candidates, including information on differential expression patterns | [56] |
| | DTC | DTC enables the exploration of bioactivity data, the processing of new bioactivity data, and data curation in order to improve the understanding of DTIs | [57] |
| | PHAROS | PHAROS provides a comprehensive, integrated knowledge base for the druggable genome | [58] |
| Diseases | TCGA | TCGA has over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data related to the cancer genome | [59] |
| | DisGeNET | DisGeNET contains large, publicly available collections of genes and variants associated with human diseases | [60] |
| | ClinVar | ClinVar is a public archive of reports on relationships among human variations and phenotypes, with supporting evidence | [61] |
| | OMIM | OMIM is an online catalog of human genes and genetic disorders | [62] |

PDB: Protein Data Bank; PRIDE: Proteomics Identifications Database; GEO: Gene Expression Omnibus; EA: Expression Atlas; DTC: Drug Target Commons; DTIs: drug–target interactions; TCGA: The Cancer Genome Atlas; OMIM: Online Mendelian Inheritance in Man.


2.2. ML and DL

Unlike traditional computer programs, ML and DL can learn latent patterns from input data without explicit programming. They are not limited by the format of the input data, which can be broad and include text, images, sound, and more (any type of data that can be encoded) [63]. Similar to the way humans learn, ML and DL gradually recognize different features of the data, infer the patterns lying within, and update their model parameters through repeated iterations until a valid model is formed.
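The iterative parameter updates described above can be illustrated with plain gradient descent on a one-feature linear model. This is a minimal sketch under stated assumptions: the squared-error loss, the learning rate, the number of epochs, and the toy data are illustrative choices rather than settings taken from this review.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(err^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # One iteration: step the parameters against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_gd(xs, ys)
print(round(w, 3), round(b, 3))
```

Each pass through the loop is one "iteration" in the sense used above: the parameters move a small step in the direction that reduces the error between prediction and ground truth, and the process stops after a fixed budget rather than at a learned convergence criterion.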

According to the application scenario, models can be categorized into regression models and classification models. The difference between regression and classification tasks lies mainly in whether the output variable is continuous or discrete. Cheng and Ng [64] applied ML approaches to predict the biological activity of per- and polyfluorinated alkyl substances (PFAS) as continuous output values; this study is a typical regression task. Hong et al. [65] built a DL model to predict whether a bacterial protein is a type IV secreted effector (T4SE), with discrete output values (e.g., 0/1); this study is a typical classification task.
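To make the continuous-versus-discrete distinction concrete, the sketch below uses one hypothetical linear scoring function in two ways: returned as-is, it is a regression output; passed through a sigmoid and a 0.5 threshold, it becomes a binary (0/1) class label. The weights, bias, and feature values are invented for illustration and are unrelated to the PFAS or T4SE studies cited above.

```python
import math

# Hypothetical linear model: score = w . x + b
WEIGHTS = [0.8, -0.5]
BIAS = 0.1

def regress(features):
    """Regression: return the continuous score directly."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS

def classify(features, threshold=0.5):
    """Classification: squash the score into (0, 1), then emit a discrete label."""
    prob = 1.0 / (1.0 + math.exp(-regress(features)))
    return 1 if prob >= threshold else 0

sample = [2.0, 1.0]
print(regress(sample))   # a continuous value
print(classify(sample))  # a discrete value, 0 or 1
```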

Depending on the type of learning algorithm required to solve the problem, models fall into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is a labeled-data-driven process that trains a model on the relationship between inputs and their prespecified outputs in order to predict the categories or continuous values of future inputs. In comparison, unsupervised methods are used to identify patterns in unlabeled datasets and to explore a dataset's latent structure, for example by clustering the data for further analysis. In addition, semi-supervised learning lies part-way between supervised and unsupervised learning; it trains a model when only part of the data is labeled and is a potential solution for problems that lack high-quality labeled data [66]. Reinforcement learning builds a model through continual interaction with its environment, relying on penalties for failure and rewards for success.
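As an illustration of unsupervised learning, the following sketch clusters unlabeled two-dimensional points with k-means (Lloyd's algorithm). k-means is chosen here only as a familiar example; the review does not name a specific clustering method, and the initialization (the first k points serve as starting centroids) and the toy data are assumptions.

```python
def kmeans(points, k, iterations=100):
    """Cluster 2D points into k groups with Lloyd's algorithm."""
    # Deterministic initialization: first k points as centroids (an assumption)
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable: converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled points forming two obvious groups
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

No labels are supplied; the grouping emerges purely from the geometry of the data, which is the defining difference from the supervised setting.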

2.3. Introduction to different types of ML/DL-based algorithms

ML and DL methods have been successfully applied to relevant biomedical problems, with the modeling approach varying across problems and even for the same problem. For example, small molecules used to be characterized by hand-engineered features that were fed directly into ML methods to predict their properties; more recently, GNNs have also been used to represent small molecules for property prediction [67]. Determining the functional annotations of proteins is essential for selecting druggable proteins as potential targets. Maxat et al. [68] used a convolutional neural network (CNN) to predict the gene ontology annotations (GOA) of proteins, Nadav et al. [69] built a recurrent neural network (RNN) for protein function annotation, and Xia et al. [70] combined a CNN and an RNN to predict the gene ontology (GO) labels of proteins.

ML is not one specific algorithm but a family of algorithms that focus on the features of the data and transform them into machine-readable knowledge, providing humans with new insights. Researchers can choose from various common algorithms. The naïve Bayes (NB) algorithm is a simple and intuitive probabilistic classifier based on Bayes' theorem and the assumption that features are independent of one another [71]. An RF algorithm builds a collection of decorrelated decision trees, each constructed independently during training [72]; the final decision is based on the majority vote of the trees. Models that combine the decisions of multiple learners in this way are commonly referred to as ensemble models. Extreme gradient boosting (XGBoost) is a scalable ML algorithm based on gradient boosting and is also an ensemble model [73]. A multi-layer perceptron (MLP) can be viewed as a directed graph of multiple node layers, each fully connected to the next, which maps a set of input vectors to a set of output vectors. SVM is one of the most widely applied ML algorithms: it classifies samples with an optimal hyperplane, obtained by maximizing the margin between different classes in a feature space whose dimensionality is determined by the number of features [74]. K-nearest neighbor (KNN) is regarded as “lazy learning” because it classifies a sample according to only a few neighboring samples when distinguishing between categories [75]. In addition to the above methods, several other ML methods, such as principal component analysis (PCA), partial least squares (PLS), linear discriminant analysis (LDA), and logistic regression (LR), have been applied in biomedical data processing [76,77].
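The “lazy learning” behavior of KNN described above, where no model is fitted in advance and each prediction is a majority vote among the closest stored samples, can be sketched in a few lines. Euclidean distance, k = 3, and the toy two-dimensional descriptor vectors with “active”/“inactive” labels are hypothetical choices for illustration.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Lazy learning: nothing is fitted; all distances are computed at prediction time
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote; Counter.most_common breaks ties by first-encountered order
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical descriptor vectors and activity labels
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
train_y = ["inactive", "inactive", "inactive", "active", "active", "active"]
print(knn_predict(train_x, train_y, (0.82, 0.88)))
print(knn_predict(train_x, train_y, (0.12, 0.18)))
```

Because all work happens at query time, prediction cost grows with the size of the stored training set, which is the usual trade-off of this design.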

DL is popular owing to its powerful generalization and feature-extraction capabilities, and its learning and prediction process is end-to-end. Unlike the traditional ML process (which often consists of multiple independent modules), DL produces the output (output end) directly from the input data (input end) during model training and continuously adjusts and optimizes the model based on the error between the output and the true value until it meets the expected result. A deep neural network (DNN) is a feed-forward neural network consisting of densely connected input, hidden, and output layers. It learns features of the input data by simulating nonlinear transformations between neurons, with each layer consisting of multiple neurons [78]. A CNN is a feed-forward neural network that consists of convolutional (feature extraction) and pooling (dimensionality reduction) layers, which help to extract informative features from a dataset without consuming too much time or computational resources [79]. An RNN is a class of ANN in which connections between nodes form a directed or undirected graph along a temporal sequence. An RNN includes a feedback loop that feeds a layer's output back into the network at the next time step. Unlike feed-forward networks, it therefore has an internal memory, which helps it learn and retain information across long sequences [80]. A GNN is a connectionist model that captures the dependencies in a graph through information transfer between its nodes [81,82]. A GNN updates the state of a node according to the node's neighbors, which can lie at any depth from the node; this state represents the node's information. The architectures of the four networks described above are shown in Fig. 2.
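The neighbor-based state update of a GNN can be sketched as a single message-passing step on a small molecular graph, treating atoms as nodes and bonds as edges. The sum aggregation, the ReLU update, the two-dimensional node features, and the identity weight matrices are assumptions chosen so that the arithmetic is easy to follow; real GNN layers learn their weights and differ in their aggregation and update functions.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def message_passing_step(features, adjacency, w_self, w_neigh):
    """One GNN layer: h_v' = ReLU(W_self h_v + W_neigh * sum of neighbor h_u)."""
    updated = {}
    for node, h in features.items():
        # Aggregate (sum) the current feature vectors of all neighbors
        agg = [0.0] * len(h)
        for nb in adjacency[node]:
            agg = [a + x for a, x in zip(agg, features[nb])]
        # Combine the node's own state with the aggregated message
        combined = [s + n for s, n in zip(matvec(w_self, h), matvec(w_neigh, agg))]
        updated[node] = relu(combined)
    return updated

# Heavy atoms of ethanol (C-C-O); one-hot "is carbon"/"is oxygen" features (hypothetical)
features = {"C1": [1.0, 0.0], "C2": [1.0, 0.0], "O": [0.0, 1.0]}
adjacency = {"C1": ["C2"], "C2": ["C1", "O"], "O": ["C2"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h1 = message_passing_step(features, adjacency, w_self=identity, w_neigh=identity)
print(h1)
```

After one step, the two carbons, which start with identical features, already differ (C2 picks up a contribution from the bonded oxygen). Applying the step repeatedly lets each node's state incorporate information from atoms further away in the graph.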

An autoencoder (AE), which consists of an encoder and a decoder, is used to learn efficient encodings of input data. The encoder maps the input to an encoding, from which the decoder regenerates the input. An AE is usually used for data compression and dimensionality reduction through the learned representation (i.e., the encoding) of a set of data [83]. A generative adversarial network (GAN) is composed of two underlying neural networks: a generator network and a discriminator network. The former generates content, while the latter discriminates the generated content from real data [84]. Models can also be combined to solve a wider range of problems. For example, a graph convolutional network (GCN) extends convolutional operations from traditional data (e.g., images) to graph data [85].
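For reference, the training objectives usually associated with these two architectures can be written as follows. This is the standard textbook formulation, given as an assumption rather than the specific variant used by any work cited here: an AE with encoder f and decoder g minimizes the reconstruction error over N samples x_i, and a GAN trains a generator G and a discriminator D in a minimax game, with noise z drawn from a prior p_z.

```latex
\begin{aligned}
\text{AE:}\quad & \min_{f,\,g}\; \frac{1}{N}\sum_{i=1}^{N} \bigl\lVert x_i - g\bigl(f(x_i)\bigr) \bigr\rVert_2^2 \\
\text{GAN:}\quad & \min_{G}\,\max_{D}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
\end{aligned}
```

The discriminator maximizes the objective by scoring real samples high and generated ones low, while the generator minimizes it by producing samples that the discriminator mistakes for real data.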

Fig. 2. Schematic network architectures for a DNN, GNN, CNN, and RNN.

When a model fails to learn the underlying patterns in the data features effectively and therefore cannot generalize to new data, the problem is called underfitting [86]. In contrast, overfitting occurs when the model fits the training data so closely that noise in the data is learned as if it were a representative feature, resulting in poor predictions for new data [87]. Overfitting is more difficult to deal with than underfitting. Models often become overfitted because they are overly complex or because the data are underrepresented. A dataset used for a model is often divided into a training set, a validation set, and a test set, which are used for model training, model adjustment (e.g., hyperparameter tuning), and model evaluation, respectively. Put simply, a model that performs poorly on both the training and test sets is underfitted, whereas a model that performs well on the training set but poorly on the test set is overfitted. Typical ways to suppress overfitting include regularization, data augmentation [88], dropout [89], early stopping, and ensemble learning, among other methods.
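The three-way data split and early stopping mentioned above can be sketched as follows. The 70/15/15 proportions, the fixed random seed, and the “patience” rule (stop once the validation loss has not improved for a given number of consecutive epochs) are common conventions assumed here, not settings prescribed by the review, and the loss values are made up.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle and split samples into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder is held out for final evaluation
    return train, val, test

def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch with the best validation loss, stopping once the loss
    has not improved for `patience` consecutive epochs."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # validation loss has stalled: training would stop here
    return best_epoch

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))

# Hypothetical validation losses: falling, then rising as the model starts to overfit
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(losses))
```

The rising tail of the loss curve is exactly the "good on training data, worse on unseen data" pattern described above; early stopping keeps the parameters from the epoch before that divergence.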

Researchers encountered underfitting and overfitting problems when predicting the long-term trends of the coronavirus disease 2019 (COVID-19) pandemic with a single model, either a traditional epidemic model or an ML model. To address these issues, Sun et al. [90] proposed a new model called dynamic-susceptible-exposed-infective-quarantined (D-SEIQ). The D-SEIQ model can accurately predict the long-term trends of COVID-19 outbreaks by appropriately modifying the susceptible-exposed-infective-recovered (SEIR) model and integrating ML-based parameter optimization under reasonable epidemiological constraints.
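For orientation, the classic SEIR compartmental model that D-SEIQ builds on can be simulated with a simple forward-Euler scheme. This sketch covers only the plain SEIR baseline, not the D-SEIQ model of Sun et al.: its quarantined compartment and ML-based parameter optimization are not reproduced, and the transmission rate beta, incubation rate sigma, recovery rate gamma, step size, and population figures are hypothetical.

```python
def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days, dt=0.1):
    """Integrate the SEIR equations with forward Euler; return daily (S, E, I, R)."""
    n = s0 + e0 + i0 + r0  # total population, conserved by the model
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(s, e, i, r)]
    steps_per_day = int(round(1 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E (infection)
            new_infective = sigma * e * dt        # E -> I (end of incubation)
            new_recovered = gamma * i * dt        # I -> R (recovery or removal)
            s -= new_exposed
            e += new_exposed - new_infective
            i += new_infective - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history

# Hypothetical parameters: basic reproduction number beta/gamma = 3
trajectory = simulate_seir(s0=9990, e0=0, i0=10, r0=0,
                           beta=0.3, sigma=1 / 5.2, gamma=1 / 10, days=120)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print(f"peak infective on day {peak_day}: {trajectory[peak_day][2]:.0f}")
```

With fixed parameters such a model can only reproduce a single epidemic wave, which helps explain why approaches like D-SEIQ let the parameters vary and fit them to observed data.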

Different models have different evaluation criteria. For regression models, commonly used criteria include the mean squared error (MSE), root MSE (RMSE), and R squared (R²). For classification models, commonly used criteria are recall, precision, and the F1 score. The receiver operating characteristic (ROC) curve and the precision–recall curve (PRC) are the most commonly used evaluation tools for classification models; ROC curves take into account both positive and negative cases to assess the overall performance of the model, whereas PRCs focus more on positive cases [91].
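The criteria listed above follow directly from their definitions. The sketch below computes them on tiny made-up examples. For the ROC curve, it reports the area under the curve (AUC) using the equivalent pairwise-ranking formulation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. This is one of several standard ways to compute the AUC.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) for continuous predictions."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mse = ss_res / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    return mse, math.sqrt(mse), 1 - ss_res / ss_tot

def classification_metrics(y_true, y_pred):
    """Return (precision, recall, F1) for binary 0/1 labels, positive class = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions
print(regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
print(classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Precision, recall, and F1 require a fixed decision threshold, whereas the AUC is computed from the raw scores and therefore summarizes performance across all thresholds.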

2.4. A brief description of molecule representation as model input

Over time, the accumulation of data on small molecules and proteins has produced an extremely large data resource. Databases of molecular sequences, structures, physicochemical properties, and so forth have been compiled and organized by different organizations and contain a great deal of knowledge. However, the differing sources and formats of these data make it difficult to integrate correlated data from multiple heterogeneous sources. Therefore, it is particularly important to adopt suitable methods for representing molecules and to mine the crucial information in molecular data by means of AI [92]. Current AI algorithms are highly dependent on data quality; thus, when constructing a model, it is necessary to unify the input format of molecules, for example by representing small molecules and proteins as model-readable vectors or matrices.

At present, small molecules are generally represented using one of four main approaches. The first approach is knowledge-based representation: molecular descriptors and molecular fingerprints derived from human a priori knowledge are widely used in various ML and DL algorithms [93]. The second approach is direct image-based representation. CNNs, which learn patterns from two-dimensional (2D) digital images, can take a 2D chemical digital grid of a molecule directly as input to learn the molecule's properties [94]. The third approach is string-based representation. For example, the canonical simplified molecular-input line-entry system (SMILES) represents small molecules as strings, so CNNs and RNNs can be used to learn molecular embeddings from these string representations of chemical structures [95–97]. The fourth approach is graph-based feature representation. Representation methods based on graph convolution or graph attention have been widely used to explore the feature representation of small molecules. In these methods, atoms and bonds are treated as nodes and edges, respectively, and new molecular representations are obtained by iteratively updating the information at individual nodes. Graph-based representations have achieved outstanding performance in a variety of pharmaceutical learning tasks [98,99].
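To illustrate the string-based route, the sketch below splits SMILES strings into tokens with a regular expression and maps each token to an integer index, producing the kind of integer sequence a CNN or RNN embedding layer can consume. The regular expression follows a pattern commonly used in SMILES language-modeling work and is an assumption here; it does not validate chemistry, and practical pipelines typically rely on a cheminformatics toolkit such as RDKit for parsing and canonicalization.

```python
import re

# Bracket atoms, two-letter halogens, aromatic atoms, bonds, branches, ring closures
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is unrecognized."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens

def encode(smiles_list):
    """Map each token to an integer index using a vocabulary built from the inputs."""
    tokenized = [tokenize_smiles(s) for s in smiles_list]
    vocab = {tok: idx for idx, tok in
             enumerate(sorted({t for toks in tokenized for t in toks}))}
    return [[vocab[t] for t in toks] for toks in tokenized], vocab

# Ethanol and chlorobenzene as example inputs
sequences, vocab = encode(["CCO", "c1ccccc1Cl"])
print(vocab)
print(sequences)
```

Tokenizing at the level of chemically meaningful units (for example, keeping "Cl" and bracket atoms such as "[C@@H]" intact) rather than single characters keeps the sequence model from having to relearn that these multi-character symbols denote one atom.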

Protein representation methods can be broadly classified into four categories: representation based on intrinsic properties of sequences, representation based on phy
