
Embedded Deep Learning in Space: Artificial Intelligence with Qormino®

Abstract

Artificial Intelligence (AI) algorithms are known to be highly demanding in terms of computing resources. The latest processing devices offer increasing computational power. As a result, AI is also becoming popular in the Space industry for various applications, such as:

- on-board data processing for observation satellites,
- automated guidance of Spacecraft,
- on-board decision-making for collision prevention,
- communication satellites,
- fusion of data sources for better predictability.

Until recently, the Space industry faced the challenge of getting access to state-of-the-art processing components that comply with Space requirements, i.e. high reliability, robustness, and radiation tolerance.

The QlevEr Sat project is led by the Grenoble University Space Centre (CSUG). It leverages the high computing capabilities of the Qormino QLS1046-Space radiation-tolerant processing modules to run AI algorithms on-board, together with the high-resolution images taken by the Emerald sensor.

This white paper first presents the general performance and functionality of the QLS1046-Space processor. It then gives the main results of the benchmarking activities, demonstrating the feasibility of using QLS1046-Space to run embedded AI in Space.

Introduction

Artificial Intelligence (AI) algorithms are known to be highly demanding in terms of computing resources. Thanks to the increasing computational power of the latest processing devices, AI is becoming popular for ground applications. AI now competes with traditional data processing in a number of applications, such as face recognition, autonomous driving, or robotics.

The Space industry can also benefit from AI in various applications:

On-board data processing to provide early warnings,

Observation and meteorological satellites, where on-board processing makes it possible to send only relevant, pre-processed data to the ground, reducing downlink bandwidth requirements,

Automated guidance of Spacecraft, where AI can improve performance in critical maneuvers such as docking or landing,

On-board decision-making, which allows better collision prevention thanks to earlier reaction. It also offers possibilities for self-health monitoring and, ultimately, autonomous self-reconfiguration,

Communication satellites, which can benefit from smart data routing and optimized antenna pointing based on actual traffic and weather conditions. This increases data rate and minimizes power consumption,

Fusion of data sources from various kinds of sensors, making it possible to see what is not visible to the “human eye”. This includes on-board analysis of large datasets in deep Space and Science missions.

Until recently, despite this wide range of new possibilities, the Space industry faced the challenge of getting access to state-of-the-art processing components that comply with Space requirements, i.e. high reliability, robustness, and radiation tolerance.

The QlevEr Sat project is led by the Grenoble University Space Centre (CSUG). It is developing a nanosatellite that uses artificial intelligence algorithms to observe the Earth. The goal is to address societal challenges such as:

- observation of illegal deforestation,
- monitoring of CO2 emissions,
- evaluation of damage after a natural disaster.

Figure 1: QlevEr Sat Nanosatellite

This smart satellite will embed an Emerald 16 MP image sensor and a Qormino® QLS1046-Space processing module. Both are new radiation-tolerant, Space-qualified components from Teledyne e2v. The project leverages the high computing capabilities of QLS1046-Space to run the AI algorithms on-board, together with the high-resolution images taken by the Emerald sensor.

Figure 2: EMERALD Sensor

Figure 3: Qormino® QLS1046-4GB

As part of this project, the feasibility study included verifying the computing capability of the Qormino QLS1046-Space for AI algorithms. This white paper first presents the general performance and functionality of QLS1046-Space. It then gives the main results obtained in these benchmarking activities, demonstrating the feasibility of using QLS1046-Space to run AI in Space.

General performance and functionality of Qormino® QLS1046-Space

Qormino is a line of processing modules from Teledyne e2v dedicated to Space and high-reliability applications. These modules combine GHz-class multicore processors with high-speed DDR4 memories in compact 44 x 26 mm dimensions. They come in a 0.8 mm BGA package and are designed to meet SWaP (Size, Weight and Power) constraints. The DDR4 bus layout is built in, and the modules follow a “building-block” approach. Together, these simplify design while guaranteeing high performance.

QLS1046-Space is the Qormino version dedicated to Space. It embeds a quad-core Arm® Cortex®-A72 microprocessor running at up to 1.8 GHz, with ECC-protected L1 and L2 cache memories for reliable behaviour.

It features a rich set of peripherals:

- integrated packet-processing acceleration,
- high-speed serial links supporting 10 Gb Ethernet, PCIe® Gen3, SATA 3.0 and USB,
- a number of general-purpose interfaces such as SPI, I²C, and UART.

The current version integrates 4 GB of DDR4 with a transfer speed of up to 2.4 GT/s, and an 8 GB version is also targeted.
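To put the 2.4 GT/s figure in perspective, the theoretical peak DDR4 bandwidth is the transfer rate multiplied by the number of bytes moved per transfer. The sketch below assumes a 64-bit data bus. That width is an assumption, since it is not stated in this white paper. Sustained bandwidth in real workloads is lower than this peak.

```python
# Theoretical peak DRAM bandwidth = transfer rate x bytes moved per transfer.
# ASSUMPTION: a 64-bit (8-byte) data bus; the bus width is not stated in this white paper.

def peak_bandwidth_gb_s(transfer_rate_gt_s: float, bus_width_bits: int = 64) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_gt_s * bytes_per_transfer


if __name__ == "__main__":
    # 2.4 GT/s is the DDR4 transfer speed quoted for QLS1046-Space.
    print(f"Peak DDR4 bandwidth: {peak_bandwidth_gb_s(2.4):.1f} GB/s")  # -> 19.2 GB/s
```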

Figure 4: Architecture of QLS1046-4GB-Space

Apart from pure performance, this device was selected because it is Space-compliant. Both the processor and the memory are radiation tolerant:

SEL (single-event latch-up) free up to more than 60 MeV·cm²/mg

Known SEU (single-event upset) and SEFI (single-event functional interrupt) cross-sections up to more than 60 MeV·cm²/mg

TID (total ionizing dose): 100 krad(Si)

In addition, QLS1046-Space and its components are qualified, manufactured, and screened following NASA or ECSS standards.

Benchmark & Results

Benchmarking activities were performed to verify, in practice, the computing capability of QLS1046-Space to run AI algorithms for Space applications. The focus is mainly on AI for image processing, since the QlevEr Sat project targets Earth observation use cases. In this study, only deep-learning neural networks were tested. Classical machine learning usually requires fewer computing resources, so even better results would be expected with classical machine learning.

In this study, the performance of QLS1046-Space was evaluated along three different axes:

Pure computing performance was evaluated in terms of GFLOPS (Giga Floating-Point Operations Per Second), since this is the typical way of evaluating the computing performance of a device for AI applications.

An inference benchmark was performed to verify the capability of the device to execute neural networks. Several classical neural network architectures were tested.

Training performance was briefly assessed to evaluate the possibility of applying learning or fine-tuning on QLS1046-Space.

Benchmark setup

The performance assessment was carried out with a QLS1046-Space development kit, which provides a number of interfaces. The operating system used throughout the benchmark was Linux (Ubuntu 18.04). The QLS1046-Space device inside the development kit had 4 GB of integrated DDR4 memory. The 8 GB version would have been more efficient for executing AI, but it was not available at the time of testing. In addition, the processor was running at 1.6 GHz instead of its 1.8 GHz maximum frequency. The results presented in this white paper are therefore somewhat limited by the amount of available DDR4 memory and by the processor's operating frequency.
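A rough idea of what the 1.6 GHz clock costs can be obtained with a best-case estimate. This minimal sketch assumes that execution time scales as 1/frequency, which only holds for purely compute-bound workloads. Workloads limited by the 4 GB of DDR4 or by memory bandwidth would gain less. The input value is the ResNet50 quad-core time from Table 1.

```python
# Optimistic (lower-bound) execution time at a higher clock frequency.
# ASSUMPTION: the workload is purely compute-bound, so time scales as 1/frequency.

def scaled_time_ms(measured_ms: float, f_measured_ghz: float, f_target_ghz: float) -> float:
    """Best-case execution time when moving from f_measured_ghz to f_target_ghz."""
    return measured_ms * f_measured_ghz / f_target_ghz


if __name__ == "__main__":
    # ResNet50, quad core, Arm Compute Library: 206 ms at 1.6 GHz (Table 1).
    t = scaled_time_ms(206, 1.6, 1.8)
    print(f"Best case at 1.8 GHz: {t:.0f} ms")  # 206 * 1.6 / 1.8 -> about 183 ms
```

The ratio 1.8 / 1.6 = 1.125 bounds the possible gain from frequency alone at about 12.5 %.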

Figure 5: QLS1046-Space Development Kit

In some of the following benchmark results, a regular computer was used as a baseline to rate QLS1046-Space performance. This computer had an Intel® Core™ i7-9750H processor running at 2.6 GHz and 32 GB of DDR4, and it ran Linux. It is considered a good computer for performing AI, which makes it a convenient reference in the following.

Benchmark results

The performance of QLS1046-Space was evaluated along the following three axes:

Figure 6: Benchmarks

Pure computing performance

For the pure computing performance evaluation, the benchmark [1] was used, which consists of a small and simple test program. In the results, the performance of QLS1046-Space is compared with that of the computer with the Intel® Core™ i7-9750H processor.

Note that this software does not take advantage of the processors' hardware accelerators. In particular, this explains why the GFLOPS numbers obtained here are lower than those found in the literature for these processors.

Figure 7 presents the raw results in GFLOPS to compare both targets. Figure 8 compares power efficiency, since this is a key topic in Space applications.
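To illustrate how a GFLOPS figure is derived, the sketch below times a naive matrix product. It divides the operation count (2·n³ FLOP: one multiply and one add per term) by the elapsed time. It is not the benchmark [1], and the matrix size is an arbitrary illustration value. Like the benchmark described above, it uses no vector units or accelerators. Python interpreter overhead lowers the result much further, so only the method, not the number, carries over.

```python
import time


def matmul(a, b, n):
    """Naive n x n matrix product on flat row-major lists (no SIMD, no BLAS)."""
    c = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            aik = a[i * n + k]
            row_b, row_c = k * n, i * n
            for j in range(n):
                c[row_c + j] += aik * b[row_b + j]
    return c


def measure_gflops(n: int = 64) -> float:
    """Time one n x n product and convert it to GFLOPS (2 * n**3 FLOP)."""
    a = [1.0] * (n * n)
    b = [2.0] * (n * n)
    start = time.perf_counter()
    c = matmul(a, b, n)
    elapsed = time.perf_counter() - start
    assert c[0] == 2.0 * n  # sanity check: each entry sums n products 1.0 * 2.0
    return 2 * n**3 / elapsed / 1e9


if __name__ == "__main__":
    print(f"{measure_gflops():.4f} GFLOPS (interpreted, single core)")
```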

Figure 7: Summary of the computing performance comparison.

Figure 8: Power efficiency in quad-core operation.

Calculated from the thermal power characteristics of both devices: 45 W @ 100 °C for the i7 (Table 5-2 of [2]), 14.6 W @ 105 °C for QLS1046-Space (Table 8 of [3]).

The gap between the two devices depends on the number of cores used: the more cores, the smaller the difference in performance. In the quad-core configuration, QLS1046-Space offers about half the computing capability of the i7, which is known to be a good processor for performing AI on the ground. Hence, QLS1046-Space offers a fair amount of computing performance to perform AI in Space. In addition, QLS1046-Space exhibits higher power efficiency, making it well suited for Space systems.
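The power-efficiency conclusion can be checked with a one-line ratio. This sketch combines the "about half" performance figure with the thermal power ratings quoted for Figure 8 (45 W for the i7, 14.6 W for QLS1046-Space). Absolute GFLOPS values cancel out of the ratio. The result is only indicative, because it uses thermal design figures rather than measured power, and "about half" is itself a rounded value.

```python
# Relative power efficiency (GFLOPS per watt) from the figures quoted in the text.
# ASSUMPTIONS: thermal power ratings are used as the power budget, and
# QLS1046-Space delivers ~0.5x the i7's quad-core GFLOPS ("about half").

I7_POWER_W = 45.0
QLS_POWER_W = 14.6


def efficiency_ratio(perf_ratio: float, p_ref_w: float, p_dut_w: float) -> float:
    """(GFLOPS/W of the device under test) / (GFLOPS/W of the reference)."""
    return perf_ratio * p_ref_w / p_dut_w


if __name__ == "__main__":
    r = efficiency_ratio(0.5, I7_POWER_W, QLS_POWER_W)
    print(f"QLS1046-Space GFLOPS/W relative to the i7: {r:.2f}x")  # 0.5 * 45 / 14.6
```

Under these assumptions, QLS1046-Space delivers roughly 1.5 times more GFLOPS per watt than the i7, despite its lower absolute performance.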

Deep learning inference benchmark

In this benchmark, tests are performed to evaluate the performance of QLS1046-Space in inference, i.e. when the device uses a neural network to process an image. Only classical neural networks are tested in the study:

- first with Arm Compute Library [5],
- then with AI-Benchmark [4] on TensorFlow [7],
- and with ConvNet [6] on PyTorch [8].

The two most popular libraries used for AI are TensorFlow and PyTorch, proposed by Google and Facebook respectively, and both are supported by Arm [9]. The tested networks are pre-trained to identify objects in pictures. They are widely used in existing object detectors such as R-CNN, Fast R-CNN, Faster R-CNN [10] or CenterNet [11]. However, the TensorFlow and PyTorch libraries evolve very quickly. For this reason, performance was first evaluated with Arm Compute Library, which is considered more stable.
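Inference timings such as those below are usually collected with a warm-up phase followed by repeated timed runs. This is a generic timing harness, not the tooling of [4], [5] or [6]. The forward-pass function is a hypothetical stand-in. Reporting the standard deviation is one plausible reading of the "Variability" column in Table 2, but the white paper does not define that metric.

```python
import statistics
import time
from typing import Callable


def time_inference(run_once: Callable[[], object], warmup: int = 3, repeats: int = 10):
    """Return (mean_ms, stdev_ms) over `repeats` timed calls after `warmup` untimed calls."""
    for _ in range(warmup):
        run_once()  # warm-up: caches, lazy initialization, first memory allocations
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


if __name__ == "__main__":
    # HYPOTHETICAL stand-in for a forward pass; replace with the framework's inference call.
    def dummy_forward():
        return sum(i * i for i in range(200_000))

    mean_ms, std_ms = time_inference(dummy_forward)
    print(f"{mean_ms:.1f} ms +/- {std_ms:.1f} ms")
```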

Arm Compute Library

In this benchmark, Arm Compute Library [5] is used to run different classical neural networks. The results obtained on QLS1046-Space are shown in Table 1:

Network        Execution time [ms]        Operations per inference   Computing performance [GFLOPS]
               Single core   Quad core    [MFLOP]                    Single core   Quad core
Alexnet        153           74           727                        5             10
Googlenet      286           109          1500                       5             14
Inception v3   848           314          6000                       7             19
Inception v4   1870          655          13000                      7             20
Mobilenet      118           44           570                        5             13
Resnet50       501           206          4000                       8             19
Squeezenet     145           64           360                        2             6
Vgg16          1090          418          16000                      15            38
Yolov3         6540          2500         66000                      10            26

Table 1: Performance of QLS1046-Space with Arm Compute Library.
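The GFLOPS columns of Table 1 follow directly from the other two columns. MFLOP divided by ms equals 1e6 FLOP / 1e-3 s = 1e9 FLOP/s, so the quotient is already in GFLOPS. The sketch below recomputes them from the table values and also derives the quad-core speedup. The helper names are illustrative only.

```python
# Recompute Table 1 throughput and multicore scaling from the published values.
# MFLOP / ms = 1e6 FLOP / 1e-3 s = 1e9 FLOP/s, i.e. the quotient is in GFLOPS.

TABLE1 = {
    # network: (single-core ms, quad-core ms, MFLOP per inference)
    "Alexnet": (153, 74, 727),
    "Googlenet": (286, 109, 1500),
    "Inception v3": (848, 314, 6000),
    "Inception v4": (1870, 655, 13000),
    "Mobilenet": (118, 44, 570),
    "Resnet50": (501, 206, 4000),
    "Squeezenet": (145, 64, 360),
    "Vgg16": (1090, 418, 16000),
    "Yolov3": (6540, 2500, 66000),
}


def gflops(mflop: float, time_ms: float) -> float:
    return mflop / time_ms


def parallel_efficiency(t1_ms: float, t4_ms: float, cores: int = 4) -> float:
    """Speedup (t1 / t4) divided by the core count; 1.0 would be perfect scaling."""
    return (t1_ms / t4_ms) / cores


if __name__ == "__main__":
    for name, (t1, t4, mflop) in TABLE1.items():
        print(f"{name:13s} {gflops(mflop, t1):5.1f} {gflops(mflop, t4):5.1f} GFLOPS  "
              f"speedup {t1 / t4:.2f}x  efficiency {parallel_efficiency(t1, t4):.0%}")
```

Rounded to integers, the recomputed values match the table. The quad-core speedups range from about 2.1x (Alexnet) to about 2.9x (Inception v4), well below the ideal 4x.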

These results confirm that on-board image classification is possible on QLS1046-Space with these common classifiers, within reasonable execution times. The results are especially interesting given that Arm Compute Library is one of the major frameworks for AI.

AI-Benchmark

AI-Benchmark [4] instantiates backbones in TensorFlow format. These are very common neural networks, originally created for image classification. The benchmark results for different neural networks are given in Table 2:

Backbone        Picture size   Execution time [ms]   Variability [ms]   Description
VGG16 [9]       224x224        1320                  7                  Network trained on ImageNet [12] to classify 1000 objects.
VGG19 [9]       512x512        13562                 144                Network trained on ImageNet [12] to classify 1000 objects.
ResNet-V2-50    346x346        868                   5                  Classifier based on residual neural network [13].
ResNet-V2-152   256x256        1538                  18                 Classifier based on residual neural network.

Table 2: AI-Benchmark results on QLS1046-Space.

The results show that QLS1046-Space can perform on-board image classification with classical neural networks in about 1 s. This requires optimized memory management using the FP16 type, and a picture size suited to the 4 GB of available memory. VGG19 [9] takes around 10 times longer to execute than the other tests. This may be due to the cache memory configuration and the DDR4 size limitation.
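The benefit of FP16 for a 4 GB target can be seen from the storage needed for the weights alone. This sketch assumes 138,357,544 parameters for VGG16, the commonly published count for the ImageNet classifier. The exact model instantiated by AI-Benchmark may differ slightly. Activations, workspace buffers and framework overhead come on top of these figures.

```python
# Memory needed only to store the weights of VGG16, in FP32 vs FP16.
# ASSUMPTION: 138,357,544 parameters (commonly published ImageNet VGG16 count).

VGG16_PARAMS = 138_357_544
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2}


def weight_memory_mb(n_params: int, dtype: str) -> float:
    """Weight storage in MB (1 MB = 1e6 bytes) for the given numeric type."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"VGG16 weights in {dtype}: {weight_memory_mb(VGG16_PARAMS, dtype):.0f} MB")
```

Halving the bytes per value also halves activation buffers. Those buffers grow with picture size: for convolutional layers, a 512x512 input needs about (512/224)² ≈ 5.2 times the activation memory of a 224x224 input.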

Based on these results, QLS1046-Space obtains a score of 103. Neural networks are known to require large amounts of memory, so the performance obtained here is limited by the 4 GB DDR4 size of the tested version. A much higher ranking is expected with an 8 GB version.

Convnet

In this benchmark, ConvNet [6] on PyTorch is tested on QLS1046-Space. PyTorch is increasingly used instead of TensorFlow. PyTorch was originally more complex to use, but more flexible. From PyTorch version 1.8 onwards, a significant reduction in complexity is expected to benefit QLS1046-Space. PyTorch can now also handle tools such as SLURM [14] through PyTorch Lightning [15]. ConvNet benchmark results on PyTorch are given in Table 3:

Network          Execution time [ms]
                 QLS1046-Space @ 1.6 GHz   Intel® Core™ i7-9750H @ 2.6 GHz
Alexnet          187                       1.72
VGG11            764                       4.28
ResNet50         578                       7.29
Squeezenet1_0    328                       2.28
Densenet121      1283                      17.93
Mobilenet_v2     2337                      6.38
Shufflenet       1278                      8.49
Unet             1263                      4.98

Table 3: ConvNet results on QLS1046-Space.

The benchmark shows that the i7 performs much faster than QLS1046-Space, which is again limited by the size of the available memory. Despite this gap, the performance level offered by QLS1046-Space is still considered acceptable for on-board AI processing.
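The size of the gap in Table 3 can be quantified per network by dividing the two execution times. The sketch below uses only the table values. The dictionary name is illustrative.

```python
# Slowdown of QLS1046-Space (1.6 GHz) relative to the i7-9750H (2.6 GHz), from Table 3.

TABLE3_MS = {
    # network: (QLS1046-Space ms, i7-9750H ms)
    "Alexnet": (187, 1.72),
    "VGG11": (764, 4.28),
    "ResNet50": (578, 7.29),
    "Squeezenet1_0": (328, 2.28),
    "Densenet121": (1283, 17.93),
    "Mobilenet_v2": (2337, 6.38),
    "Shufflenet": (1278, 8.49),
    "Unet": (1263, 4.98),
}


def slowdown(qls_ms: float, i7_ms: float) -> float:
    return qls_ms / i7_ms


if __name__ == "__main__":
    ratios = {name: slowdown(q, i) for name, (q, i) in TABLE3_MS.items()}
    for name, r in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:14s} {r:6.0f}x")
```

The ratios range from roughly 72x (Densenet121) to roughly 366x (Mobilenet_v2). This is far more than the factor of about 2 measured in pure computing performance. That is consistent with the point above that raw arithmetic throughput is not the limiting factor in this benchmark.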

Deep learning training performance

Training performance on QLS1046-Space was quickly tested on ConvNet with TensorFlow. It was not extensively tested, since most up-to-date backpropagation [16] benchmarks require at least 8 GB of RAM. Table 4 compares the training time for one sample on ResNet50 between QLS1046-Space and the Intel® i7.

Network    Training time for one sample [ms]
           On QLS1046-Space   On Intel® Core™ i7-9750H
ResNet50   3782               20

Table 4: Comparison of training performance.
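The per-sample figures of Table 4 translate into wall-clock training time by simple multiplication. This sketch assumes training time grows linearly with the number of samples times the number of epochs. The dataset size and epoch count are hypothetical illustration values, not taken from the white paper.

```python
# Back-of-the-envelope training time from the per-sample figures of Table 4.
# ASSUMPTION: time is linear in (samples x epochs); n_samples and epochs are hypothetical.

QLS_MS_PER_SAMPLE = 3782
I7_MS_PER_SAMPLE = 20


def training_hours(ms_per_sample: float, n_samples: int, epochs: int) -> float:
    return ms_per_sample * n_samples * epochs / 3.6e6  # 3.6e6 ms per hour


if __name__ == "__main__":
    n_samples, epochs = 500, 10  # hypothetical dataset size and epoch count
    print(f"QLS1046-Space: {training_hours(QLS_MS_PER_SAMPLE, n_samples, epochs):.1f} h")
    print(f"i7-9750H:      {training_hours(I7_MS_PER_SAMPLE, n_samples, epochs):.2f} h")
    print(f"ratio:         {QLS_MS_PER_SAMPLE / I7_MS_PER_SAMPLE:.0f}x")
```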

This result clearly shows the penalty of the limited RAM on the current version of QLS1046-Space when training traditional image classifiers. A complete training usually requires hundreds of samples. This result should, however, be put into perspective: image classifiers are known to be highly demanding in computing resources. Since a complete training would be time-consuming on QLS1046-Space, an alternative is to perform fine-tuning [17] on-board.
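Fine-tuning is cheaper than full training because most of the network is frozen. Features are computed once, and backpropagation only updates a small head instead of running through the whole network. The sketch below shows this principle with a logistic-regression head trained by stochastic gradient descent. It is a generic illustration of the technique, not the method of [17]. The "backbone" is a hypothetical fixed function standing in for a pre-trained CNN, and the data are toy values.

```python
import math


def frozen_backbone(x):
    """HYPOTHETICAL stand-in for a pre-trained feature extractor; never updated."""
    return [x[0] - x[1], x[0] + x[1], 1.0]  # last entry acts as a bias input


def finetune_head(samples, labels, lr=0.1, epochs=100):
    """Train only a logistic-regression head on top of frozen features (SGD)."""
    feats = [frozen_backbone(x) for x in samples]  # one forward pass per sample, reused
    w = [0.0] * len(feats[0])
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f))
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the log-loss w.r.t. z for a sigmoid output
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
    return w


def predict(w, x):
    z = sum(wi * fi for wi, fi in zip(w, frozen_backbone(x)))
    return 1 if z > 0 else 0


if __name__ == "__main__":
    xs = [(2.0, 0.5), (1.5, 0.0), (0.0, 1.0), (0.5, 2.0)]
    ys = [1, 1, 0, 0]  # toy labels: 1 when the first coordinate is larger
    w = finetune_head(xs, ys)
    print([predict(w, x) for x in xs])
```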

Training small convolutional neural networks for simple detection use cases seems feasible with QLS1046-Space, as does deep learning for processing time series or 1-D signals. For training on images, QLS1046-Space would be more efficient with classical machine learning, but those models are more complex to build.
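Why 1-D signals are so much lighter than images can be seen from the operation count of a single convolution layer. The count is 2 FLOP per multiply-accumulate, with one kernel dimension for a 1-D signal and two for an image. The sketch assumes stride 1, "same" padding and no bias. The layer sizes are hypothetical illustration values.

```python
# FLOP count of one convolution layer (2 FLOP per multiply-accumulate).
# ASSUMPTIONS: stride 1, "same" padding, bias ignored; layer sizes are hypothetical.

def conv1d_flops(c_in: int, c_out: int, k: int, length: int) -> int:
    return 2 * c_in * c_out * k * length


def conv2d_flops(c_in: int, c_out: int, k: int, h: int, w: int) -> int:
    return 2 * c_in * c_out * k * k * h * w


if __name__ == "__main__":
    signal = conv1d_flops(32, 64, 3, length=1024)   # 1-D signal with 1024 samples
    image = conv2d_flops(32, 64, 3, h=224, w=224)   # 224x224 feature map
    print(f"1-D: {signal / 1e6:.1f} MFLOP, 2-D: {image / 1e6:.1f} MFLOP, "
          f"ratio {image / signal:.0f}x")
```

With these sizes, the image layer costs 147 times more than the signal layer: (3 x 3 x 224 x 224) / (3 x 1024) = 147.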

Discussion

QLS1046-Space offers a decent amount of computing capability, enough to run deep learning AI for image processing in Space. The device is not as powerful as the tailor-made solutions available for AI inference in ground applications, but it is the most powerful Space-qualified CPU available on the market. In terms of pure computing capability, its performance is in the same order of magnitude as an Intel® Core™ i7-9750H. From the AI performance point of view, the main drawback of the current version is its 4 GB memory, which requires optimized memory management to run AI for image processing. On future versions with 8 GB of DDR4 memory or more, AI performance would increase significantly, and the burden of optimized memory management would be reduced.

The performance in the previous benchmarks was obtained with classical deep neural networks, without taking advantage of the specific QLS1046-Space architecture. Other AI topologies are better optimized for embedded targets and would make AI on QLS1046-Space more efficient. Beyond AI computing performance, the study shows that QLS1046-Space has good power efficiency. This makes it well suited for Space systems, where electrical power is limited and power dissipation is an issue. From the electronic architecture point of view, it might be relevant to add an FPGA as a companion chip to QLS1046-Space. The FPGA could then efficiently handle pre-processing, while QLS1046-Space performs the heavy processing.

In this study, the primary focus was on deep learning AI for image processing, which is considered one of the most demanding applications in terms of computing resources. For instance, processing 1-D time series is much less demanding than image processing. Hence, the outcome of the study is that QLS1046-Space would also
