SDS (Software-Defined Storage): Introduction to Storage Virtualization, He Yu
He Yu, Director of Virtualization Technology, JD.com (京东商城)

Storage Virtualization

Agenda
  • Overview
      - Introduction
      - What to be virtualized
      - Where to be virtualized
      - How to be virtualized
  • Case study
      - On Linux system: RAID, LVM, NFS
      - In distributed system: VastSky, Lustre, Ceph, HDFS

Overview
  • Introduction
  • What to be virtualized? Block, file system
  • Where to be virtualized? Host-based, network-based, storage-based
  • How to be virtualized? In-band, out-of-band

Introduction

Common storage architectures:
  • DAS (Direct Attached Storage): The storage device is directly attached to a server or workstation, without a storage network in between.
  • NAS (Network Attached Storage): File-level computer data storage connected to a computer network, providing data access to heterogeneous clients.
  • SAN (Storage Area Network): Attaches remote storage devices to servers in such a way that the devices appear as locally attached to the operating system.

Desirable properties of storage virtualization:
  • Manageability: Storage resources should be easily configured and deployed.
  • Availability: Storage hardware failures should not affect the application.
  • Scalability: Storage resources can easily scale up and down.
  • Security: Storage resources should be securely isolated.

Storage concepts and techniques:
  • Storage resource mapping table
  • Redundant data
  • Multi-path
  • Data sharing
  • Tiering

Concept and Technique

Storage resource mapping table
  • Maintain tables to map storage resources to targets.
  • Dynamically modify table entries for thin provisioning.
  • Use tables to isolate different storage address spaces.

Redundant data
  • Maintain replicas to provide high availability.
  • Use RAID techniques to improve performance and availability.

Multi-path
  • A fault-tolerance and performance enhancement technique.
  • There is more than one physical path between the host and the storage devices through the buses, controllers, switches, and bridge devices connecting them.

Data sharing
  • Use data de-duplication to eliminate duplicated data.
  • Save storage space and improve its utilization.

Tiering
  • Automatically migrate data across storage resources with different properties according to the significance or access frequency of the data.
  • Example: iMac Fusion Drive
  • (Diagram labels: Storage Policies, Access Group)

What To Be Virtualized

Layers that can be virtualized:
  • File system: Provides a compatible system call interface to user space applications.
  • Block device: Provides a compatible block device interface to the file system, through interfaces such as SCSI, SAS, ATA, SATA, etc.
  • (Layer diagram, spanning user space and kernel space: Application, System call interface, File System, Block interface, Device driver, Storage Device)

File System Level

Data and files
  • What is data?
      - Data is information that has been converted to a machine-readable, digital binary format.
      - Control information indicates how data should be processed.
      - Applications may embed control information in user data for formatting or presentation.
      - Data and its associated control information are organized into discrete units as files or records.
  • What is a file?
      - Files are the common containers for user data, application code, and operating system executables and parameters.

About files
  • Metadata
      - The control information for file management is known as metadata.
      - File metadata includes file attributes and pointers to the location of file data content.
      - File metadata may be segregated from a file's data content.
      - Metadata on file ownership and permissions is used in file access.
      - File timestamp metadata facilitates automated processes such as backup and life cycle management.
  • Different file systems
      - In Unix systems, file metadata is contained in the i-node structure.
      - In Windows systems, file metadata is contained in records of file attributes.

File system
  • What is a file system?
      - A file system is a software layer responsible for organizing and policing the creation, modification, and deletion of files.
      - File systems provide a hierarchical organization of files into directories and subdirectories.
      - The B-tree algorithm facilitates more rapid search and retrieval of files by name.
      - File system integrity is maintained through duplication of master tables, change logs, and immediate writes of file changes.
  • Different file systems
      - In Unix, the superblock contains information on the current state of the file system and its resources.
      - In Windows NTFS, the master file table contains information on all file entries and status.

File system level virtualization
  • The file system maintains metadata (i-node) for each file.
  • File access requests are translated to the underlying file system.
  • A large file is sometimes divided into small sub-files (chunks) for parallel access, which improves performance.

Block Device Level

Block level data
  • The file system block
      - The atomic unit of file system management is the file system block.
      - A file's data may span multiple file system blocks.
      - A file system block is composed of a consecutive range of disk block addresses.
  • Data on disk
      - Disk drives read and write data to media through cylinder, head, and sector geometry.
      - Microcode on a disk translates between disk block numbers and cylinder/head/sector locations.
      - This translation is an elementary form of virtualization.

Block device interface
  • SCSI (Small Computer System Interface)
      - The exchange of data blocks between the host system and storage is governed by the SCSI protocol.
      - The SCSI protocol is implemented in a client/server model.
      - The SCSI protocol is responsible for block exchange but does not define how data blocks will be placed on disk.
      - Multiple instances of SCSI client/server sessions may run concurrently between a server and storage.

Logical unit and logical volume
  • Logical unit
      - The SCSI command processing entity within the storage target represents a logical unit (LU) and is assigned a logical unit number (LUN) for identification by the host platform.
      - LUN assignment can be manipulated through LUN mapping, which substitutes virtual LUN numbers for actual ones.
  • Logical volume
      - A volume represents the storage capacity of one or more disk drives.
      - Logical volume management may sit between the file system and the device drivers that control system I/O.
      - Volume management is responsible for creating and maintaining metadata about storage capacity.
      - Volumes are an archetypal form of storage virtualization.

Data block level virtualization
  • LUN and LBA
      - A single block of information is addressed using a logical unit identifier (LUN) and an offset within that LUN, which is known as a Logical Block Address (LBA).
  • Apply address space remapping
      - The address space mapping is between a logical disk and a logical unit presented by one or more storage controllers.

Where To Be Virtualized

Storage interconnection
  • The path to storage
      - The storage interconnection provides the data path between servers and storage.
      - The storage interconnection is composed of both hardware and software components.
      - Operating systems provide drivers for I/O to storage assets.
      - Storage connectivity for hosts is provided by host bus adapters (HBAs) or network interface cards (NICs).

Storage interconnection protocols
  • Fibre Channel
      - Usually for high performance requirements.
      - Supports point-to-point, arbitrated loop, and fabric interconnects.
      - Device discovery is provided by the simple name server (SNS).
      - Fibre Channel fabrics are self-configuring via fabric protocols.
  • iSCSI (internet SCSI)
      - For moderate performance requirements.
      - Encapsulates SCSI commands, status and data in TCP/IP.
      - Device discovery by the Internet Storage Name Service (iSNS).
      - iSCSI servers can be integrated into Fibre Channel SANs through IP storage routers.

Abstraction of physical storage
  • Physical to virtual
      - The cylinder, head and sector geometry of individual disks is virtualized into logical block addresses (LBAs).
      - For storage networks, the physical storage system is identified by a network address/LUN pair.
      - Combining RAID and JBOD assets to create a virtualized mirror must accommodate performance differences.
  • Metadata integrity
      - Storage metadata integrity requires redundancy for failover or load balancing.
      - Virtualization intelligence may need to interface with upper layer applications to ensure data consistency.

Different approaches:
  • Host-based approach: Implemented as software running on host systems.
  • Network-based approach: Implemented on network devices.
  • Storage-based approach: Implemented on the storage target subsystem.

Host-based Virtualization

Host-based approach
  • File level: Run a virtualized file system on the host to map files into data blocks, which are distributed among several storage devices.
  • Block level: Run logical volume management software on the host to intercept I/O requests and redirect them to storage devices.
  • Provided services: Software RAID
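The block-level approach relies on the storage resource mapping table described under Concept and Technique. The following is a minimal sketch of such a table with thin provisioning, not the implementation of any particular volume manager. The class name `ThinVolume`, the shared free-block list `pool`, and the dictionary standing in for a physical disk are all hypothetical, chosen only for illustration.

```python
class ThinVolume:
    """Toy block-level mapping table with thin provisioning (illustrative only)."""

    def __init__(self, virtual_blocks, pool):
        self.virtual_blocks = virtual_blocks  # size advertised to the host
        self.pool = pool                      # shared list of free physical block numbers
        self.table = {}                       # virtual block -> physical block

    def _check(self, vblock):
        if not 0 <= vblock < self.virtual_blocks:
            raise IndexError("virtual block out of range")

    def write(self, vblock, data, disk):
        self._check(vblock)
        if vblock not in self.table:
            if not self.pool:
                raise RuntimeError("physical pool exhausted")
            # Thin provisioning: a physical block is allocated on first write only.
            self.table[vblock] = self.pool.pop(0)
        disk[self.table[vblock]] = data

    def read(self, vblock, disk):
        self._check(vblock)
        pblock = self.table.get(vblock)
        # Blocks that were never written have no table entry and read as zeros.
        return disk[pblock] if pblock is not None else b"\x00"


pool = list(range(4))   # only 4 physical blocks exist
disk = {}               # physical block number -> data
vol = ThinVolume(virtual_blocks=1000, pool=pool)
vol.write(700, b"x", disk)
print(vol.table, pool)
```

The volume advertises 1000 blocks while only 4 physical blocks back it. Because each volume has its own table, two volumes drawing on the same pool never see each other's address space, which is the isolation property the slide mentions.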

Host-based Virtualization: Important issues
  • Storage metadata servers
      - Storage metadata may be shared by multiple servers.
      - Shared metadata enables a SAN file system view for multiple servers.
      - Provides virtual to real logical block address mapping for clients.
      - A distributed SAN file system requires file locking mechanisms to preserve data integrity.
  • Host-based storage APIs
      - May be implemented by the operating system to provide a common interface to disparate virtualized resources.
      - Microsoft's Virtual Disk Service (VDS) provides a management interface for dynamic generation of virtualized storage.

A typical example: LVM
  • Software layer between the file system and the disk driver.
  • Executed by the host CPU.
  • Lacks hardware assist for functions such as software RAID.
  • Independent of vendor-specific storage architectures.
  • Dynamic capacity allocation to expand or shrink volumes.
  • Supports alternate pathing for high availability.

Host-based implementation
  • Pros
      - No additional hardware or infrastructure requirements
      - Simple to design and implement
      - Improves storage utilization
  • Cons
      - Storage utilization is optimized only on a per-host basis
      - The software implementation depends on each operating system
      - Consumes CPU clock cycles for virtualization
  • Examples: LVM, NFS

Network-based Virtualization

Network-based approach
  • File level: File-level virtualization is seldom implemented on network devices.
  • Block level: Run software on dedicated appliances or intelligent switches and routers.
  • Provided services
      - Multi-path
      - Storage pooling

Requirements of the storage network
  • Intelligent services
      - Logon services
      - Simple name server
      - Change notification
      - Network address assignment
      - Zoning
  • A fabric switch should provide
      - Connectivity for all storage transactions
      - Interoperability between disparate servers, operating systems, and target devices

Techniques for fabric switch virtualization
  • Hosted on departmental switches
      - A PC engine provisioned as an option blade.
  • Data center directors
      - Should be able to preserve the five nines availability characteristic of director-class switches.
      - Dedicated virtualization ASICs provide high-performance frame processing and block address mapping.
  • Interoperability between different implementations will become a priority.

Interoperability issue
  • FAIS (Fabric Application Interface Standard)
      - Defines a set of standard APIs to integrate applications and switches.
      - FAIS separates control information and data paths.
      - The control path processor (CPP) supports the FAIS APIs and the upper layer storage virtualization application.
      - The data path controller (DPC) executes the virtualized SCSI I/Os under the management of one or more CPPs.

Network-based implementation
  • Pros
      - True heterogeneous storage virtualization
      - No need to modify the host or storage system
      - The multi-path technique improves access performance
  • Cons
      - Complex interoperability matrices, limited by vendor support
      - Difficult to implement fast metadata updates in a switch device
      - Usually requires building specific network equipment (e.g., Fibre Channel)
  • Examples: IBM SVC (SAN Volume Controller), EMC Invista

Storage-based Virtualization

Storage-based approach
  • File level: Run software on the storage device to provide file-based data storage services to hosts through the network.
  • Block level: Embed the technology in the target storage devices.
  • Provided services
      - Storage pooling
      - Replication and RAID
      - Data sharing and tiering
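Data sharing, listed among the storage-based services above, was introduced earlier as de-duplication. The sketch below shows one common way to do it, content-addressed blocks keyed by a hash, and is an assumption rather than the method of any specific array. The class `DedupStore` and the tiny 4-byte block size are hypothetical and chosen so the effect is easy to see.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps each distinct block only once (illustrative only)."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes, stored once
        self.files = {}   # file name -> ordered list of digests

    def put(self, name, data, block_size=4):
        digests = []
        for i in range(0, len(data), block_size):
            chunk = data[i:i + block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A block already present under this digest is not stored again.
            self.blocks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        # Rebuild the file by following its list of block references.
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
store.put("a", b"AAAABBBBAAAA")
store.put("b", b"BBBBAAAA")
print(len(store.blocks), "distinct blocks stored for 5 logical blocks")
```

Both files remain fully readable, yet only the two distinct blocks occupy space. Real systems use much larger blocks and must also handle reference counting on delete, which this sketch omits.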

Array-based virtualization
  • Storage controller
      - Provides basic disk virtualization in the form of RAID management, mirroring, and LUN mapping or masking.
      - Allocates a single LUN to multiple servers.
      - Offers Fibre Channel, iSCSI, and SCSI protocols.
  • Cache memory
      - Enhances performance.
  • Storage assets coordination
      - Coordination between multiple storage systems is necessary to ensure high availability.

Data replication
  • Array-based data replication
      - Referred to as disk-to-disk replication.
      - Requires that a storage controller function concurrently as both an initiator and a target.
  • Synchronous vs. asynchronous
      - Synchronous data replication ensures that a write operation to a secondary disk array is completed before the primary array acknowledges task completion to the server.
      - Asynchronous data replication provides write completion by the primary array, although the transaction may still be pending to the secondary array.
      - To preserve performance, synchronous data replication is limited to metropolitan distances.
      - Asynchronous data replication is largely immune to transmission latency.

Other features
  • Point-in-time copy (snapshot)
      - Provides point-in-time copies of an entire storage volume.
      - Snapshot copies may be written to secondary storage arrays.
      - Provides an efficient means to quickly recover a known good volume state in the event of data corruption from the host.
  • Distributed modular virtualization
      - Decoupling storage controller logic from physical disk banks provides flexibility for supporting heterogeneous disk assets and facilitates distributed virtualization intelligence.
      - Accommodates class of storage services and data life cycle management.
      - Decoupling storage controller intelligence and virtualization engines from physical disk banks facilitates multi-protocol block data access and accommodation of a broad range of disk architectures.

Storage-based implementation
  • Pros
      - Provides most of the benefits of storage virtualization
      - Reduces additional latency to individual I/O
  • Cons
      - Storage utilization is optimized only across the connected controllers
      - Replication and data migration are only possible across the connected controllers and the same vendor's devices
  • Examples: Disk array products

How To Be Virtualized

In-band Virtualization

Implementation methods: In-band
  • Also known as symmetric; the virtualization devices actually sit in the data path between the host and storage.
  • Hosts perform I/O to the virtualized device and never interact with the actual storage device.
  • Pros: Easy to implement
  • Cons: Poor scalability; the virtualization device becomes a bottleneck

Out-of-band Virtualization

Implementation methods: Out-of-band
  • Also known as asymmetric; the virtualization devices are sometimes called metadata servers.
  • Requires additional software on the host, which first requests the location of the actual data.
  • Pros: Scalability and performance
  • Cons: Hard to implement
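The difference between the two methods can be sketched in a few lines. This is a toy model under stated assumptions, not a description of any product: the classes `InBandAppliance`, `MetadataServer` and `OutOfBandClient` are hypothetical, devices are plain dictionaries, and the client-side location cache is an assumption added to show why the metadata server stays out of the data path.

```python
class InBandAppliance:
    """In-band (symmetric): every read is relayed through the virtualization device."""

    def __init__(self, mapping, devices):
        self.mapping = mapping    # virtual block -> (device name, physical block)
        self.devices = devices    # device name -> {physical block: data}
        self.relayed = 0

    def read(self, vblock):
        self.relayed += 1         # control and data both pass through here
        dev, pblock = self.mapping[vblock]
        return self.devices[dev][pblock]


class MetadataServer:
    """Out-of-band (asymmetric): answers location queries, never carries data."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.lookups = 0

    def locate(self, vblock):
        self.lookups += 1
        return self.mapping[vblock]


class OutOfBandClient:
    """Host-side software that asks for a location once, then reads the device directly."""

    def __init__(self, mds, devices):
        self.mds = mds
        self.devices = devices
        self.cache = {}

    def read(self, vblock):
        if vblock not in self.cache:
            self.cache[vblock] = self.mds.locate(vblock)  # control message
        dev, pblock = self.cache[vblock]
        return self.devices[dev][pblock]                  # data path bypasses the server


devices = {"d0": {7: b"hello"}}
mapping = {0: ("d0", 7)}
appliance = InBandAppliance(mapping, devices)
mds = MetadataServer(mapping)
client = OutOfBandClient(mds, devices)
for _ in range(3):
    appliance.read(0)
    client.read(0)
print(appliance.relayed, mds.lookups)
```

After three reads the in-band appliance has handled three requests while the metadata server has handled one, which illustrates the scalability trade-off listed in the pros and cons. The price is the extra host-side software, the "hard to implement" item.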

Other Virtualization Services
  • Pooling heterogeneous storage assets: In a virtualized storage pool, virtual assets may be dynamically resized and allocated to servers by drawing on the total storage capacity of the SAN.
  • Heterogeneous mirroring: Heterogeneous mirroring offers more flexible options than conventional mirroring, including three-way mirroring within storage capacity carved from different storage systems.
  • Heterogeneous data replication: Heterogeneous data replication enables duplication of storage data between otherwise incompatible storage systems.

Summary

Storage virtualization technique:
  • Virtualization layer: File level and block level
  • Virtualization location: Host, network and storage based
  • Virtualization method: In-band and out-of-band

Storage virtualization services:
  • Storage pooling and sharing
  • Data replication and mirroring
  • Snapshot and multi-pathing

Storage Virtualization on Linux System

Case study: virtualization on Linux system
  • Block-based
      - Redundant Array of Independent Disks (RAID)
      - Logical Volume Management (LVM)
  • File-based
      - Network File System (NFS)

RAID

RAID (redundant array of independent disks)
  • Originally: redundant array of inexpensive disks
  • RAID schemes provide different balances between the key goals:
      - Reliability
      - Availability
      - Performance
      - Capacity

RAID levels (the most used are described):
  • RAID 0: block-level striping without parity or mirroring
  • RAID 1: mirroring without parity or striping
  • RAID 1+0 (also written RAID 10): mirroring and striping
  • RAID 2
  • RAID 3
  • RAID 4
  • RAID 5: block-level striping with distributed parity
  • RAID 5+0 (also written RAID 50): distributed parity and striping
  • RAID 6

RAID 0: Block-level striping without parity or mirroring
  • It has no (or zero) redundancy.
  • It provides improved performance and additional storage.
  • It has no fault tolerance. Any drive failure destroys the array, and the likelihood of failure increases with more drives in the array.
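Block-level striping amounts to a small piece of address arithmetic, and the growing failure likelihood follows from a simple probability formula. The sketch below shows both under stated assumptions: the function names and the `stripe_blocks` parameter are hypothetical, and the failure formula assumes drives fail independently with the same probability.

```python
def raid0_locate(logical_block, num_disks, stripe_blocks=1):
    """Map a logical block to (disk index, block on that disk) for RAID 0 striping."""
    stripe_index, offset = divmod(logical_block, stripe_blocks)
    disk = stripe_index % num_disks                       # stripes rotate over the disks
    disk_block = (stripe_index // num_disks) * stripe_blocks + offset
    return disk, disk_block


def raid0_failure_probability(p_drive, num_disks):
    """Probability that the array is lost: at least one of the drives fails."""
    return 1 - (1 - p_drive) ** num_disks


for block in range(4):
    print(block, raid0_locate(block, num_disks=2))
```

With two disks and one block per stripe, consecutive logical blocks alternate between the disks, so sequential I/O is spread over both. Since `1 - (1 - p) ** n` grows with `n`, adding drives raises the chance of losing the whole array.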

RAID 1: Mirroring without parity or striping
  • Data is written identically to two drives, thereby producing a "mirrored set".
  • A read request is serviced by whichever of the two drives has the least seek time plus rotational latency.
  • A write request updates the stripes of both drives. Write performance depends on the slower of the two.
  • At least two drives are required to constitute such an array.
  • The array continues to operate as long as at least one drive is functioning.
  • Space efficiency: 1/N (N = 2)
  • Fault tolerance: N - 1 (N = 2)

RAID 5: Block-level striping with distributed parity
  • Distributes parity across different disks
  • Requires at least 3 disks
  • Space efficiency: 1 - 1/N
  • Fault tolerance: 1

RAID 1+0 / RAID 5+0
  • RAID 1+0: RAID 1 (mirror) + stripe
  • RAID 5+0: RAID 5 (parity) + stripe

RAID Level Comparison

  RAID level   Reliability   Write performance   Space efficiency
  RAID 0       ×             ○                   2/2
  RAID 1       ◎             ○                   1/2
  RAID 1+0     ◎             ○                   2/4
  RAID 5       ○             △                   2/3
  RAID 5+0     ○             ○                   4/6

Logical Volume Management

LVM architecture

The LVM project is implemented in two components:
  • In user space
      - Some management utilities and configuration tools, e.g. lvm, dmsetup
      - A programming interface with a well-designed library, e.g. libdevmapper.h
  • In kernel space
      - Implements the device mapper framework
      - Provides different mapped device targets, e.g. linear, stripe, mirror, etc.

Tools and utilities are in user space.

lvm
  • Command-line tools for LVM2
      - Logical volume (lv) operations
      - Volume group (vg) operations
      - Physical volume (pv) operations
  • Limited controllability
      - Can only create logical volumes with simple mapping mechanisms.
      - Does not allow cross-machine mappings.

dmsetup
  • Low-level logical volume management
      - Operations: create, delete, suspend, resume, etc.
      - Works with a mapping table file
  • Limitations
      - Still cannot provide cross-machine mappings.

The file system builds upon the device mapper framework by means of system calls.

The file system in the operating system invokes a set of block device system calls.
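The mapping table file that dmsetup works with is plain text. For the linear target, each line has the form `start_sector num_sectors linear device offset`. The helper below generates such a table for several concatenated segments; the function name and the device paths are hypothetical and serve only as an illustration.

```python
def linear_table(segments):
    """Build a device-mapper table that concatenates segments with the linear target.

    segments: list of (device_path, start_sector_on_device, length_in_sectors)
    """
    lines = []
    logical_start = 0
    for device, offset, length in segments:
        # One line per segment: where it starts in the mapped device, how long it is,
        # and which sectors of which underlying device back it.
        lines.append(f"{logical_start} {length} linear {device} {offset}")
        logical_start += length
    return "\n".join(lines)


# Hypothetical devices: 2048 sectors from /dev/sdb1 followed by 4096 from /dev/sdc1.
print(linear_table([("/dev/sdb1", 0, 2048), ("/dev/sdc1", 1024, 4096)]))
```

Saved to a file, a table like this could be passed to `dmsetup create` together with a name for the new mapped device. Doing so requires root privileges and real block devices, so it is not attempted here.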

(Figure label: Device Mapper framework reloads operation functions)

A file system can also be implemented in user space only.

The device mapper framework implements a Linux kernel driver for different mappings.

The device mapper framework defines a set of target device mapping interfaces:
  • dm_ctr_fn ctr: Constructor of each newly created mapped device
  • dm_dtr_fn dtr: Destructor of each removed mapped device
  • dm_map_fn map: Sets up the mapping relations
  • dm_ioctl_fn ioctl: Performs the actual system I/O invocations
  • etc.

Develop a new mapped device target and add it into the device mapper framework.
  • Improves scalability

Network File System

NFS architecture

What is NFS?
  • NFS is a POSIX-compliant distributed file system.
      - Works in a distributed manner, following a server-client model.
      - NFS builds on the Remote Procedure Call (RPC) system.
  • The Network File System is an open standard defined in RFCs.
  • Some features:
      - Shared POSIX file system
      - Common module in the Linux kernel
      - Good performance

Dynamic ports and their handling
  • In NFSv3, auxiliary services such as mountd, lockd and statd listen on dynamically assigned ports (nfsd itself normally uses port 2049).
  • NFS uses RPC (the portmapper/rpcbind service) to find the port of a service.

Consistency and concurrency in NFS
  • lockd offers a write lock to handle concurrent updates.
  • statd handles consistency between the server and clients.
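From an application's point of view, the write lock offered by lockd is reached through the ordinary POSIX locking calls. A minimal sketch follows, assuming a Unix host and a hypothetical shared file path; on an NFSv3 mount with locking enabled, the client is expected to forward such lock requests to the server's lock manager, while on a local file system the same code simply takes a local lock.

```python
import fcntl
import os


def append_record(path, record):
    """Append one line to a shared file while holding an exclusive (write) lock."""
    with open(path, "a") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)      # blocks until the write lock is granted
        try:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())           # push the data out before releasing the lock
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)  # release so other writers can proceed
```

Two clients calling `append_record` on the same file take turns instead of interleaving their writes. The lock is advisory, so it only protects against writers that also request it.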

Storage Virtualization in Distributed System

Case study: virtualization in distributed system
  • Block-based: VastSky
  • File-based: Lustre
  • Object-based: Ceph
  • HDFS

VastSky

Overview
  • VastSky is a Linux-based cluster storage system, which provides logical volumes to users by aggregating disks over a network.
  • Three kinds of servers
      - Storage manager: Maintains a database which describes the physical and logical resources in a system, and handles operations such as creating and attaching logical volumes.
      - Head servers: Run user applications or virtual machines which actually use VastSky logical volumes.
      - Storage servers: Have physical disks which are used to store user data. The disks are exported over the network (iSCSI) and used to provide logical volumes on head servers.

VastSky architecture
  • (Figure labels: Storage Manager, Storage Pool, XML-RPC, iSCSI request)

Logical volume
  • A set of several mirrored disks
  • Several physical disk chunks on different servers
  • (Figure: Storage Servers 1 to 4, Storage Pool, Logical Volume) There are 3 mirrored disks, and they are distributed across 3 different servers.

Redundancy
  • VastSky mirrors user data to three storage servers by default, and all of them are updated synchronously.
  • VastSky can be configured to use two networks (e.g. two independent Ethernet segments) for redundancy.

Fault detection
  • The storage manager periodically checks whether each head server and storage server is responsive.

Recovery
  • On a failure, the storage manager attempts to reconfigure mirrors by allocating new extents from other disks automatically.

Recovery mechanism
  • (Figure: Storage Servers 1 to 4 behind the storage pool and logical volume; one server crashes, data is copied to a spare, and the spare is attached. Labels: 1st network, 2nd network, attach, recovering)

Scalability
  • Most cluster file systems and storage systems which have a metadata control node have a scalability problem.
  • VastSky doesn't have this problem, since once a logical volume is set up, all I/O operations are done only through Linux drivers without any storage manager interaction.

Load balance
  • With VastSky's approach, the load is equalized across the physical disks, so their I/O bandwidth is well utilized.
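The mirroring, recovery and load-balancing behaviour described above can be illustrated with a toy allocator. This is not VastSky's actual algorithm, which the slides do not specify; the functions `place_mirrors` and `recover`, the server names, and the "most free chunks first" policy are all assumptions made for the sketch.

```python
def place_mirrors(free_chunks, copies=3):
    """Pick `copies` distinct servers, preferring those with the most free chunks."""
    ranked = sorted(free_chunks, key=lambda s: (-free_chunks[s], s))
    chosen = ranked[:copies]
    if len(chosen) < copies or any(free_chunks[s] == 0 for s in chosen):
        raise RuntimeError("not enough servers with free chunks")
    for s in chosen:
        free_chunks[s] -= 1          # one chunk per mirror is consumed
    return chosen


def recover(mirrors, failed, free_chunks):
    """Replace the mirror on `failed` with a chunk on a server not already in use."""
    survivors = [s for s in mirrors if s != failed]
    # Servers already holding a mirror (including the failed one) are excluded.
    spares = {s: n for s, n in free_chunks.items() if s not in mirrors and n > 0}
    if not spares:
        raise RuntimeError("no spare server available")
    spare = max(sorted(spares), key=lambda s: spares[s])
    free_chunks[spare] -= 1          # data would be copied from a survivor to this chunk
    return survivors + [spare]


free = {"server1": 5, "server2": 3, "server3": 4, "server4": 2}
mirrors = place_mirrors(free)
print(mirrors)
print(recover(mirrors, "server3", free))
```

Choosing the emptiest servers first spreads chunks evenly, which is one simple way to equalize load across disks. After the simulated crash of `server3`, the remaining unused server takes over as the third mirror.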

Lustre File System

What is Lustre?
  • Lustre is a POSIX-compliant global, distributed, parallel file system.
  • Lustre is licensed under the GPL.

Some features:
  • Parallel shared POSIX file system
  • Scalable
      - High performance
      - Petabytes of storage
  • Coherent
      - Single namespace
      - Strict concurrency control
  • Heterogeneous networking
  • High availability

Lustre components:
  • Metadata Server (MDS): The MDS makes metadata stored in one or more MDTs available to clients.
  • Metadata Target (MDT): The MDT stores metadata (such as file names and permissions) on an MDS.
  • Object Storage Server (OSS): The OSS provides file I/O service and network request handling for one or more local OSTs.
  • Object Storage Target (OST): The OST stores file data as data objects on one or more OSSs.

Lustre network:
  • Supports several network types
      - InfiniBand, TCP/IP on Ethernet, Myrinet, Quadrics, etc.
  • Takes advantage of remote direct memory access (RDMA)
      - Improves throughput and reduces CPU usage

Lustre in HPC
  • Lustre is the leading HPC file system
      - 15 of the Top 30
      - Demonstrated scalability
  • Performance
      - Systems with over 1,000 nodes
      - 190 GB/sec I/O
      - 26,000 clients
  • Examples
      - Titan supercomputer at Oak Ridge National Laboratory (TOP500: #1, November 2012)
      - System at Lawrence Livermore National Laboratory (LLNL)
      - Texas Advanced Computing Center (TACC)

Ceph

Overview
  • Ceph is a free software distributed file system.
  • Ceph's main goals are to be POSIX-compatible and completely distributed without a single point of failure.
  • The data is seamlessly replicated, making it fault tolerant.

Release
  • On July 3, 2012, the Ceph development team released Argonaut, the first release of Ceph with long-term support.

Introduction
  • Ceph is a distributed file system that provides excellent performance, reliability and scalability.
  • Object-based storage.
  • Ceph separates data and metadata operations by eliminating file allocation tables and replacing them with generating functions.
  • Ceph utilizes a highly adaptive distributed metadata cluster, improving scalability.
  • Object-based storage devices (OSDs) are used to access data directly, giving high performance.
  • (Figure: Ceph object-based storage)

Goal
  • Scalability: Storage capacity, throughput, client performance. Emphasis on HPC.
  • Reliability: Failures are the norm rather than the exception, so the system must have fault detection and recovery mechanisms.
  • Perfor
