Active-Active Data Center and Disaster Recovery Solution: Technical Part

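Agenda
  1. Metro (same-city) active-active technical solution
  2. Remote disaster recovery technical solution

Overview of Virtualization-Based Business Continuity Solutions
  [Figure: vSphere resource pools at a local site and a disaster recovery site, with Dev/Test workloads]
  Disaster recovery
  • Asynchronous replication at the virtualization layer
  • Synchronous or asynchronous replication on hardware devices
  • Automated application failover management
  Local high availability
  • Metro clusters
  • Application-aware high availability
  • Zero-downtime protection for critical applications
  • Live migration of virtual machines, with dynamic allocation of compute and storage resources (vMotion and Storage vMotion)
  Data protection
  • Efficient data backup and recovery
  • Operations can be automated through schedules and scripts
  Solution characteristics
  • Independent of applications and operating systems
  • Independent of hardware
  • Comprehensive protection
  • Simple and economical

Part 1: Metro active-active technical solution

Active-Active Data Centers Safeguard Availability at Every Level
  Level       Technologies
  Component   Hardware hot-add, NIC teaming, storage multipathing
  Server      vMotion & DRS, HA & FT
  Storage     Storage vMotion, Storage DRS, VMFS
  Data        Data Replication
  Site        Metro Cluster

Overall Architecture of the Active-Active Data Center
  [Figure: a vSphere cluster stretched across Site A and Site B (200 km) on top of an active-active storage cluster]
  • Behaves the same as a single vSphere cluster
  • The stretch distance is at most 200 km and is typically under 50 km
  • VMware HA and vMotion provide automated DR protection
  • Requires an active-active storage cluster, such as EMC VPLEX or NetApp MetroCluster

Compute Resource Design

Making an Application Service Highly Available
  • vSphere HA
  • vSphere App HA

vSphere App HA
  [Figure labels: VMware vFabric tc Server, vSphere App HA]
  • Policy-based
  • Protects off-the-shelf apps

Fault Tolerance vs. High Availability
  Fault tolerance
  • Ability to recover from component loss
  • Example: hard drive failure
  High availability
  Uptime percentage in one year    Downtime in one year
  99%                              3.65 days
  99.9%                            8.76 hours
  99.99%                           52 minutes
  99.999% ("five nines")           5 minutes

Fault Tolerance with Multi-vCPU Support
  [Figure: a primary and a secondary VM with 4 vCPUs each on vSphere; labels: Instantaneous Failover, Fast Checkpointing]

Long-Distance vMotion
  • vSphere 6.0 supports vMotion across Layer 3 networks and across vCenter Server instances

vCenter Availability
  • Run the vCenter Server application in a VM
  • Run the vCenter Server database in a VM
  • Run both in the same VM?
  • Protect with vSphere HA
      - vCenter and DB VM restart priority set to High
      - Enable guest OS and application monitoring
      - App HA can protect the SQL Server database
  • Back up the vCenter Server VM and database
      - Image-level backup for the vCenter Server VM
      - App-level backup using an agent for database backup

Network Resource Design

Active-Active Data Center Network Architecture
  • Physical Layer 2 (dark fiber)
  • Logical Layer 2 (overlay network / VPN)
  [Figure: Site A and Site B each host VMs on a Layer 2 segment of their own Layer 2 network; in the extended Layer 2 network, Layer 2 traffic is carried over the data center interconnect link]

NSX vSphere Multi-Site Use Cases
  NSX for vSphere supports three multi-site deployment models:
  1. VXLAN with Stretched Clusters (vSphere Metro Storage Cluster)
  2. VXLAN with Separate Clusters
  3. L2 VPN
  • All solutions provide L2 extension over an L3 network, enabling workload and IP mobility without the need to stretch VLANs
  • Local egress is supported; however, it does add complexity
  • The appropriate deployment model will depend on customer requirements and their environment

NSX Uses Overlay Networks to Implement the Active-Active Data Center
  [Figure: a vSphere Metro Storage Cluster on active-active storage (Datastore 1, Datastore 2) under one vCenter Server, with Site A and Site B joined by a Layer 3 network. VM1, VM2 and VM3 attach to Logical Switch A (/24); VM4 and VM5 attach to Logical Switch B (/24). A distributed logical router connects to the Site A edge gateway (Uplink Network A) and to the Site B edge gateway (Uplink Network B).]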
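The downtime figures in the Fault Tolerance vs. High Availability table in the compute design part follow directly from the uptime percentage. A minimal Python sketch of that arithmetic, assuming a 365-day year; the function name `downtime_per_year` is illustrative and does not come from the slides.

```python
from datetime import timedelta


def downtime_per_year(uptime_percent: float, days_per_year: int = 365) -> timedelta:
    """Return the downtime per year implied by an uptime percentage."""
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return timedelta(days=days_per_year * unavailable_fraction)


# Print the downtime budget for the four availability levels in the table.
for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year(pct).total_seconds() / 3600
    print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

The last two rows of the table appear to be rounded down: the computed values are slightly above 52 minutes and slightly above 5 minutes.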

VMware NSX Multi-Site: Single VC, Stretched Cluster
Solution Detail
  • Requires a supported vSphere Metro Storage Cluster (vMSC) configuration
  • In a vMSC deployment, storage is Active/Active and spans both sites. Examples of Active/Active storage are EMC VPLEX and NetApp MetroCluster (see the VMware HCL for more information)
  • Stretched clusters support live vMotion of workloads
  • Use L3 for all VMkernel networks: Management, vMotion, IP Storage
  • All management components, such as vCenter Server, NSX Manager and the Controllers, are located in Site A
  • Latency and bandwidth requirements are dictated by the vMSC storage vendor, e.g. 10 ms RTT for VPLEX, which also aligns with vMotion using Enterprise Plus
  • vMSC enables disaster avoidance and basic Disaster Recovery (without the orchestration or testing capabilities of SRM)
  • Loss of either the NSX components or the Datacenter Interconnect results in a fallback to data-plane-based learning using the existing network state. Therefore there is no outage to data forwarding; without vCenter Server, however, there are no VM provisioning or migration operations
  • NSX and vMSC are complementary technologies that fit a sweet spot for NSX (single vCenter Server)

VMware NSX Multi-Site: Single VC, Stretched Cluster
Cluster Configuration
  [Figure: three groups of four ESXi hosts, each spanning Site A and Site B: the Stretched Workload Cluster, the Stretched Edge Cluster, and the Stretched Management Cluster running vCenter Server]
  • vMSC enables stretched clusters across two physical sites
  • In an NSX deployment, the Management, Edge and Workload clusters are all stretched
  • Under normal conditions all management components run in Site A and are protected by vSphere HA
  • They are automatically restarted at Site B in the event of a site outage. The management network is not stretched and must be enabled on Site B as part of the recovery run book
  • Depending on the design, NSX Edge Services Gateways are active either in both sites or in a single site, and can also leverage HA
  • VMs in the Workload Clusters are automatically recovered

VMware NSX Multi-Site: Single VC, Stretched Cluster
  • In a vMSC environment, DRS is used to balance resource utilization, provide site affinity, improve availability and ensure optimal traffic flow
  • Use Should rules rather than Must rules, as this allows vSphere HA to take precedence
  • Example DRS groups, rules and settings for NSX Edges: [table not preserved in the source]

VMware NSX Multi-Site: Single VC, Stretched Cluster
NSX Configuration (Option 1, Preferred)
  • The Transport Zone spans both sites and VXLAN Logical Switches provide L2 connectivity to VMs
  • Distributed Logical Routing is used for all VMs to provide a consistent default gateway vMAC
  • Local Egress is provided by using separate Uplink LIFs and Edge GWs per site. Hosts on Site A have the DLR default gateway configured via the Site A Edge GW using the net-vdr CLI, while the Site B DLR default gateway is via the Site B Edge GW
  • Caveat: Dynamic Routing cannot be enabled on the DLR, nor can a static route be set via NSX Manager
  • NSX Edge Gateways will have a static route for any networks directly connected to the DLR. Consistent IP addressing will simplify routing by allowing a supernet to be used
  • DFW provides vNIC policy enforcement independent of the VM's location
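Option 1 notes that consistent IP addressing lets the Edge gateways use a supernet route instead of one static route per logical switch. A minimal Python sketch of that idea using the standard `ipaddress` module; the prefixes are made-up examples (the addresses on the slide were not preserved) and `covering_supernet` is an illustrative helper name.

```python
import ipaddress


def covering_supernet(prefixes):
    """Return the smallest single network that contains every given prefix."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    candidate = networks[0]
    # Widen the candidate one bit at a time until it covers all networks.
    while not all(n.subnet_of(candidate) for n in networks):
        candidate = candidate.supernet()
    return candidate


# Hypothetical Web, App and DB logical switch subnets, allocated contiguously.
consistent = ["10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24"]
# The same three tiers allocated from unrelated ranges.
scattered = ["10.10.1.0/24", "10.20.5.0/24", "172.16.9.0/24"]

print(covering_supernet(consistent))
print(covering_supernet(scattered))
```

With the contiguous plan a single /22 static route on each Edge gateway covers all three tiers. With the scattered plan the only common supernet is the default route, so each subnet would need its own static route.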

  [Figure: Option 1 topology. VM1, VM2 and VM3 attach to the Web Logical Switch (/24), VM4 and VM5 to the App Logical Switch (/24), and VM6 and VM7 to the DB Logical Switch (/24). The Distributed Logical Router spans Site A and Site B with internal LIFs at .1. It reaches the Site A NSX Edge GW through the Uplink A LIF on Uplink Net A (/29) and the Site B NSX Edge GW through the Uplink B LIF on Uplink Net B (/29).]

VMware NSX Multi-Site: Single VC, Stretched Cluster
NSX Configuration (Option 2)
  • As per Option 1, the Transport Zone spans both sites and VXLAN Logical Switches provide L2 connectivity for VMs
  • NSX Edge Gateways are deployed per site with the same internal IP address
  • NSX DFW L2 Ethernet Rules are defined to block ARP to the remote GW using MAC Sets, which provides Local Egress because only the site-local Edge GW is learnt. A future enhancement is planned to enable the ESXi host object for DFW*
  • Caveats:
      - Traffic flow between application tiers may be asymmetric if they are split across sites and DRS rules aren't used
      - Does not leverage Distributed Logical Routing and is limited to 10 vNICs per Edge
      - vMotion will result in a network interruption, as the VM's ARP cache entry for the site-specific GW needs to time out
  • Can be used if Option 1 isn't a fit (e.g. Dynamic Routing or vSphere 5.1 support is required)

  [Figure: Option 2 topology. VMs at Site A and Site B share one Logical Switch (/24), with a Site A NSX Edge GW and a Site B NSX Edge GW.]

VMware NSX Multi-Site: Single VC, Separate Clusters (2)
  [Figure: one vCenter Server manages Site A (Datastore 1) and Site B (Datastore 2) across an L3 network. VM1, VM2 and VM3 attach to Logical Switch A (/24); VM4 and VM5 attach to Logical Switch B (/24). A Distributed Logical Router connects to the Site A NSX Edge GW (Uplink Net A) and the Site B NSX Edge GW (Uplink Net B). Caption: Storage vMotion required for VM mobility.]

VMware NSX Multi-Site: Single VC, Separate Clusters
Solution Detail
  • Separate vSphere clusters are used at each site, therefore DRS rules and groups are not required
  • Storage is local to a site
  • Enhanced vMotion (simultaneous vMotion and svMotion) can provide live vMotion without shared storage
  • Use L3 for all VMkernel networks: Management, vMotion, IP Storage
  • All management components, such as vCenter Server, NSX Manager and the Controllers, are located in Site A
  • The supported latency requirement for Enhanced vMotion is 100 ms RTT (vSphere 6). vMotion requires 250 Mbps of bandwidth per concurrent vMotion
  • This solution provides Disaster Avoidance where live vMotion is supported, by enabling workloads to be moved proactively between sites
  • Does not provide automated Disaster Recovery

VMware NSX Multi-Site: Single VC, Separate Clusters
Cluster Configuration
  [Figure: a Management Cluster of ESXi hosts running vCenter Server in Site A; Edge Cluster A and Workload Cluster A in Site A; Edge Cluster B and Workload Cluster B in Site B]
  • Clusters do not span beyond a physical site
  • All management components run in Site A and will not be automatically recovered in the event of a site outage. Storage replication to a standby cluster in Site B and a manual recovery process could be implemented
  • Separate Edge and Workload clusters are used per site
  • NSX Edge Services Gateways are active in a single site, with HA local to the site
  • Workloads are active across both sites and can optionally support live vMotion
  • DRS affinity rules for workloads are not required

VMware NSX Multi-Site: Single VC, Separate Clusters
NSX Configuration
  • Option 1 with Distributed Logical Routing is unchanged from the Stretched Cluster configuration and is still recommended
  • For Option 2, as vCenter objects are not shared, NSX DFW L2 Ethernet Rules with a scope of the Datacenter can be used to provide Local Egress, as only the site-local Edge GW is learnt. No enhancements are required
  • The same caveats as for Option 2 with Stretched Clusters also apply
  [Figure: VMs at Site A and Site B share one Logical Switch (/24), with a Site A NSX Edge GW and a Site B NSX Edge GW]

To Local Egress/Ingress or Not To
  • As a first step, ask the customer whether they have stateful services for traffic entering and exiting the datacenter. This is generally the case, and if so they will require a solution to provide Local Ingress for their applications, e.g. NAT, GSLB, Anycast, LISP, RHI
  • If they can address this, then a Multi-Site NSX solution providing Local Egress is a good fit
  • If they cannot, other questions to ask are: do they have high bandwidth between sites, and is reducing operational complexity a goal?
  • An active NSX Edge Gateway at one site, with failover to the secondary site, may meet the customer's requirements and is much simpler than providing Local Egress and Ingress

VMware NSX Multi-Site: L2 VPN (3)
  [Figure: Site A (or on-premises) and Site B (or off-premises), each with its own vCenter Server and datastore (Datastore 1, Datastore 2). VM1 and VM2 sit on Network A (/24) behind the Site A NSX Edge GW and the Site A uplink network; VM3 and VM4 sit on Network B (/24) behind the Site B NSX Edge GW and the Site B uplink network. The two Edge GWs are linked by SSL across an L3 network.]
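The questions on the "To Local Egress/Ingress or Not To" slide form a small decision flow. A minimal Python sketch of one reading of it follows. The type `SiteProfile`, the function `recommend_egress_design` and the result strings are all illustrative. Two branches are assumptions rather than statements from the slide: a customer with no stateful services is treated as a fit for Local Egress, and the single active Edge is recommended only when both follow-up questions are answered yes.

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    stateful_services: bool         # stateful services on traffic entering/exiting the datacenter
    local_ingress_solved: bool      # customer can provide Local Ingress (e.g. GSLB, Anycast, LISP, RHI)
    high_intersite_bandwidth: bool  # high bandwidth available between the sites
    wants_low_complexity: bool      # reducing operational complexity is a goal


def recommend_egress_design(p: SiteProfile) -> str:
    """Map the slide's questions to a design recommendation."""
    # Assumption: without stateful services, asymmetric paths are acceptable.
    if not p.stateful_services or p.local_ingress_solved:
        return "local-egress"
    # Assumption: both follow-up answers must be yes for the simpler design.
    if p.high_intersite_bandwidth and p.wants_low_complexity:
        return "single-active-edge"
    return "review"


print(recommend_egress_design(SiteProfile(True, False, True, True)))
```

The "review" result stands for cases the slide does not resolve, where the design needs further discussion with the customer.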

Storage Resource Design

Storage Requirements
  [Figure: a Metro Cluster between Site A and Site B over dark fiber with DWDM, up to 200 km. One site holds Aggr X Plex0 and Aggr Y Plex1; the other holds Aggr Y Plex0 and Aggr X Plex1.]
  Latency requirements:
  • vSphere requires RTT < 100 ms
  • Synchronous storage replication requires RTT < 5 ms

Two Implementations of Metro Storage: Uniform and Non-Uniform

How vSphere Metro Storage Cluster Works
  [Figure sequence over five slides: VMs run on a vSphere HA cluster on top of vMSC-certified Metro Cluster storage, with Plex0 and Plex1 copies at each site]
  1. Normal operation: the vSphere HA cluster is stretched across a campus or metro area, with array-based synchronous replication between the plexes
  2. Standard vMotion of virtual machines
  3. Site shutdown for maintenance: storage takeover
  4. Maintenance performed, site restored: automatic resync
  5. Access returned: standard vMotion to return the VMs (storage labelled NetApp MetroCluster)

Storage Device Selection
  • Compatibility website: http:/

  • Six categories of Metro Cluster Storage:
      1. iSCSI
      2. FC
      3. NFS
      4. iSCSI-SVD
      5. FC-SVD
      6. NFS-SVD

EMC VPLEX for Stretched Metro Clusters
  [Figure: a stretched vSphere cluster under one vCenter across Site A (Active) and Site B (Active), linked at 10 ms over IP or FC. A VPLEX at each site presents VPLEX Distributed Virtual Volumes; Site C hosts an optional Witness. Labels: Roadmap, Dual Site DRS, Dual Site HA, Instant vMotion.]
  Established VPLEX Active-Active solution
  • Instant vMotion across distance
  • VMware HA automatically restarts VMs at either site after a system or site failure
  • Balance workloads across both sites with VMware DRS
  • Supports VMware FT out of the box
  Additional flexibility of VPLEX Metro
  • Doesn't require FC cross-connect
  • Choose IP or FC connectivity between sites
  • Third-site IP connectivity to the Witness VM
  No SPOF
  • If you lose a Director, there is no loss of access at any site

Stretched Storage with IBM SAN Volume Controller
  [Figure: an SVC Stretched Cluster with Storage Pool 1 at Site 1, Storage Pool 2 at Site 2, and the Quorum at Site 3]
  • A single system image across two sites provides single-pane-of-glass management for day-to-day storage management activity
      - Simplify management of your environment at the same time as deploying active-active storage
  • Based upon a rich and mature platform
      - Provides Real-time Compression, Easy Tier, non-disruptive migrations and long-distance replication
      - 40,000 engines installed worldwide, 11 years of field experience
  • 250+ storage devices supported to provide back-end capacity
      - Retain your existing investment in storage devices
      - Keep flexibility for the future
  • An active quorum device enables automatic failover
      - No external management software
      - Prevents split-brain
      - Supports recovery in case of full unplanned site failure scenarios

Reference Guides from Storage Vendors
  • Implementing VMware vSphere Metro Storage Cluster with HP LeftHand Multi-Site storage: http:/
  • Implementing vSphere Metro Storage Cluster using HP 3PAR Peer Persistence: http:/
  • Deploy VMware vSphere Metro Storage Cluster on Hitachi Virtual Storage Platform: http:/
  • IBM SAN and SVC Stretched Cluster and VMware Solution Implementation: http:/
  • VMware vSphere 5.5 vMotion on EMC VPLEX Metro: http:/

VSAN for Metro Cluster, 2015 Q3 (Planned)
  [Figure: a Virtual SAN Cluster with Fault Domain A at Site A, Fault Domain B at Site B and Fault Domain C at Site C; vmdk copies at the two data sites and a witness at the third]
  Upgrading from rack awareness to site awareness:
  1. A mini fault-domain site is dedicated to the witness
  2. Data is read preferentially from the local site to improve performance

Part 2: Remote disaster recovery technical solution

RTO, RPO, and MTD
  • Recovery Time Objective (RTO): how long it should take to recover
  • Recovery Point Objective (RPO): the amount of data loss that can be incurred
  • Maximum Tolerable Downtime (MTD): the downtime that can occur before significant loss is incurred
      - Examples: financial, reputation

The Three Building Blocks for Disaster Recovery
  [Figure: Backup and Recovery, Replication, and DR Orchestration layered on Compute (vSphere) and Storage (Virtual SAN, external storage). VMware labels: VDP Advanced, vSphere Replication, Site Recovery Manager. Ecosystem labels: backup copies, array-based replication.]

Overall Architecture of the Remote (or Metro) Disaster Recovery Solution

Site Mapping Options for the Remote (or Metro) Disaster Recovery Solution
  Active-passive failover (Production to Recovery)
  • The most common scenario
  • Relatively expensive
  Active-active failover (Production to Recovery)
  • The DR infrastructure is mainly used for non-production workloads such as test, development and training
  • Effectively reduces cost
  Bidirectional failover (Production to Production)
  • Both sites run production applications
  • Each site provides disaster recovery for the other
  Active-active data center (Production across Site 1 and Site 2)
  • Applications at the two sites can move freely between sites
  • Zero downtime for planned events
  • Limited to metro distances

Network Resource Design

SRM with NSX for vSphere
  [Figure: a "Protected" site and a "Recovery" site, each with vCenter + SRM, vCAC, NSX Manager, an NSX Controller Cluster and VMFS/NFS storage. Labels between the sites: Replication, Firewall Rules & Security Groups.]

SRM with NSX for vSphere
What has been validated
  • SRM can map VMs from one VXLAN Logical Switch on the primary site to a different Logical Switch on the recovery site
  • These Logical Switches can be connected to pre-created NSX Distributed Logical Routers or NSX Edge Services GWs
  • Placeholder VMs can be added to Security Groups; in a DR event, when these VMs become active, they are protected by DFW
  • Dynamic Routing can be used to advertise networks on the primary site. Using metric/weight, these networks can be re-advertised on the recovery site if there is a site failover
  • This maps very closely to the vCAC deployment model for pre-created networks, which is used for production workloads. Test/Dev workloads using on-demand networking do not typically require DR
Currently being tested
  • Automating synchronization of the NSX Distributed Firewall ruleset and Security Groups between two NSX Managers
  • Tying into SRM, so that when VMs are added to a Protection Group the placeholder VMs are automatically added to the appropriate Security Groups
  • Working closely with EMC, as part of their Enterprise Private Cloud Reference Architecture project, to turn this into a productized solution including vCAC

Logical Architecture View
  [Figure: a "Protected" site with primary VMs and a "Recovery" site with placeholder VMs, each running vCenter + SRM. At each site, VXLAN logical switches (/24) connect through a Distributed Logical Router to VLAN uplinks (/28) using Dynamic Routing (OSPF, BGP). Labels: No Network Readdressing (Dynamic Routing), Pre-created Logical Switches and Edges, Storage Replication between VMFS datastores.]

SRM with NSX for vSphere
  [Figure: the same labels as the logical architecture view: primary VMs and placeholder VMs, VXLAN (/24) and VLAN (/28) segments, vCenter + SRM at each site, Distributed Logical Router, Dynamic Routing (OSPF, BGP), No Network Readdressing (Dynamic Routing)]
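The validated behaviour above combines two ideas: SRM maps each protected logical switch to a pre-created logical switch on the recovery site, and dynamic routing steers traffic to whichever site currently advertises a prefix with the better metric. A minimal Python sketch of both follows. The switch names, prefix and metric values are made up, and modelling failover as two advertisements with different metrics is one assumed reading of "metric/weight". The sketch models only the selection logic and does not call any SRM or NSX API.

```python
from dataclasses import dataclass

# Hypothetical mapping: primary-site logical switch -> recovery-site logical switch.
NETWORK_MAPPING = {
    "ls-web-primary": "ls-web-recovery",
    "ls-app-primary": "ls-app-recovery",
}


@dataclass(frozen=True)
class Advertisement:
    site: str
    prefix: str
    metric: int      # lower is preferred, as with an OSPF cost
    up: bool = True  # False once the advertising site has failed


def recovery_network(primary_switch: str) -> str:
    """Look up the recovery-site logical switch for a protected VM's network."""
    return NETWORK_MAPPING[primary_switch]


def best_path(ads, prefix):
    """Pick the reachable advertisement with the lowest metric for a prefix."""
    candidates = [a for a in ads if a.prefix == prefix and a.up]
    return min(candidates, key=lambda a: a.metric) if candidates else None


normal = [
    Advertisement("primary", "10.10.1.0/24", metric=10),
    Advertisement("recovery", "10.10.1.0/24", metric=100),
]
after_failover = [
    Advertisement("primary", "10.10.1.0/24", metric=10, up=False),
    Advertisement("recovery", "10.10.1.0/24", metric=100),
]

print(best_path(normal, "10.10.1.0/24").site)
print(best_path(after_failover, "10.10.1.0/24").site)
```

Because the prefix is the same at both sites, recovered VMs keep their IP addresses, which matches the "No Network Readdressing" label in the figures.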
