Monitoring and Managing PowerEdge 1655MC High Performance Computing Clusters
Dell White Paper
By Scalable Systems Group

Contents

Introduction: Modular Computing in HPCC
PowerEdge 1655MC Overview
Dell's Management Solution for PowerEdge 1655MC HPC Clusters
In-band Monitoring and Management
  IT Assistant (ITA)
  Ganglia
Out-of-Band Monitoring and Management
  ERA/MC
  Digital KVM
Conclusions
References

Figures

Figure 1: PowerEdge 1655MC Chassis - Front View
Figure 2: PowerEdge 1655MC Chassis - Rear View
Figure 3: 66-blade PowerEdge 1655MC HPC Cluster Configuration
Figure 4: Summary of PowerEdge 1655MC Chassis Information
Figure 5: At-a-Glance View of Ganglia
Figure 6: Information about One Node
Figure 7: Web Based ERA/MC Console
Figure 8: OSCAR screen on 2161DS
Figure 9: ERA/MC and KVM Controller Card

Section 1
Introduction: Modular Computing in HPCC

Modular computing solutions target environments in which servers are consolidated into one physical location, which is most commonly the case with clusters. Some elements (the power supply, the cabling, and the systems management) do not need to be replicated for every server and can be shared among the modular pieces.

The Dell PowerEdge 1655MC is the first product in Dell's Modular Computing, or "blade server," product line. Blade server architecture places several self-contained servers, known as blades, within a server chassis. Each blade has its own processor(s), memory, I/O subsystem, set of hard drives, operating system, and other basic components. The chassis provides redundant infrastructure components such as power supplies, fans, and switches. The concept of modular computing has the potential to increase server density, improve manageability, lower power consumption, and enhance deployment and serviceability, all resulting in lower total cost of ownership (TCO). Furthermore, the PowerEdge 1655MC modular design adds the following advantages compared to integrated servers, which make it an ideal building block for a high performance computing cluster:

- Low heat production
- Low power consumption
- Lower space requirements (0.5U per server)
- Easy deployment and simplified cable management
- Ease of service and replacement
- Ease of adding computing resources

Section 2
PowerEdge 1655MC Overview

The Dell PowerEdge 1655MC holds up to six server blades in one chassis in a 3U form factor. Each blade functions as an individual server with its own memory, two CPUs, and two internal SCSI hard drives. The chassis includes power supplies, a network module, fans, and a management module, and optionally ships with a USB CD-ROM/floppy drive. The chassis also contains two Gigabit Ethernet network switches, which connect internally to the two network interface cards (NICs) embedded on each blade. Additionally, Dell embedded remote access (ERA) hardware and firmware are integrated in the chassis. The ERA module monitors all the shared infrastructure components of the chassis. Figures 1 and 2 show the PowerEdge 1655MC front and rear views, respectively. For detailed information regarding the Dell PowerEdge 1655MC

, refer to the product documentation on the Dell web site.

Figure 1: PowerEdge 1655MC Chassis - Front View
Figure 2: PowerEdge 1655MC Chassis - Rear View

Section 3
Dell's Management Solution for PowerEdge 1655MC HPC Clusters

Dell's PowerEdge 1655MC HPCC solution provides four methods of managing and monitoring the cluster: Dell OpenManage IT Assistant, Ganglia, digital KVM, and ERA. IT Assistant and Ganglia are the two in-band management tools; they use the cluster fabric, or intra-cluster network, for monitoring and management traffic. IT Assistant is Dell's server management solution, providing a centralized management console used to discover nodes on the network and examine hardware sensor data to prevent failures at the system level. Ganglia is an OS-level cluster monitor that can be used to track resource usage, detect node failures, and troubleshoot performance problems. Both ITA

and Ganglia require OS support and use the cluster fabric for communication.

Figure 3 shows a sample PowerEdge 1655MC HPC cluster configuration with 66 blades as the compute nodes. The cluster fabric in the diagram is built from three Dell PowerConnect 5224 Gigabit Ethernet switches. Four Gigabit Ethernet links are used as a network trunk from each PowerEdge 1655MC chassis to one of the PowerConnect 5224 switches. A dedicated IT Assistant node (a PowerEdge 1650 serving as the IT Assistant monitoring and management station) is connected to one of these switches as well as to the ERA fabric. The ERA fabric is built around a PowerConnect 3024 Fast Ethernet switch; the ERA ports on each PowerEdge 1655MC chassis are connected to this 3024 switch. The master node, also a PowerEdge 1650 server, is connected to the 3024 switch as well, so that both the ITA node and the master node can perform out-of-band ERA monitoring and management functions.

The other out-of-band fabric, the KVM fabric, runs through a digital KVM switch, the Dell 2161DS Remote Console Switch. The KVM ports on the PowerEdge 1655MC chassis, the master node, and the ITA node are connected to the 2161DS switch. The Ethernet ports on the 2161DS switch are connected to the LAN outside the cluster, forming a complete out-of-band management network independent of the cluster fabric and the ERA fabric. For detailed information about utilizing the 2161DS switch, refer to: dell.com/us/en/biz/topics/power_ps3q02-avocent.htm. For information regarding PowerEdge 1655MC HPC clusters, please visit the Dell HPCC web site at: dell.com/us/en/esg/topics/products_clstr_gb1655_pedge_configs_1655_cluster_hpcc.htm

Figure 3: 66-blade PowerEdge 1655MC HPC Cluster Configuration
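As a rough sanity check on the trunk sizing described above (an illustrative calculation, not from the white paper): each chassis exposes six blades with two Gigabit NICs each, funneled through a four-link Gigabit trunk.

```python
# Illustrative oversubscription check for the 4-link trunk described above.
# Figures used: 6 blades per chassis, 2 Gigabit NICs per blade, 4 trunk links.
blades_per_chassis = 6
nics_per_blade = 2
trunk_links = 4
link_gbps = 1  # Gigabit Ethernet

peak_demand_gbps = blades_per_chassis * nics_per_blade * link_gbps
trunk_capacity_gbps = trunk_links * link_gbps
oversubscription = peak_demand_gbps / trunk_capacity_gbps

print(f"{peak_demand_gbps} Gbps demand over {trunk_capacity_gbps} Gbps trunk "
      f"-> {oversubscription:.0f}:1 oversubscription")
```

A 3:1 worst-case ratio is a common trade-off for HPC message traffic, since all twelve blade links are rarely saturated at once.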

Section 4
In-band Monitoring and Management

It is important for an HPCC system administrator to be able to monitor a cluster at the hardware level, especially in a large cluster environment. The Dell HPC cluster solution offers two methods of in-band management: Dell OpenManage IT Assistant (ITA), a web-based tool for managing Dell servers, and Ganglia, an open source monitoring tool developed at the University of California, Berkeley.

IT Assistant (ITA)

OpenManage IT Assistant, a web browser-based tool that supports all of the PowerEdge 1655MC components through the Simple Network Management Protocol (SNMP), gives the cluster administrator the ability to manage and monitor the hardware of an entire cluster and to perform day-to-day cluster management tasks from a centralized location using a GUI. SNMP provides the communication between the management console and the nodes, with every system component running an SNMP agent. IT Assistant provides the following functionality:

- Discovery of the chassis and chassis components (see Figure 4)
- Support for hot swapping blades
- Summary and status information for all chassis components, and support for system inventory and search
- Launch of management applications for chassis components
- Management of events generated by chassis components
- Page/e-mail notification when an event occurs
- One-to-many centralized console

All of the functions mentioned above are crucial to the management of an HPC cluster. One of the most basic system administration tasks, discovery and identification of nodes, is performed by IT Assistant, as is discovery of chassis components such as the embedded Ethernet switch and the ERA module.

IT Assistant allows the administrator to hot swap any blade in the chassis without interrupting the other blades, so maintenance can be performed without shutting down an entire chassis. As the cluster grows in size, node status information becomes even more important for simplifying administration. IT Assistant provides information such as system name, IP address, MAC address, component versions, memory size, chassis service tag, chassis asset tag, blade slot number, and blade service tag. IT Assistant also provides one-to-many functions such as remote shutdown, BIOS flashing, configuration of server alert functions, and inventory of all components.

IT Assistant includes an event management system (ESM) for capturing any event that is generated by the modules through SNMP traps. Administrators can associate actions with specific events, including e-mail, paging, or application launching.

Figure 4: Summary of PowerEdge 1655MC Chassis Information
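The ESM pattern of associating actions with specific events can be sketched generically as follows. This is a minimal illustration only: the event names and actions below are hypothetical, not ITA's actual configuration format.

```python
# Minimal sketch of an event-to-action dispatch table in the spirit of
# IT Assistant's ESM. The event names ("fan.failure", "psu.degraded") and
# actions are invented for illustration.
from typing import Callable, Dict, List

notifications: List[str] = []

def send_email(event: str) -> None:
    notifications.append(f"email: {event}")

def send_page(event: str) -> None:
    notifications.append(f"page: {event}")

# Map an event identifier (e.g. derived from a trap) to one or more actions.
actions: Dict[str, List[Callable[[str], None]]] = {
    "fan.failure": [send_email, send_page],
    "psu.degraded": [send_email],
}

def handle_trap(event: str) -> None:
    """Run every action registered for this event; ignore unknown events."""
    for action in actions.get(event, []):
        action(event)

handle_trap("fan.failure")
print(notifications)  # ['email: fan.failure', 'page: fan.failure']
```

The dispatch-table shape makes it cheap to attach several responses to one event, which is essentially what ESM's e-mail/page/launch associations provide through its GUI.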

Ganglia

Another in-band management tool available in the Dell PowerEdge 1655MC cluster offering is Ganglia, an open source OS-level cluster monitor. Out of the box, Ganglia monitors and automatically graphs over 20 metrics, such as each node's load average, number of running processes, number of incoming and outgoing network packets, and total and free memory on every node of the cluster.

Ganglia provides several levels of cluster information. The at-a-glance view (Figure 5) shows the overall status of the cluster, summarizing total node count, number of nodes that are up, overall load average, and CPU and memory utilization for the cluster. Color-coding represents CPU utilization to enable quick identification of overloaded systems, and a crossbones icon indicates that a node is down. Selecting a different metric in this view redisplays the screen with the value of that metric for each node and uses the metric as a sort index when displaying the nodes.

Figure 5: At-a-Glance View of Ganglia

Clicking on an individual node icon displays all available information for that node (Figure 6). This view summarizes static information, such as the OS version, system usage, IP address, and machine type, and graphs the metrics that change over time, such as memory and CPU usage, network traffic statistics, number of running processes, and disk usage.

Figure 6: Information about One Node

Ganglia also lets administrators define and add other parameters in the cluster that they want to monitor; Ganglia's GUI will automatically graph those values alongside the pre-set metrics for every node. Ganglia further simplifies cluster management by providing a remote execution environment. This feature is used for remote management and to execute commands in parallel on multiple nodes.
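The fan-out pattern behind such parallel execution can be sketched as follows. This is a generic thread-pool illustration, not Ganglia's actual remote execution implementation; in practice the per-node command would be a remote invocation (for example over ssh) rather than the harmless local echo used here to keep the sketch self-contained.

```python
# Generic sketch of running one command per node in parallel, in the spirit
# of a cluster remote-execution environment. A real manager would build a
# remote command line such as ["ssh", node, "uptime"]; we run a local echo
# so the example works anywhere.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_on_node(node: str) -> str:
    # Placeholder for the real remote invocation.
    result = subprocess.run(["echo", f"{node}: ok"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def run_on_cluster(nodes: list[str]) -> dict[str, str]:
    """Fan the command out to all nodes at once and collect the replies."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return dict(zip(nodes, pool.map(run_on_node, nodes)))

results = run_on_cluster(["blade01", "blade02", "blade03"])
print(results["blade01"])  # blade01: ok
```

Because `pool.map` preserves input order, the replies pair up with their node names even though the commands run concurrently.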

Additionally, Ganglia provides the ability to monitor multiple clusters. This is especially useful in large compute centers where computational resources are grouped into smaller clusters for specialized use. The centralized console enables an administrator to monitor multiple clusters at once while maintaining a high level of security by defining trust relationships.

Section 5
Out-of-Band Monitoring and Management

During heavy communication between application components or blade-server nodes, in-band management and monitoring can inaccurately report network or server problems, since they share the fabric with the applications. In addition, monitoring and management traffic consumes resources that parallel applications need. Finally, if a machine's operating system (OS) is not responding, neither in-band method guarantees access to the node or the ability to fix the problem, since both methods rely

on OS support.

In these situations, system administrators can use out-of-band management methods to communicate with the cluster hardware and diagnose or fix problems. Dell's HPCC solution provides two out-of-band management routes: digital KVM and ERA.

ERA/MC

The Dell Embedded Remote Access/MC controller for the Dell PowerEdge 1655MC provides remote systems management for the modular computing blades. ERA/MC provides an out-of-band management route by utilizing its own dedicated processor, memory, bus, and network port, without consuming cluster computing or network resources. If the cluster blades become unresponsive, ERA/MC allows the administrator to view and access the nodes remotely to troubleshoot the system. ERA/MC provides the following functionality to the PowerEdge 1655MC system:

- Initial configuration of chassis and blades
- Scripting for automation
- Local and remote management of chassis and blades
- Configuration of blades, network switches, and the digital KVM through console redirection
- Remote firmware updates
- Remote monitoring of fans and sensors
- Remote power cycle, power down, and power up

The use of ERA/MC within an HPC cluster simplifies cluster management and allows a system administrator to monitor hardware components remotely, either through a CLI (over the serial port) or through a web-based GUI console. The main utility used with ERA is racadm (remote access controller administration), which provides the interface for monitoring and configuring the system.
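The scripting-for-automation feature can be approximated with a small wrapper that fans one racadm command out to every chassis. This is a hedged sketch: the exact subcommands and options accepted by the ERA/MC firmware may differ from later racadm versions, so the command strings below are illustrative, and the sketch only builds the command lines rather than executing them.

```python
# Illustrative sketch: build one racadm invocation per ERA/MC controller in
# a cluster. The "getsysinfo" subcommand and the "-r <address>" remote flag
# are illustrative; consult the ERA/MC documentation for exact syntax.

def racadm_commands(chassis_ips, subcommand):
    """Return one racadm invocation per chassis as an argv list."""
    return [["racadm", "-r", ip, subcommand] for ip in chassis_ips]

# Hypothetical ERA/MC port addresses of three chassis on the ERA fabric.
era_fabric = ["192.168.10.11", "192.168.10.12", "192.168.10.13"]

for argv in racadm_commands(era_fabric, "getsysinfo"):
    print(" ".join(argv))
```

Generating the full command list before running anything also gives a natural dry-run mode, which is useful when an identical change is about to be pushed to every chassis in a large cluster.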

The racadm utility can be used through a serial port with a communications program such as minicom or HyperTerminal, through a remote interface, or through a web-based console across the network. Through the serial interface, the administrator can view or modify configuration settings on the chassis or the blades. For instance, the administrator can change the IP configuration of the ERA/MC port in order to reach the GUI available on the web console (Figure 7). Also through the serial interface, console redirection can be used to configure the blades, switches, and KVM. An administrator can use the automated scripting feature to run configuration commands on multiple nodes, a useful tool for making identical changes across large cluster configurations.

The remote interface is currently supported only through the MS-DOS environment under Windows, where the racadm command can be used to manage the nodes. This CLI provides the only means of modifying the properties of the ERA/MC on the PowerEdge 1655MC, and automated scripting can be used here as well. The web interface can be accessed through any supported web browser using the ERA/MC IP address or through IT Assistant; it exposes the features of the ERA/MC in a graphical interface.

Figure 7: Web Based ERA/MC Console

One of the main features of out-of-band management is the ability to control and monitor hardware from a remote location. The racadm commands on the PowerEdge 1655MC allow the administrator to view the health status of the chassis and blades within the cluster. By allocating appropriate IP addresses to the ERA/MC ports of all the chassis within a cluster, the administrator can assign names to each system and to each blade, allowing access to individual blades in order to utilize specific resources. Using racadm, multiple commands are available for troubleshooting the cause of a failure. For example, the administrator can view information on the modules within the chassis (the blades, the network switches, the fans), sensor information such as fan RPMs, the status of the power supplies, and much more. For the blades, administrators can power-cycle nodes individually, reset configurations, and cause LEDs to blink or glow to easily identify systems within a cluster.
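A monitoring script layered on such health queries mostly consists of parsing the controller's textual output and flagging values that are out of range. The sketch below parses a made-up sensor listing; the real racadm output format differs, and the field names and RPM threshold here are hypothetical.

```python
# Sketch: turn a textual fan/PSU sensor listing into structured data and
# flag anything out of range. SAMPLE_OUTPUT is invented for illustration;
# real ERA/MC output is formatted differently.
SAMPLE_OUTPUT = """\
fan1 rpm=5400 status=ok
fan2 rpm=2100 status=ok
psu1 status=ok
psu2 status=failed
"""

MIN_FAN_RPM = 3000  # hypothetical alert threshold

def parse_sensors(text):
    """Parse 'name key=value ...' lines into a dict per sensor."""
    sensors = {}
    for line in text.splitlines():
        name, *fields = line.split()
        sensors[name] = dict(f.split("=", 1) for f in fields)
    return sensors

def alerts(sensors):
    """Return human-readable warnings for failed parts and slow fans."""
    problems = []
    for name, info in sensors.items():
        if info.get("status") != "ok":
            problems.append(f"{name}: status {info.get('status')}")
        rpm = info.get("rpm")
        if rpm is not None and int(rpm) < MIN_FAN_RPM:
            problems.append(f"{name}: {rpm} rpm below {MIN_FAN_RPM}")
    return problems

print(alerts(parse_sensors(SAMPLE_OUTPUT)))
```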

Digital KVM

The PowerEdge 1655MC contains an embedded digital KVM switch, which allows video, keyboard, and mouse access to each blade. All access to the blades is through the management card on the chassis, either via the standard analog PS/2 keyboard, mouse, and video ports or via the Analog Rack Interface port with a CAT5 cable. The Analog Rack Interface port can be connected directly to a port on the Dell 2161DS Remote Console Switch with a CAT5 cable, which cascades the switches and allows them to be accessed from one central place. In large cluster configurations with several PowerEdge 1655MC chassis, this greatly simplifies cable organization and management.

The 2161DS Remote Console Switch brings together analog and digital technologies to provide a central point of access to an entire cluster. Each KVM switch has 16 ports for attaching machines or other switches and can be networked over a LAN connection to provide remote access to those machines. Each machine must use a System Interface POD (SIP) to convert the keyboard, video, and mouse signals to Ethernet, which considerably reduces the bundles of KVM cables usually associated with HPC clusters. The switch comes with cross-platform software that allows the administrator to manage the switch, install a new 2161DS switch, or launch video sessions to a server. The administrator can view multiple machines from this access point and use the keyboard and mouse on the individual machines. A 2161DS KVM switch with one or more chassis attached allows all the blades to be viewed from a centralized location. The KVM switches use OSCAR (On-Screen Configuration and Activity Reporting interface) to select nodes; when multiple chassis are cascaded, the user can see all nodes on one interface.

Figure 8: OSCAR screen on 2161DS
