HP OpenView System Administration Handbook: Network Node Manager, Customer Views, Service Information Portal, HP OpenView Operations

Tammy Zitello

3.8 HIGH AVAILABILITY AND FAULT TOLERANCE REQUIREMENTS


Mirrored disks, fail-over sites, multi-computer fail-over, aggregated network links, active-standby systems and network links, routing protocols, alternate physical volume links, disk arrays, redundant power: these are just some of the possible requirements for high availability and fault tolerance. These requirements can apply to individual systems, to multiple systems acting as one system, and to systems acting in conjunction with one another in the overall design during an anomaly.

The amount of high availability and fault tolerance in the design is driven by the requirements, but what is actually implemented is driven by cost. The same can be said for the entire NMS plan. High availability and fault tolerance requirements will be driven by the necessity of continuity of operations. What is the importance of network and systems management to the organization during any type of outage to the NMS? How much high availability and/or fault tolerance can be purchased based on the capital spending allotted to the project?

Note

Always design a system that maintains data integrity over everything else, even over system performance. System performance should take precedence over ease of system administration.

If a disk array cannot be purchased due to budget constraints, at least purchase enough disks to mirror the data, even if they are mirrored on the same bus. Replacing a failed hardware component is much faster than rebuilding the operating system or restoring data that is hours or even days old, all while being unable to manage the network. There is also the fact that setting up software in a high availability/fault tolerance (HA/FT) environment works differently depending on the product. Configuring the availability of SNMP traps sent to different NNM management stations is totally different from configuring the availability of OVO messages sent from a managed node to different OVO management stations.
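
As an illustration, mirroring on HP-UX can be done with LVM and the optional MirrorDisk/UX product. The following is a minimal sketch, assuming MirrorDisk/UX is installed; the device and volume names are illustrative:

    # Prepare a second disk and add it to the volume group
    pvcreate /dev/rdsk/c2t1d0
    vgextend /dev/vg00 /dev/dsk/c2t1d0
    # Add one mirror copy of the logical volume holding the OpenView data
    lvextend -m 1 /dev/vg00/lvol5 /dev/dsk/c2t1d0

Mirroring on the same bus still leaves the bus itself as a single point of failure, but it protects against the most common failure, the loss of a single disk.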

NNM can be configured in a Manager of Managers configuration to ensure continued status polling if there is an outage to a collection station. Any management station that has an enterprise license can be configured to manage and monitor another management/collection station and assume status polling for the systems for which the "managed" management/collection station is responsible. This configuration provides for the continued availability of status polling, but it is much different to provide for the continued availability of traps to a trap destination.
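
Collection stations are managed from the management station with the xnmtopoconf command. A minimal sketch follows; the hostname is illustrative, and the available options vary by NNM version:

    # On the management station: list known collection stations and their status
    xnmtopoconf -print
    # Verify communication with the collection station, then manage it
    xnmtopoconf -test cs01.example.com
    xnmtopoconf -manage cs01.example.com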

When a management station manages another management station to act as its collection station, it adds its name to the REMOTE_MANAGERS list on that collection station. Any trap received at the collection station that is configured to be forwarded to the REMOTE_MANAGERS list will be forwarded to each management station in the list without question. The collection station forwards only traps configured to be forwarded, either to the REMOTE_MANAGERS list or to specific trap destinations; otherwise, the traps "stop" at the collection station. If operators man each management station for incoming events, Standard Operating Procedures have to be written to deal with this type of situation.
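
Trap forwarding is defined per event in the collection station's trapd.conf, normally through the xnmtrap event-configuration GUI rather than by hand. The following sketch only illustrates the idea; the exact file syntax and the %REMOTE_MANAGERS_LIST% keyword should be verified against the NNM version in use:

    # Excerpt from $OV_CONF/C/trapd.conf on the collection station (sketch)
    EVENT SNMP_Link_Down .1.3.6.1.6.3.1.1.5.3 "Status Events" Critical
    FORMAT Link down on interface $1
    # Forward this event to every station in the REMOTE_MANAGERS list
    FORWARD %REMOTE_MANAGERS_LIST%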

But if an outage to a collection station occurs and a management station takes over the status polling for that collection station, this does not change the trap destination of the SNMP agents within the collection station's management domain; nothing redirects their traps to the management station. Unless the SNMP agents already list the management station as a trap destination, no traps will be sent to it, and if they do list it, every trap goes to both the collection station and the management station. If the collection station is itself used as a management station, that is, operators there manage and act on the incoming events, and operators do the same at the management station, a defined Standard Operating Procedure is needed as well. This limitation has nothing to do with OpenView NNM itself. The SNMP protocol simply does not support selecting a trap destination based on the availability or unavailability of that destination, or based on the time of day, and so on. Placing the collection station itself within a high availability solution allows the NNM processes to be up on one of multiple systems within a cluster.
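
Because the agent simply sends every trap to each configured destination, the only way to cover both stations is to list both. A sketch for the HP-UX SNMP agent, whose configuration file is /etc/SnmpAgent.d/snmpd.conf (hostnames illustrative):

    # Every trap is sent to every destination listed; there is no failover logic
    trap-dest: cs01.example.com
    trap-dest: ms01.example.com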

OVO can also be configured in a Manager of Managers configuration to ensure continued reception of messages from managed nodes. A managed node is given a responsible managers file (an actual file, distributed along with the templates) that defines the OVO management consoles authorized to be the managed node's primary manager. The managed node reports to and takes commands from only that primary OVO management station, and it switches to another authorized management console only when it receives the correct command from an authorized console. The managed node sends its messages only to its current primary manager; it does not send them to all possible primary managers the way traps are sent to the REMOTE_MANAGERS list or to the trap destination list configured in an SNMP agent. The appropriate number of managed node licenses is required at each OVO management console. This scenario allows separate OVO management stations to be located in disparate geographical areas, each able to assume responsibility for the other for continuity of operations. Continuity of operations should also be part of the overall plan, but it is not discussed in this book.
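
A sketch of such a responsible-managers template follows, with illustrative hostnames. On the management server these templates live in the tmpl_respmgrs directory and are distributed to the managed node, where the result is stored as the mgrconf file; the exact keywords should be checked against the OVO version in use:

    RESPMGRCONFIGS
      RESPMGRCONFIG
        DESCRIPTION "Primary and backup managers for this node"
        SECONDARYMANAGERS
          SECONDARYMANAGER
            NODE IP 0.0.0.0 "ovo-backup.example.com"
        ACTIONALLOWMANAGERS
          ACTIONALLOWMANAGER
            NODE IP 0.0.0.0 "ovo-primary.example.com"
          ACTIONALLOWMANAGER
            NODE IP 0.0.0.0 "ovo-backup.example.com"

The switchover itself is then triggered from the backup server, for example with opcragt -primmgr run against the affected nodes.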

An OVO managed node can also be configured, using the responsible managers file, to send messages to different management stations at specific times of day, to send messages of specific types to different management stations, to send messages to assigned groups within the OVO console, or any combination of these. An OVO agent offers far more flexibility than an SNMP agent when it comes to the delivery of messages.
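
Time-based routing, for example, is expressed with message-target rules and time templates in the same responsible-managers syntax. The following sketch uses illustrative names, and the keywords should be verified against the OVO version in use:

    TIMETEMPLATES
      TIMETEMPLATE "night_shift"
        DESCRIPTION "Evening and overnight hours"
        TIMETMPLCONDS
          TIMETMPLCOND
            TIME FROM 18:00 TO 8:00
    RESPMGRCONFIGS
      RESPMGRCONFIG
        DESCRIPTION "Send network messages to the overseas console at night"
        MSGTARGETRULES
          MSGTARGETRULE
            DESCRIPTION "Network message group"
            MSGTARGETRULECONDS
              MSGTARGETRULECOND
                DESCRIPTION "Match the Network message group"
                MSGGRP "Network"
            MSGTARGETMANAGERS
              MSGTARGETMANAGER
                TIMETEMPLATE "night_shift"
                OPCMGR IP 0.0.0.0 "ovo-overseas.example.com"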

High availability support for a single instance of NNM or OVO is available on HP-UX, Solaris, and Windows (OVOW) operating systems using the supported high availability product. This allows NNM and OVO to run on a single system within the cluster and fail over to another system within the cluster. The "fail-over" to an adoptive node can be performed manually, for maintenance on the primary node, or occur automatically due to a hardware or network failure. Placing the products within a supported high availability product allows for continuous operation of the NMS.
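
On HP-UX the high availability product is MC/ServiceGuard, and HP supplies HA integration scripts with the products. Purely as a sketch, a fail-over package for NNM might be described with an ASCII package configuration along these lines (all values illustrative):

    PACKAGE_NAME        nnm
    FAILOVER_POLICY     CONFIGURED_NODE
    NODE_NAME           nms1          # primary node
    NODE_NAME           nms2          # adoptive node
    RUN_SCRIPT          /etc/cmcluster/nnm/nnm.cntl
    HALT_SCRIPT         /etc/cmcluster/nnm/nnm.cntl
    SERVICE_NAME        nnm_monitor   # monitors the NNM processes
    AUTO_RUN            YES           # fail over automatically on failure

The package is then verified and applied with cmcheckconf and cmapplyconf, and can be moved between nodes manually with cmhaltpkg and cmrunpkg for planned maintenance.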

