2.3 …TO MANAGE EVERYTHING AT ONCE

A frequent mistake is configuring the network and systems management system to manage everything at once. Trying to integrate too many products and manage too many different services across the enterprise from the beginning, without enough personnel, training, or hardware, is destined for failure. It is too much, too fast, and it is done repeatedly at site after site. Network and systems management operators are then hammered with events and messages that arrive in their browser faster than they can be handled. Tens of thousands of messages sit in the message browser and little is done about them. The operators stop believing that the messages reflect real problems.

Follow these steps to alleviate the problem:

Determine which systems, including network infrastructure, are going to be managed. Start with the NNM discovery process. Using discovery filters, the noDiscover files, or a combination of both, configure NNM to discover only those systems.

For the discovered systems, ensure that hostname resolution is correct within the /etc/hosts file, NIS, and DNS. Correct means that the forward lookup of both the short name and the fully qualified domain name returns every IP interface associated with that system, and that the reverse lookup of every IP address associated with that system returns the same fully qualified domain name. If this is not correct, then there is a 99.999 percent chance of something not working properly in the future. The following is a short list of things that can happen because of incorrectly configured hostname resolution services. The network and systems management product does not cause these problems.

- Messages not being seen in the OVO message browser
- NNM events not acted upon
- Multiple-node objects for the same node in NNM
- Node names are different in each NMS product
- Different node names in the Alarm Browser
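
The definition of "correct" hostname resolution given above can be checked mechanically before a node is added. The following is a minimal sketch in Python, not part of NNM or OVO. The function names and the list of expected addresses are assumptions made for illustration, and it uses whatever resolver order the local system is configured with (/etc/hosts, NIS, DNS). It covers IPv4 only.

```python
import socket
from typing import Callable, Iterable, List, Set


def forward_lookup(name: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver gives for a host name."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except OSError:  # covers socket.gaierror and socket.herror
        return set()


def reverse_lookup(address: str) -> str:
    """Return the primary name the local resolver gives for an IP address."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return ""


def check_node(
    short_name: str,
    fqdn: str,
    expected_ips: Iterable[str],
    forward: Callable[[str], Set[str]] = forward_lookup,
    reverse: Callable[[str], str] = reverse_lookup,
) -> List[str]:
    """Return a list of resolution problems; an empty list means consistent.

    expected_ips is the operator's own inventory of the node's interfaces
    (a hypothetical input, not something this sketch discovers).
    """
    problems: List[str] = []
    expected = set(expected_ips)

    # Forward lookups of the short name and the FQDN must return every interface.
    for name in (short_name, fqdn):
        missing = expected - forward(name)
        if missing:
            problems.append(
                f"forward lookup of {name} is missing {sorted(missing)}"
            )

    # Reverse lookup of every address must return the same FQDN.
    for address in sorted(expected):
        name = reverse(address)
        if name.lower() != fqdn.lower():
            problems.append(
                f"reverse lookup of {address} returned {name or 'nothing'}, "
                f"expected {fqdn}"
            )
    return problems
```

A call such as `check_node("web1", "web1.example.com", ["10.0.0.5", "10.0.1.5"])` (hypothetical names and addresses) returns an empty list only when both directions agree. The `forward` and `reverse` parameters exist so the check can be exercised against stub data without touching a live resolver.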
If this is the first network and systems management project, start small. Add the nodes to be managed by OVO agents to the OVO node bank. Turn off ALL events in NNM except Node Up, Node Down, Interface Up, and Interface Down. Set all the other events to "Log Only." Do the same for the OVO Trap template (after copying the default to a custom template). These four alarms are meaningful and are used by the majority of customers. Starting with only these four events keeps users from being inundated with useless events coming from the network and from having to delete them from the NNM Alarm Browser. The operators will have confidence that the alarms they receive are meaningful. If they don't need to worry about a specific event, they don't need to see it as an alarm in the Alarm Browser!

After several weeks of running properly, peruse the log file for meaningful events and determine which events should be configured next. All the processes and procedures must be in place (such as a standard operating procedure, or SOP) so that the operators will know what to do when one of these events is received. If there is an automatic action that must be executed, it should be thoroughly tested. Operators will also need to know (through the instruction interface in OVO) what to do if the automatic action fails to correct the problem.

Determine the event requirements and integration points that will be provided by the additional products being added to the NMS. Add the products, one at a time, into the NMS. If adequate expertise exists, more can be done in parallel. Do not overwhelm the operators with useless events. The operator's browser should contain only useful and pertinent events (messages).
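
The "start small" policy described above amounts to an allow-list: four events become alarms, everything else is logged, and the log is reviewed later to pick the next events to promote. The sketch below models that policy in Python purely for illustration. It is not NNM or OVO configuration (that is done in the products' own event configuration and trap templates), and the function names and the sample event names other than the four allowed ones are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# The four events the text recommends showing as alarms at the start.
ALARM_EVENTS = frozenset(
    {"Node Up", "Node Down", "Interface Up", "Interface Down"}
)

Event = Tuple[str, str]  # (event name, node name)


def route_event(event_name: str) -> str:
    """Return "alarm" for allow-listed events and "log_only" for the rest."""
    return "alarm" if event_name in ALARM_EVENTS else "log_only"


def split_events(events: Iterable[Event]) -> Tuple[List[Event], Counter]:
    """Separate alarms from log-only events, counting the latter by name."""
    alarms: List[Event] = []
    logged: Counter = Counter()
    for name, node in events:
        if route_event(name) == "alarm":
            alarms.append((name, node))
        else:
            logged[name] += 1
    return alarms, logged


def review_candidates(logged: Counter, top: int = 5) -> List[Tuple[str, int]]:
    """List the most frequent log-only events as candidates for review."""
    return logged.most_common(top)


# Hypothetical sample data for illustration only.
sample = [
    ("Node Down", "router1"),
    ("Config Change", "router1"),
    ("Config Change", "router2"),
    ("Interface Up", "switch1"),
]
alarms, logged = split_events(sample)
```

Frequency is only a starting point for the review. Whether a logged event deserves promotion to an alarm still depends on having a procedure in place for the operators to follow when it arrives.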