22.2 FUNCTIONAL CHECKS
The functional checks are helpful when you need to determine overall operability of the OVO environment. When the functional checks do not result in the desired outcome, the next step is to plan time for the necessary troubleshooting, configuration changes, or perhaps a patch fix. In order to isolate the problem, ensure that you have collected as much system information as possible. Also, perform the initial data gathering and basic troubleshooting prior to placing a support call, if necessary. Table 22-3 provides a listing of OVO functional checks.
| Check the following areas | Procedure | Notes |
|---|---|---|
| 1. Are all of the management server processes running? | opcsv | Restart the processes. Investigate why the process was not running. |
| 2. What is the version of the management server software? | opcsv -version | Verify the current patch level for the management server. Upgrade if necessary. |
| 3. Are all of the managed node processes running? | ovc | Check the error log file for possible reasons the process is down. Restart the processes. |
| 4. What is the patch version of the managed node software? | opcagt -version | Try to maintain the current patch release. |
| 5. Check the installed version of the managed node software and the sub-agent components from the management server. | opcragt -agent_version <node> | This is also a quick way to check the communications between the server and the node. If this fails, verify that the node is managed by this server. It may be necessary to restart the processes on the node. |
| 6. Check for OVO errors on the DCE managed node. | tail /var/opt/OV/log/opcerror | If there are error messages in the file, check for resolutions via the eCare problem resolution web site. It may be necessary to log a support call via itrc.hp.com. |
| 7. Check the OVO errors on the HTTPS managed node. | tail /var/opt/OV/log/System.txt | |
| 8. Check the remote procedure call (RPC) service; verify that the DCE (NCS) or RPC-based agents are registered. | rpccp show mapping | RPC processes on the server: opcmsgrd, opcdistm, opcdispm. RPC processes on the node: rpcd (HP), opcctla, dced (RPC), llbserver (NCS). |
| 9. Check the status of the Oracle processes. | ps -ef \| grep ora | If the Oracle processes are not running, the OVO processes will not start. |
| 10. Verify the Oracle listener. | lsnrctl status | If the listener is not running, restart it with lsnrctl start. The database must be open in order for the listener to start properly. You may need help from a DBA to restart the database. |
| 11. Check the OpenView platform background processes and services (NNM). | ovstatus -c | If the OpenView background processes are not running, OVO processes will not start. |
| 12. Verify hostname resolution from the server. | nslookup <node><br>nslookup <local_server_name> | The OpenView platform utilizes name resolution lookup services; this information must be correct within the DNS, NIS, or local hosts file. Note: The command nsquery is becoming more useful for hostname or IP lookups because it provides more detailed information. |
| 13. Verify hostname resolution from the managed node. | nslookup <server><br>nslookup <local_node_name> | |
| 14. Check network connectivity from the server. | ping <node> | ping tests may indicate certain network delays that OpenView can interpret as a node-down condition. Intermittent ping failures should be investigated and resolved. |
| 15. Check network connectivity from the managed node. | ping <server> | If the node is unable to ping the server, messages are buffered until the server is available. |
| 16. Check the name resolution service configuration file. | more /etc/nsswitch.conf | Verify the search order. If using multi-homed hosts, see check 17. |
| 17. Is the node multi-homed? | Add an entry with the second IP_ADDRESS HOSTNAME into a file that you create on the server called /etc/opt/OV/share/conf/OpC/mgmt_sv/opc.host | The file has the same format as the /etc/hosts file. After the file is created, restart the server processes. Any changes to the file require a process restart. |
| 18. Operator and administrator login. | ps -ef \| grep ui<br>ps -ef \| grep opcuiwww | Each operator (including the administrator) should have a unique login account for security purposes. |
| 19. Log in to OVO using the operator account to ensure that the environment displays the correct information based upon the operator's responsibilities. | | |
| 20. Check the Oracle environment variables. | cat /etc/opt/OV/share/conf/ovdbconf | |
| 21. Check the Oracle environment variables within the user oracle shell environment. | su - oracle<br>env \| grep ORA | Compare this output to the listing found in Chapter 23. It may be necessary to consult with a DBA to verify the health of the OVO Oracle database. The oracle user account requires the correct environment for installation and configuration of the database and during runtime operations. |
| 22. Check all of the environment variables within the user oracle shell environment. | su - oracle<br>env > /tmp/ora_env.out<br>more /tmp/ora_env.out | System environment variables are also important, such as PATH, MANPATH, LANG, and so on. |
| 23. Check connectivity to the Oracle database (Oracle 8.x and 9.x). | | It may be necessary to check with a DBA to gain access to the database using the privileged user accounts. |
| 24. Check remote connections to the database. | | |
| 25. Check the error logs. | Refer to Section 22.1.1. | |
| 26. Clean up pending template distributions. | ll /var/opt/OV/share/tmp/OpC/distrib<br>rm /var/opt/OV/share/tmp/OpC/distrib/* | If the distribution failed, a message will periodically appear in the message browser; remove the files from the distribution directory. |
| 27. Check the integrity of the queue files. | /opt/OV/contrib/OpC/opcqchk <queue_file> | If this program presents a menu and basic operations can be performed, the queue file is OK. |
| 28. Test message data flow in general. | opcmsg a=a o=o msg_t=test | If the message appears in the browser, message flow is OK. |
| 29. Messages should appear in the browser of the assigned operator; if not, check the following items. | | |
| 30. Test action execution. | Use the command broadcast and run a command (for example, hostname). | If successful, actions work in general. |
| 31. Verify agent status. | opcragt <node_name> | This displays agent status and tests basic connectivity. |
| 32. Test basic HTTPS connectivity. | bbcutil -ping <node> | Execute the command on the managed node. |
| 33. Test HTTPS agent security certificates. | ovcert -list | |
| 34. Test connectivity of the DCE agent. | rpccp show mapping | |
| 35. Utilize the trace facility. | If necessary for further investigation, configure XPL tracing. | Refer to the Tracing Concepts and User's Guide for more information about XPL tracing and logging. |
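Many of the checks in Table 22-3 reduce to "run a command and look at its exit status," so routine checks can be collected into a small wrapper script. The sketch below is illustrative only: opcsv, ovstatus, and lsnrctl exist only on a real OVO management server, and on any other host those checks simply report FAIL. The check helper itself is generic POSIX shell.

```shell
#!/bin/sh
# Minimal health-check wrapper for a few of the checks in Table 22-3.
# Sketch only: opcsv, ovstatus -c, and lsnrctl status are real commands
# only on an OVO management server; elsewhere they report FAIL.

check() {
    # check <label> <command...> -- print OK/FAIL based on exit status
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK:   $label"
    else
        echo "FAIL: $label"
    fi
}

check "management server processes (check 1)"    opcsv
check "OpenView background processes (check 11)" ovstatus -c
check "Oracle listener (check 10)"               lsnrctl status
check "Oracle processes (check 9)"               sh -c "ps -ef | grep '[o]ra' >/dev/null"
```

Running the script on a schedule and diffing the output against a known-good run is a simple way to catch a stopped process before operators notice missing messages.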
22.2.1 When the Processes Will Not Start
If you perform the functional check and the processes will not start, the cause could be corrupt queue or pipe files. Corrupt queue files occur on rare occasions when the OS or physical disk buffers have not been flushed to disk successfully. These files can be removed from the server or node while the OVO processes are not running; if it becomes necessary to remove them, be aware that they contain data that will be lost. If you have tried everything else and are still experiencing problems with agent or server processes, move the queue files aside temporarily and restart the agent. If the agent then starts, return the original queue files to minimize data loss. If necessary, remove the queue files and restart the processes; the queue files will be re-created.
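The "move aside, test, restore" approach described above can be sketched as a pair of shell functions. This is a generic sketch, not an OVO tool: QUEUE_DIR defaults to the server queue directory used in Section 22.2.1.1, and the functions only move files, so the originals can be put back if relocating the queues does not cure the problem.

```shell
#!/bin/sh
# Sketch: move OVO queue files aside instead of deleting them, so they
# can be restored if moving them does not fix the startup problem.
# QUEUE_DIR defaults to the server queue directory from 22.2.1.1;
# override it to use the same logic on a managed node.

QUEUE_DIR=${QUEUE_DIR:-/var/opt/OV/share/tmp/OpC/mgmt_sv}
BACKUP_DIR=${BACKUP_DIR:-$QUEUE_DIR.save.$(date +%Y%m%d%H%M%S)}

move_queues() {
    mkdir -p "$BACKUP_DIR" || return 1
    # mv preserves the files; rm (as in the book's procedure) discards them
    for f in "$QUEUE_DIR"/*; do
        [ -e "$f" ] && mv "$f" "$BACKUP_DIR"/
    done
}

restore_queues() {
    # Put the original queue files back to minimize data loss
    for f in "$BACKUP_DIR"/*; do
        [ -e "$f" ] && mv "$f" "$QUEUE_DIR"/
    done
}
```

Run move_queues only while the OVO processes are stopped (ovstop ovoacomm on the server, opcagt -kill on the node). If the processes then start cleanly and you accept the data loss, the backup directory can be deleted.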
22.2.1.1 Procedure to Remove the Queue Files on the Server
- Exit all OVO GUIs.
- Stop the management server processes:
  ovstop ovoacomm
- Erase the OVO temporary files:
  rm /var/opt/OV/share/tmp/OpC/mgmt_sv/*
- Restart the management server processes:
  ovstart opc
- Restart the GUI:
  opc
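The server-side steps can be collected into one script. Because ovstop, ovstart, and opc exist only on an OVO management server, the sketch below defaults to a dry run that only prints each command; set DRYRUN=0 on a real server, after exiting all GUIs, to execute them. The run helper is a generic shell pattern, not part of OVO.

```shell
#!/bin/sh
# Sketch of Section 22.2.1.1 as a single script.
# DRYRUN=1 (the default) only prints the commands; set DRYRUN=0 on a
# real management server, after exiting all OVO GUIs, to execute them.

run() {
    if [ "${DRYRUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run ovstop ovoacomm                            # stop the server processes
run rm -f /var/opt/OV/share/tmp/OpC/mgmt_sv/*  # erase the temporary files (this data is lost)
run ovstart opc                                # restart the server processes
run opc                                        # restart the GUI
```

The dry-run default is deliberate: the rm step destroys queued data, so it is worth seeing exactly what the script intends to do before letting it do it.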
22.2.1.2 Procedure to Remove the Queue Files on the Node
- Stop all the managed node processes:
  opcagt -kill
- Erase the OVO temporary files:
  rm /var/opt/OV/tmp/OpC/*
- Verify that all of the processes have stopped:
  ps -eaf | grep opc
- Restart the managed node processes:
  opcagt -start
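One pitfall in the verification step above: ps -eaf | grep opc always matches the grep command itself, so one line of output does not prove an agent process survived. The usual shell idiom is to bracket one character of the pattern; this is generic grep behavior, not OVO-specific.

```shell
#!/bin/sh
# The pattern [o]pc matches the text "opc", but the ps entry for the
# grep command itself contains the literal string "[o]pc", which that
# pattern does not match -- so grep no longer finds itself.

# A line resembling a ps entry for a real agent process is matched:
echo "root  1234  1  0  opcctla" | grep '[o]pc'

# A line resembling the ps entry for the grep itself is not matched:
echo "user  5678  1  0  grep [o]pc" | grep '[o]pc' || echo "no self-match"
```

With ps -eaf | grep '[o]pc', empty output reliably means all OVO processes on the node have stopped.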