Questions you might ask to begin the troubleshooting process include:
- Are there planned maintenance activities happening at this time?
- What is the complete error message?
- What versions of the OpenView software products are in use?
- What versions of the operating system are running on the server and agent?
- What server or agent hardware platform is involved?
- Where did the error occur: on the management server or on the managed node?
- When did the error first occur?
- Can the problem be reproduced?
- Have there been any recent changes to the system (such as new software)?
- What is the status of the OpenView processes?
- Did any error messages appear in the log files?
- Are there any errors in the itochecker report?
- Do the processes start and stop properly?
- Do you have the current patches installed for the agent, server, and operating system?
- Do you have a current system backup?
- Did the failed process produce a core file?
The troubleshooting recommendations and resource information presented in this chapter are adapted from known best practices. Because the environment changes constantly, check the most current OpenView problem resolution resources available online at http://support.openview.hp.com. Determine whether your issue has already been resolved by a patch or a documented resolution process. This chapter presents general guidelines for sorting issues into the correct categories and collecting the information needed to begin the troubleshooting process.
OpenView reports error messages to the user through three sources: the log files, the graphical user interface, and the shell. In the graphical user interface, an error message may appear in a pop-up window as the result of an illegal operation, or as message text in the message browser. Several log files record information about the normal operation of the system. When an operation in the OV environment does not complete successfully, they also record the corresponding error messages. For example, after installing an operating system or OV patch, check the installation log files for errors.
The log files that may contain important system operation and error messages are described here for reference purposes. Refer to the Administrator Guide for the platform-specific location of the log files:
- opcerror: server and agent runtime error log (DCE-based agents)
- system.txt: server and agent runtime error log (HTTPS-based agents)
- Operating system and subsystem error log files (check the files that are appropriate to your operating environment)
If an OVO error message has been produced, check its meaning and possible resolutions with the opcerr command. The message ID starts with the string "OpC", followed by a numeric body and tail (for example, OpC30-204), as shown in the following example.
```
# tail /var/opt/OV/log/OpC/opcerror | grep ERROR
07/16/04 10:31:36 ERROR opctrapi (Trap Interceptor)(1907) [opcevti.cpp:1460]: Receiving SNMP PDU failed: Lost connection with pmd/ovEvent process (application disconnected). Trying to reinitialize. (OpC30-204)
# /opt/OV/bin/OpC/utils/opcerr 30 204
MESSAGE
OpC30-204: Receiving SNMP PDU failed: .... Trying to reinitialize.
INSTRUCTION:
The VPO event interceptor could not get a SNMP PDU although it was informed that there is one available. The SNMP API message <snmp-msg> gives more information. The event interceptor tries to reconnect to pmd.
```
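The tail, grep, and opcerr sequence above can also be scripted. The following Python sketch assumes that opcerror entries look like the sample line, with the message ID in parentheses at the end. It collects the body and tail of each ERROR entry so they can be passed to opcerr. The function name `opcerr_args` is hypothetical, and the regular expression is inferred from the sample rather than from an OVO specification.

```python
import re

# Matches a message ID such as "(OpC30-204)". Inferred from the sample
# opcerror entry; this is an assumption, not a documented OVO format.
OPC_ID = re.compile(r"\(OpC(\d+)-(\d+)\)")


def opcerr_args(log_lines):
    """Return (body, tail) string pairs for ERROR entries carrying an OpC ID."""
    pairs = []
    for line in log_lines:
        if " ERROR " not in line:
            continue  # keep only ERROR entries, like "grep ERROR"
        match = OPC_ID.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


if __name__ == "__main__":
    sample = [
        "07/16/04 10:31:36 ERROR opctrapi (Trap Interceptor)(1907) "
        "[opcevti.cpp:1460]: Receiving SNMP PDU failed: Lost connection "
        "with pmd/ovEvent process (application disconnected). Trying to "
        "reinitialize. (OpC30-204)",
    ]
    for body, tail in opcerr_args(sample):
        # Print the lookup command in the same form as the example above.
        print("/opt/OV/bin/OpC/utils/opcerr", body, tail)
```

Reading the file line by line (for example, with `open(path)` passed directly to `opcerr_args`) works the same way, because a file object yields lines.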
The OVO error messages are organized into categories based upon the number in the body (such as OpC20-xxxx). The OVO error categories with examples are shown in Table 22-1.
Error Category | Error Number | Sample Description |
---|---|---|
Internal Messages | OpC10-0001 | Insufficient memory |
Public Routines | OpC20-0001 | Invalid queue descriptor |
Agent Processes | OpC30-0001 | Invalid request to assemble |
Manager Processes | OpC40-0001 | Can't open pipe [x1] |
Database Access | OpC50-0001 | Database inconsistency detected |
Internal Database Messages | OpC51-0022 | Retry |
Messages used by the commands (API) | OpC53-0150 | Usage: opchistupl <file> |
Configuration upload/download | OpC54-0002 | Unknown option |
Database Install/upgrade | OpC55-0016 | Already exists |
User Interface | OpC60-0005 | User name must be entered |
NT Installation | OpC130-0010 | Setup program started (preinit) |
Security | OpC140-0116 | Secret key for <x1> not found |
Refer to the online Help for the complete Error Messages Reference Guide.
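Because the category is encoded in the number after "OpC", a message ID can be classified mechanically. The Python sketch below covers only the category numbers listed in Table 22-1 and reports anything else as unknown. The function name `opc_category` is hypothetical.

```python
import re

# Category numbers from Table 22-1. The table lists examples only; the full
# set is in the Error Messages Reference Guide.
OPC_CATEGORIES = {
    10: "Internal Messages",
    20: "Public Routines",
    30: "Agent Processes",
    40: "Manager Processes",
    50: "Database Access",
    51: "Internal Database Messages",
    53: "Messages used by the commands (API)",
    54: "Configuration upload/download",
    55: "Database Install/upgrade",
    60: "User Interface",
    130: "NT Installation",
    140: "Security",
}


def opc_category(message_id):
    """Map an ID such as "OpC30-204" to its Table 22-1 category name."""
    match = re.fullmatch(r"OpC(\d+)-(\d+)", message_id.strip())
    if match is None:
        raise ValueError(f"not an OpC message ID: {message_id!r}")
    return OPC_CATEGORIES.get(int(match.group(1)), "Unknown category")


print(opc_category("OpC30-204"))    # Agent Processes
print(opc_category("OpC140-0116"))  # Security
```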
Oracle database error messages have two parts: a prefix and an error number, as in ORA-01547. To gather more information about an error, use the utility program $ORACLE_HOME/bin/oerr. This program prints a description of the error along with its likely cause and the recommended action. The error message categories are shown in Table 22-2.
Message Numbers | Categories |
---|---|
00000-00099 | Oracle Server |
00200-00249 | Control files |
00250-00299 | Archiving and recovery |
00300-00379 | Redo log files |
00440-00485 | Background processes |
00700-00709 | Dictionary cache |
00900-00999 | Parsing of SQL statements |
01100-01250 | The database and its support files |
01400-01489 | SQL execution errors |
01500-01699 | DBA set of SQL commands |
02376-02399 | Resources |
04030-04039 | Memory and the shared pool |
04040-04069 | Stored procedures |
12100-12299 | SQL*Net |
12500-12699 | SQL*Net |
12700-12799 | Use of the multilingual options |
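The ranges in Table 22-2 can be applied directly to the number part of an ORA code. The following Python sketch splits a code such as ORA-01547 into the two arguments given to oerr in the example that follows, then looks up the table category. The function names are hypothetical. Numbers that fall outside every listed range are reported as unclassified, because the table is not exhaustive.

```python
import re

# (low, high, category) ranges copied from Table 22-2 (ORA messages only).
ORA_RANGES = [
    (0, 99, "Oracle Server"),
    (200, 249, "Control files"),
    (250, 299, "Archiving and recovery"),
    (300, 379, "Redo log files"),
    (440, 485, "Background processes"),
    (700, 709, "Dictionary cache"),
    (900, 999, "Parsing of SQL statements"),
    (1100, 1250, "The database and its support files"),
    (1400, 1489, "SQL execution errors"),
    (1500, 1699, "DBA set of SQL commands"),
    (2376, 2399, "Resources"),
    (4030, 4039, "Memory and the shared pool"),
    (4040, 4069, "Stored procedures"),
    (12100, 12299, "SQL*Net"),
    (12500, 12699, "SQL*Net"),
    (12700, 12799, "Use of the multilingual options"),
]


def split_oracle_code(code):
    """Split "ORA-01547" into ("ORA", "01547"), the arguments passed to oerr."""
    match = re.fullmatch(r"([A-Za-z]+)-(\d+)", code.strip())
    if match is None:
        raise ValueError(f"not an Oracle error code: {code!r}")
    return match.group(1), match.group(2)


def ora_category(number):
    """Return the Table 22-2 category for an error number such as "01547"."""
    n = int(number)
    for low, high, category in ORA_RANGES:
        if low <= n <= high:
            return category
    return "Unclassified"


facility, number = split_oracle_code("ORA-01547")
print(facility, number, ora_category(number))  # ORA 01547 DBA set of SQL commands
```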
If a message in the message browser comes from the Oracle database, the message text includes the error number. With this number, you can look up the message with the oerr command, as shown in the next example. Some error messages are complex and can signal serious trouble. If you are not sure what corrective action is required, report the error to your Database Administrator or support vendor.
```
# $ORACLE_HOME/bin/oerr ORA 01547
01547, 00000, "warning: RECOVER succeeded but OPEN RESETLOGS would get error below"
// *Cause: Media recovery with one of the incomplete recovery options ended without
//  error. However, if the ALTER DATABASE OPEN RESETLOGS command were attempted now,
//  it would fail with the specified error. The most likely cause of this error is
//  forgetting to restore one or more datafiles from a sufficiently old backup
//  before executing the incomplete recovery.
// *Action: Rerun the incomplete media recovery using different datafile backups,
//  a different control file, or different stop criteria.
```
Some Oracle error messages are generic and non-fatal, and they contain codes that you will need a DBA to interpret.
The process check is one of the best places to start examining the runtime environment. Confirm that the correct processes are running; if they are not, determine why and restart them. If the OVO processes will not stop, check the process table with the ps command and, if necessary, terminate the remaining processes manually.
The processes running during normal operations of the server are as follows:
From the command line of the management server, use the following command to verify that the correct processes are running:

- opcsv -status: check the management server processes
The results of the opcsv command are shown here:
```
# opcsv
OVO Management Server status:
-----------------------------
Control Manager           opcctlm     (3847) is running
Action Manager            opcactm     (3856) is running
Message Manager           opcmsgm     (3857) is running
TT & Notify Mgr           opcttnsm    (3858) is running
Forward Manager           opcforwm    (3859) is running
Service Engine            opcsvcm     (3864) is running
Cert. Srv Adapter         opccsad     (3862) is running
BBC config adapter        opcbbcdist  (3863) is running
Display Manager           opcdispm    (3860) is running
Distrib. Manager          opcdistm    (3861) is running

Open Agent Management status:
-----------------------------
Request Sender            ovoareqsdr  (3843) is running
Request Handler           ovoareqhdlr (3846) is running
Message Receiver (BBC)    opcmsgrb    (3848) is running
Message Receiver          opcmsgrd    (3849) is running

Ctrl-Core and Server Extensions status:
---------------------------------------
Control Daemon            ovcd        (1460) is running
BBC Communications Broker ovbbccb     (1467) is running
Config and Deploy         ovconfd     (1457) is running
Certificate Server        ovcs        (1469) is running
```
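When you check many servers, scanning the opcsv output by eye becomes tedious. The Python sketch below reports every status line that does not end in "is running". The sample output shows only the running case, so the wording opcsv uses for a stopped process is an assumption. The "is not running" line in the demo is hypothetical, and the function name `not_running` is also hypothetical.

```python
def not_running(opcsv_output):
    """Return opcsv status lines that do not end in "is running".

    Assumes the layout of the sample opcsv output; any other status line
    is reported so that an unexpected wording is not silently ignored.
    """
    problems = []
    for raw in opcsv_output.splitlines():
        line = raw.strip()
        # Skip blank lines, the echoed command, and section headers.
        if not line or line.startswith("#") or line.endswith("status:"):
            continue
        if set(line) == {"-"}:
            continue  # underline below a section header
        if not line.endswith("is running"):
            problems.append(line)
    return problems


if __name__ == "__main__":
    # The second status line uses hypothetical wording for a stopped process.
    sample = """OVO Management Server status:
-----------------------------
Control Manager opcctlm (3847) is running
Forward Manager opcforwm is not running
"""
    for line in not_running(sample):
        print("check:", line)
```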
From the command line of the server, verify that the correct processes are running on the managed node; if necessary, restart the processes:
- opcragt -status -all: check the status of all configured agents
- opcragt -status <node_name>: check the status of a specific node
Many of the processes running during normal operation of the agent (depending on the deployed policy) are as follows:
Embedded Performance Component (coda)
The output from the
ovc command is shown here:
```
# ovc
ovcd      Control Daemon                         CORE      (4314) Running
ovbbccb   HP OpenView BBC Communications Broker  CORE      (1467) Running
ovconfd   HP OpenView Config and Deploy          CORE      (4315) Running
ovcs      HP OpenView Certificate Server         SERVER    (4320) Running
opcmsga   OVO Message Agent                      AGENT,EA  (4321) Running
opcacta   OVO Action Agent                       AGENT,EA  (4322) Running
opcmsgi   OVO Message Interceptor                AGENT,EA  (4323) Running
opcle     OVO Logfile Encapsulator               AGENT,EA  (4325) Running
opcmona   OVO Monitor Agent                      AGENT,EA  (4327) Running
opctrapi  OVO SNMP Trap Interceptor              AGENT,EA  (4329) Running
opceca    OVO Event Correlation                  AGENT,EA         Stopped
opcecaas  ECS Annotate Server                    AGENT,EA         Stopped
coda      HP OpenView Performance Core                     (4331) Running
```
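In the ovc listing, the component name is the first column and its state is the last column. The following Python sketch relies on that layout, taken from the sample output above, and lists every component that is not "Running". The function names are hypothetical.

```python
def component_states(ovc_output):
    """Map each component name in ovc output to its state (last column)."""
    states = {}
    for raw in ovc_output.splitlines():
        fields = raw.split()
        # Skip blank lines and the echoed "# ovc" command line.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        states[fields[0]] = fields[-1]
    return states


def not_running_components(ovc_output):
    """List components whose state is anything other than "Running"."""
    return [name for name, state in component_states(ovc_output).items()
            if state != "Running"]
```

In the sample output, opceca and opcecaas are stopped. Because the set of running agent processes depends on the deployed policies, treat such a list as a starting point for investigation rather than as proof of a fault.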
From the command line of the HTTPS-based managed node, verify that the correct processes are running; if necessary, restart the processes.
Note: The ovc command is available on HTTPS-based nodes only. Check the agent status of DCE-based nodes with the opcagt command.
During troubleshooting, it is helpful to have all the necessary information and resources at your fingertips. The online resources provided within the OpenView platform make access to important information easy. Inside each graphical window there is a HELP button on the menu. As shown in Figure 22-1, you can obtain help about the typical administrator tasks, icons, and errors. There is also a search engine, a glossary of terms, and instructions on how to use the built-in help.
OVO provides utilities out of the box to assist with troubleshooting. One of them, itochecker, performs an overall check of your OVO environment. It generates a report on the state of the configuration in the management server environment, which you can use to help isolate a problem. Read the man page for usage details. This section shows how to create a full report and display the results. The report gives a good overall view of the OVO environment; the primary areas of interest are any categories that show errors.
Run the report:
```
# /opt/OV/contrib/OpC/itochecker a
```
Extract the HTML report file from the compressed tar file:

```
# zcat /tmp/ITO_rpt/ITO_rpt.tar.Z | tar xvf - reportl
```
View the report in a browser with the command:

```
/opt/netscape/netscape reportl
```
This listing is the main menu of the report; the hyperlinks guide you to additional details about the specific areas in each category. Use this report to get a quick indication of any signs of trouble within the OVO environment.
```
ITOCHECKER REPORT
Thu Feb 5 08:22:43 PST 2004
Management Server: nuema

System Environment Check
  Name Resolution                                      OK
  System Info                                          OK
  Number Of Processes / System Load                    OK
  DCE Status and Patchlevel                            OK
  System file permissions                              OK
  OS Patches                                           OK

OVO Environment Check
  OVO Version/Package & ECS Designer                   N/A
  Server Processes                                     OK
  Kernel Parameters                                    WARNING
  OVO Patches                                          OK
  Installed OVO filesets                               OK
  OVO Binaries: Version and Patches                    OK
  OVO Libraries: Version and Patches                   OK
  Disk space in DB and OVO Directories                 OK
  Pending Data in Distribution Directory               WARNING
  OpCdecoded Config Files                              OK
  Cluster Information                                  N/A
  core File Information                                OK
  opcinfo and opcerror Files                           OK
  Elements in Server Queues                            OK
  Elements in Agent Queues                             OK
  File permissions and ownership                       OK

Database Check
  Database Info                                        OK
  Database Queries                                     OK

OVO Database Check
  Agent Entries in DB <-> Agent Entries in Filesystem  OK
  Diskspace in Oracle Directory                        OK

Nodes Check
  Nodes Check                                          OK
  Nodes Check Statistics                               OK

Java GUI / Service Navigator
  Java Version / Path                                  OK
  Content of dir /opt/OV/www/htdocs/ito_op             OK
  Content of dir /etc/opt/OV/share/conf/OpC/mgmt_sv/opcsvcm  OK
  Check number of configured services and loggings     OK
  Output of /opt/OV/contrib/OpC/stacktrace svc         OK

Created by itochecker version A.08.00
```
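To find the categories that need attention without reading every line, the report menu can be filtered by status. The following Python sketch assumes a plain-text rendering of the menu, with one check per line and the status as the last word; the real report is HTML. The sample report shows only OK, WARNING, and N/A. Treating "ERROR" as a possible status is an assumption, and the function name `flagged_checks` is hypothetical.

```python
# Statuses treated as needing attention. "ERROR" is an assumption; the
# sample report shows only OK, WARNING and N/A.
ATTENTION = {"WARNING", "ERROR"}


def flagged_checks(report_text):
    """Return (check name, status) pairs whose status needs attention."""
    flagged = []
    for raw in report_text.splitlines():
        # Split off the last whitespace-separated word as the status.
        parts = raw.rsplit(None, 1)
        if len(parts) == 2 and parts[1] in ATTENTION:
            flagged.append((parts[0].strip(), parts[1]))
    return flagged
```

Applied to the report above, this would list Kernel Parameters and Pending Data in Distribution Directory, the two checks marked WARNING.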