HP OpenView System Administration Handbook [Electronic resources] : Network Node Manager, Customer Views, Service Information Portal, HP OpenView Operations

Tammy Zitello


16.2 THE PERFORMANCE AGENT

The OVPA (formerly called the MeasureWare Agent) captures performance, resource, and transaction data from managed servers or workstations. It uses minimal system resources to collect, log, summarize, and timestamp data, detect alarm conditions, and send notifications to the appropriate applications, such as OVO or NNM. Other programs, such as OV Performance Manager, OV Reporter, and Glance, can extract and use the collected data. OVPA uses data source integration (DSI) technology to receive, alarm on, and log data from external data sources such as applications, databases, networks, and other operating systems.

16.2.1 OVPA Installation

OVPA is distributed with OVO and requires a separate license. To install it from the management server, select Actions → Agent → Install Subagent from the menu, then select MWA as the subagent to be installed. The agent can also be installed from a distribution CD or software depot using swinstall. After installation, the files and programs are located in the directory /opt/perf. Figures 16-6 and 16-7 show the template source for OVPA. Assign and distribute the templates to the managed nodes where you need to monitor as many as 300 metrics and take advantage of the other features offered by OVPA. OVPA is supported on HP-UX, Solaris, Windows, Tru64, Linux, and AIX platforms.

Figure 16-6. OVPA Message Source Template Group is shown in the Message Source Template window.


Figure 16-7. OVPA Message Source Template Group contains the default templates listed in the Message Source Template window.


16.2.2 OVPA (3.x) Process Environment

  • RPCD HP-UX remote procedure call daemon; provides the endpoint map service for a system. The rpcd program listens on UDP and TCP port 135. The endpoint map is a system-wide database, maintained by the endpoint map service of the RPC daemon, in which local RPC servers register binding information associated with their interface identifiers. The endpoint map service handles RPC lookups from clients that request compatible, locally mapped servers. This technology is being phased out of the OpenView platform and replaced by HTTPS-based communication programs. Refer to Chapter 14, "Agents, Policies, and Distribution," for more information about the HTTPS-based agent.

  • DCED (Distributed Computing Environment Daemon) Solaris (remote procedure process on Sun platforms).

  • rpcss Windows (remote procedure process on Microsoft platforms).

  • ovbbccb HTTPS-based data communications process.

  • Perflbd Reads the perflbd.rc file to obtain data source names and locations. perflbd starts a rep_server process for each data source configured in the perflbd.rc file. perflbd gives the client products (such as OV Performance Manager) data communication information about the agent. Communication with perflbd is through a TCP socket.

  • Rep_server Repository server process that provides access to the data stored in the logfiles. Communication with the rep_server is through RPCs.

  • Agdbserver Process that provides access to the alarm generator system database. The database contains information concerning all systems that will be receiving alarms from the agent. Communication with the agdbserver is through RPCs.

  • Alarmgen Process that analyzes the data and generates and sends alarm notifications to the alarm daemon in OVPM or the message interceptor in OVO or ovtrapd in NNM.

  • Scopeux Collects performance data from the operating system where OVPA is installed. After collecting the data, scopeux summarizes the data and logs it in raw log files based on the specification for data collection defined in the collection parameter (parm) file.

  • Midaemon Collects and counts trace data coming from the kernel and translates it for use by OVPA and other performance programs, such as Glance, via a shared memory segment. OVPA's scopeux daemon program attaches to the shared memory interface.

  • DSI Data Source Integration logging daemon.

  • Utility Manages scopeux log files and analyzes or checks the log files via the repository servers and alarmdef file.

  • Extract mwa program for obtaining specific summary or detail data from the repositories.

Note

OVPA 4.x replaces the DCE-RPC-based processes and functionality with HTTPS-based communications processes. Refer to Section 16.2.8, "OVPA 4.x," for more information about OVPA 4.x.

16.2.3 OVPA Startup

The perflbd.rc file is read by the perflbd program during OVPA startup and allows the selected data to be made available for alarm processing and analysis. The default perflbd.rc file contains one entry for a data source named SCOPE that starts a repository server for the scopeux log file set.

The startup sequence for OVPA is as follows:

  • Start scopeux (which starts midaemon if it is not already running).

  • Start transaction tracker (if it is not already running).

  • Check for rpcd.

  • Start perflbd; this starts the rep_server processes (one at a time) as requested in the perflbd.rc configuration file. Note: This can take some time if the logfiles defined for the data sources are large.

  • After the rep_server processes are running, perflbd starts agdbserver.

  • Agdbserver starts alarmgen.

After alarmgen is running, connections will be accepted from external programs (such as the HP OpenView Performance Manager).
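
When startup stalls, it can help to see how far the sequence above got. The following is a minimal Python sketch, not an HP-supplied tool: it takes the text of a process listing (for example, the output of ps -e) and reports the first daemon in the documented order that is not running. The function name, the assumption that the command name is the last field on each line, and the use of ttd for the transaction tracker are illustrative choices.

```python
# Daemons in the OVPA 3.x startup order described in this section.
STARTUP_ORDER = [
    "scopeux", "midaemon", "ttd", "rpcd",
    "perflbd", "rep_server", "agdbserver", "alarmgen",
]


def first_missing(ps_output):
    """Return the first daemon in STARTUP_ORDER that is absent from a ps listing.

    Assumes the command name is the last whitespace-separated field on
    each line, as in `ps -e` output. Returns None if all are present.
    """
    running = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if fields:
            # Strip any leading directory from the command name.
            running.add(fields[-1].rsplit("/", 1)[-1])
    for name in STARTUP_ORDER:
        if name not in running:
            return name
    return None


# Hypothetical listing in which perflbd is still starting repository servers.
SAMPLE_PS = """\
  PID TTY       TIME COMMAND
 1201 ?         0:03 scopeux
 1188 ?         0:01 midaemon
 1190 ?         0:00 ttd
  640 ?         0:00 rpcd
 1215 ?         0:00 perflbd
 1216 ?         0:02 rep_server
"""

print(first_missing(SAMPLE_PS))
```

In this sample the result is agdbserver, which matches the note above that large log files can delay the steps that follow the rep_server startup.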

16.2.4 OVPA Configuration

OVPA has a set of repository servers (called rep_servers) that provide log file data to the alarm generator and to other products, such as OV Performance Manager, OVO, NNM, and OV Reporter. There is one rep_server for each data source, which consists of a scopeux or DSI log file set.

Configure data sources in the /var/opt/perf/perflbd.rc file. A data source is identified with the following syntax within the perflbd.rc file.

# cat perflbd.rc
DATASOURCE=SCOPE LOGFILE=/var/opt/perf/datafiles/logglob

The DATASOURCE line informs the alarm generator where to find the datafile; the scopeux daemon collects and summarizes performance measurements.
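
To make the DATASOURCE/LOGFILE syntax concrete, here is a small Python sketch that reads perflbd.rc-style text into a dictionary keyed by data source name. It is only an illustration of the line format shown above: the function name is hypothetical, the sample paths are illustrative, and skipping lines that start with # is an assumption rather than documented behavior.

```python
def parse_perflbd_rc(text):
    """Map each DATASOURCE name to its LOGFILE path.

    Expects one entry per line in the form
    DATASOURCE=<name> LOGFILE=<path>. Blank lines and lines starting
    with '#' are skipped (an assumption made for this sketch).
    """
    sources = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split "KEY=value" tokens separated by whitespace.
        entry = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        name = entry.get("DATASOURCE")
        logfile = entry.get("LOGFILE")
        if name and logfile:
            sources[name] = logfile
    return sources


# Illustrative content: the default SCOPE entry plus one DSI data source.
SAMPLE_RC = """\
DATASOURCE=SCOPE LOGFILE=/var/opt/perf/datafiles/logglob
DATASOURCE=DSI_VMSTAT LOGFILE=/tmp/vmstat_log
"""

for name, logfile in parse_perflbd_rc(SAMPLE_RC).items():
    print(name, "->", logfile)
```

Each key in the result corresponds to one rep_server that perflbd would start.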

16.2.4.1 Data Source Log File Types

There are several data source LOGFILE types supported. The contents of the data source files are defined here for reference:

  • logglob Measurements of global system resource utilization metrics. Global records are logged every 5 minutes.

  • logappl Measurements of processes grouped into user-defined applications.

  • logproc Measurements of selected "interesting" processes. Interesting processes are tracked when they first start up, end, or exceed a user-defined threshold for CPU use. Process records are written every 60 seconds; every 5 minutes, the records in logproc are summarized according to the definitions in the parameter file and logged into the logappl file.

  • logdev Measurements of individual device performance for disks and volume data, summarized every 5 minutes.

  • logtran Measurements of transaction data, summarized every 5 minutes. The transaction-tracking concept is covered in Section 16.3.3 of this chapter.

  • logindx Instructions on how to access data in other log files.

  • Data Source Integration User-defined log file (definition and configuration covered in Section 16.2.5.2 of this chapter).

16.2.4.2 OVPA Alarm Configuration Example

Contributed by Emil Velez

The example in this section shows the configuration information to add to the alarmdef file in order to send performance messages to the OVO message browser when a metric threshold is violated. The comments in the file explain each part.

# MeasureWare format alarmdef file. DO NOT REMOVE THIS LINE!
#
# @(#) sample alarm definitions
#
# Sample alarmdef file
#
# edit any lines in this file as desired..
# First come a few sample alarms that illustrate some of the aspects of
# performance alarming.
# The following alarm, if uncommented, will go off every ten minutes:
#
#alarm GBL_CPU_TOTAL_UTIL > 0 for 10 minutes
#type = "test"
#start
#  red alert "Test Alarm starting"
#repeat every 10 minutes
#  yellow alert "Test Alarm continuing"
#end
#  reset alert "Test Alarm ending"
#
# The following application alarm shows the use of the EXEC statement to
# execute the local action of mailing a message. Normally, if the "Other"
# application is using too much cpu, you should determine which processes
# are causing this activity and then tune your parm file so that this
# workload is bucketed into one of the application groups appropriate for
# your environment.
#alarm OTHER:APP_CPU_TOTAL_UTIL > 10 for 10 minutes
#start {
#  yellow alert "Other application using more than 10 percent of the cpu"
#  exec "echo 'other application using > 10% cpu' | mail root"
#  }
#end
#  reset alert "Other application cpu warning over"
#
# End of sample alarm section.
# Below are the primary CPU, Disk, Memory, and Network Bottleneck alarms.
# For each area, a bottleneck symptom is calculated, and the resulting
# bottleneck probability is used to define yellow or red alerts.

symptom CPU_Bottleneck type=CPU
rule GBL_CPU_TOTAL_UTIL > 75 prob 25
rule GBL_CPU_TOTAL_UTIL > 85 prob 25
rule GBL_CPU_TOTAL_UTIL > 90 prob 25
rule GBL_PRI_QUEUE > 3 prob 25

alarm CPU_Bottleneck > 50 for 5 minutes
type = "CPU"
start
  if CPU_Bottleneck > 90 then
    red alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
  else
    yellow alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
repeat every 10 minutes
  if CPU_Bottleneck > 90 then
    red alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
  else
    yellow alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
end
  reset alert "End of CPU Bottleneck Alert"

symptom Disk_Bottleneck type=DISK
rule GBL_DISK_UTIL_PEAK > 50 prob GBL_DISK_UTIL_PEAK
rule GBL_DISK_SUBSYSTEM_QUEUE > 3 prob 25

alarm Disk_Bottleneck > 50 for 5 minutes
type = "Disk"
start
  if Disk_Bottleneck > 90 then
    red alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
  else
    yellow alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
repeat every 10 minutes
  if Disk_Bottleneck > 90 then
    red alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
  else
    yellow alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
end
  reset alert "End of Disk Bottleneck Alert"

symptom Memory_Bottleneck type=MEMORY
rule GBL_MEM_QUEUE > 1 prob 20
rule GBL_MEM_PAGE_REQUEST_RATE > 10 prob 20
rule GBL_MEM_PAGE_REQUEST_RATE > 40 prob 20
rule GBL_MEM_PAGEOUT_RATE > 1 prob 20
rule GBL_MEM_PAGEOUT_RATE > 10 prob 35
rule GBL_MEM_SWAPOUT_RATE > 1 prob 35
rule GBL_MEM_SWAPOUT_RATE > 4 prob 50

alarm Memory_Bottleneck > 50 for 5 minutes
type = "Memory"
start
  if Memory_Bottleneck > 90 then
    red alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
  else
    yellow alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
repeat every 10 minutes
  if Memory_Bottleneck > 90 then
    red alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
  else
    yellow alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
end
  reset alert "End of Memory Bottleneck Alert"

symptom Network_Bottleneck type=NETWORK
rule GBL_NFS_CALL_RATE > 100 prob 25
rule GBL_NET_COLLISION_1_MIN_RATE > 60 prob 25    # 1 per second
rule GBL_NET_COLLISION_1_MIN_RATE > 600 prob 25   # 10 per second
rule GBL_NET_COLLISION_1_MIN_RATE > 3000 prob 25  # 50 per second
rule GBL_NET_PACKET_RATE > 150 prob 10
rule GBL_NET_PACKET_RATE > 300 prob 15
rule GBL_NET_PACKET_RATE > 500 prob 25
rule GBL_NET_PACKET_RATE > 1000 prob 25

alarm Network_Bottleneck > 50 for 5 minutes
type = "Network"
start
  if Network_Bottleneck > 90 then
    red alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
  else
    yellow alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
repeat every 10 minutes
  if Network_Bottleneck > 90 then
    red alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
  else
    yellow alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
end
  reset alert "End of Network Bottleneck Alert"

# The following alarm assumes that on a good network, errors are rare:
alarm GBL_NET_ERROR_1_MIN_RATE > 10
type = "Network"
start
  red alert "Network error rate is greater than ten per minute"
end
  reset alert "End of network error rate condition"

# Global swap space utilization alarm:
alarm GBL_SWAP_SPACE_UTIL > 95
start
  red alert "Global swap space is nearly full"
end
  reset alert "End of global swap space full condition"

LVOLUME loop
{
  if ( lv_space_util > 80 ) then
  {
    if ( lv_dirname == "/var" ) then
      if ( lv_space_util > 80 ) then
        YELLOW ALERT "/var is greater than 80%, currently at: ", lv_space_util
    if ( lv_dirname == "/opt" ) then
      if ( lv_space_util > 92 ) then
        YELLOW ALERT "/opt is greater than 90%, currently at: ",lv_space_util
    if ( lv_dirname == "/usr" ) then
      if ( lv_space_util > 90 ) then
        YELLOW ALERT "/usr is greater than 90%, currently at: ",lv_space_util
    if ( lv_dirname == "/" ) then
      if ( lv_space_util > 90 ) then
        YELLOW ALERT "/ is greater than 90%, currently at: ",lv_space_util
    if ( lv_dirname == "/home" ) then
      if ( lv_space_util > 70 ) then
        YELLOW ALERT "/home is greater than 70%, currently at: ",lv_space_util
    if ( lv_dirname == "/opt/maestro" ) then
      if ( lv_space_util > 80 ) then
        YELLOW ALERT "/opt/maestro is greater than 80%, currently at: ", lv_space_util
    if ( lv_dirname == "/var/opt/perf/datafiles" ) then
      if ( lv_space_util > 95 ) then
        YELLOW ALERT "/var/opt/perf/datafiles is greater than 95%, currently at: " ,lv_space_util
  }
}

INCLUDE "/var/opt/perf/nos/nsmdnt2/alarmdef"
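
The symptom and alarm pair in the alarmdef file can be read as a small calculation. The Python sketch below works through the CPU_Bottleneck rules under one assumption that this chapter does not state explicitly: the symptom value is the sum of the prob values of all rules whose condition is true. It also ignores the "for 5 minutes" duration and looks at a single sample. The function names and the sample metric values are illustrative.

```python
# (metric, threshold, probability) triples copied from the CPU_Bottleneck symptom.
CPU_RULES = [
    ("GBL_CPU_TOTAL_UTIL", 75, 25),
    ("GBL_CPU_TOTAL_UTIL", 85, 25),
    ("GBL_CPU_TOTAL_UTIL", 90, 25),
    ("GBL_PRI_QUEUE", 3, 25),
]


def symptom_probability(rules, metrics):
    """Sum the probabilities of every rule whose threshold is exceeded.

    Assumption: the symptom value is the sum of the true rules' probs.
    """
    return sum(prob for metric, threshold, prob in rules
               if metrics[metric] > threshold)


def alert_color(probability):
    """Mirror the alarm block: alarm above 50, red above 90, else yellow."""
    if probability > 90:
        return "red"
    if probability > 50:
        return "yellow"
    return None  # below the alarm threshold, no alert


# Hypothetical sample: 88% CPU utilization with 4 processes in the priority queue.
sample = {"GBL_CPU_TOTAL_UTIL": 88, "GBL_PRI_QUEUE": 4}
prob = symptom_probability(CPU_RULES, sample)
print(prob, alert_color(prob))
```

With this sample, three of the four rules are true, so the symptom value is 75 and the alarm would raise a yellow alert rather than a red one.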

16.2.4.3 Examples of MeasureWare Extractions

vi report1
REPORT "report 1"
FORMAT ASCII
HEADINGS ON
DATA TYPE GLOBAL
DATE
TIME
GBL_ACTIVE_PROC
GBL_ALIVE_PROC
GBL_COMPLETED_PROC
GBL_CPU_CSWITCH_TIME
GBL_CPU_CSWITCH_UTIL
GBL_DISK_FS_IO
GBL_DISK_FS_IO_RATE
GBL_DISK_FS_READ
GBL_DISK_FS_READ_RATE
GBL_DISK_FS_WRITE
GBL_MEM_PAGEOUT
GBL_MEM_PAGEOUT_RATE
GBL_MEM_PAGE_REQUEST
GBL_MEM_PAGE_REQUEST_RATE
GBL_MEM_QUEUE
GBL_MEM_SWAP
vi report2
REPORT "report 2"
FORMAT ASCII
HEADINGS ON
DATA TYPE PROCESS
DATE
TIME
YEAR
PROC_CPU_CSWITCH_TIME
PROC_CPU_CSWITCH_UTIL
PROC_CPU_INTERRUPT_TIME
PROC_CPU_INTERRUPT_UTIL
PROC_CPU_NICE_TIME
PROC_CPU_NICE_UTIL
PROC_CPU_NORMAL_TIME
PROC_CPU_NORMAL_UTIL
PROC_CPU_REALTIME_TIME
PROC_CPU_REALTIME_UTIL
PROC_CPU_SYSCALL_TIME
PROC_CPU_SYSCALL_UTIL
PROC_CPU_SYS_MODE_TIME
PROC_DISK_FS_IO
PROC_DISK_FS_IO_RATE
PROC_DISK_FS_READ
PROC_PROC_NAME
PROC_RUN_TIME
PROC_SEM_WAIT_PCT
PROC_TTY
PROC_USER_NAME
# extract -xp -fd -Gg  -b today-1 -e today  -r report1 -f rxlog.txt
# extract -xp -fd -p -b today-1 -e today -r report2 -f rxlog_proc.txt
REPORT "report 3"
FORMAT ASCII
HEADINGS ON
DATA TYPE GLOBAL
DATE
DATE_SECONDS
DAY
TIME
YEAR
GBL_ACTIVE_PROC
GBL_ALIVE_PROC
GBL_COMPLETED_PROC
GBL_CPU_HISTOGRAM
GBL_CPU_IDLE_TIME
GBL_CPU_IDLE_UTIL
GBL_CPU_INTERRUPT_TIME
GBL_CPU_INTERRUPT_UTIL
GBL_CPU_SYS_MODE_TIME
GBL_CPU_SYS_MODE_UTIL
GBL_CPU_TOTAL_TIME
GBL_CPU_TOTAL_UTIL
GBL_CPU_USER_MODE_TIME
GBL_CPU_USER_MODE_UTIL
GBL_DISK_CACHE_READ
GBL_DISK_CACHE_READ_RATE
GBL_DISK_HISTOGRAM
GBL_DISK_LOGL_READ
GBL_DISK_LOGL_READ_RATE
GBL_DISK_PHYS_BYTE
GBL_DISK_PHYS_BYTE_RATE
GBL_DISK_PHYS_IO
GBL_DISK_PHYS_IO_RATE
GBL_DISK_PHYS_READ
GBL_DISK_PHYS_READ_BYTE_RATE
GBL_DISK_PHYS_READ_RATE
GBL_DISK_PHYS_WRITE
GBL_DISK_PHYS_WRITE_BYTE_RATE
GBL_DISK_PHYS_WRITE_RATE
GBL_DISK_TIME_PEAK
GBL_DISK_UTIL_PEAK
GBL_FS_SPACE_UTIL_PEAK
GBL_MEM_CACHE_HIT_PCT
GBL_MEM_FREE_UTIL
GBL_MEM_PAGEOUT_RATE
GBL_MEM_PAGE_REQUEST
GBL_MEM_PAGE_REQUEST_RATE
GBL_MEM_SYS_AND_CACHE_UTIL
GBL_MEM_USER_UTIL
GBL_MEM_UTIL
GBL_NET_IN_PACKET
GBL_NET_IN_PACKET_RATE
GBL_NET_OUT_PACKET
GBL_NET_OUT_PACKET_RATE
GBL_NET_PACKET_RATE
GBL_NUM_NETWORK
GBL_PROC_RUN_TIME
GBL_PROC_SAMPLE
GBL_RUN_QUEUE
GBL_STARTED_PROC
GBL_SWAP_SPACE_UTIL
GBL_SYSCALL_RATE
GBL_WEB_CACHE_HIT_PCT
GBL_WEB_CGI_REQUEST_RATE
GBL_WEB_CONNECTION_RATE
GBL_WEB_FILES_RECEIVED_RATE
GBL_WEB_FILES_SENT_RATE
GBL_WEB_FTP_READ_BYTE_RATE
GBL_WEB_FTP_WRITE_BYTE_RATE
GBL_WEB_GET_REQUEST_RATE
GBL_WEB_GOPHER_READ_BYTE_RATE
GBL_WEB_GOPHER_WRITE_BYTE_RATE
GBL_WEB_HEAD_REQUEST_RATE
GBL_WEB_HTTP_READ_BYTE_RATE
GBL_WEB_HTTP_WRITE_BYTE_RATE
GBL_WEB_ISAPI_REQUEST_RATE
GBL_WEB_LOGON_FAILURES
GBL_WEB_NOT_FOUND_ERRORS
GBL_WEB_OTHER_REQUEST_RATE
GBL_WEB_POST_REQUEST_RATE
REPORT "report 4"
FORMAT ASCII
HEADINGS ON
DATA TYPE PROCESS
DATE
TIME
YEAR
PROC_PROC_NAME
PROC_APP_ID
PROC_CPU_SYS_MODE_TIME
PROC_CPU_SYS_MODE_UTIL
PROC_CPU_TOTAL_TIME
PROC_CPU_TOTAL_TIME_CUM
PROC_CPU_TOTAL_UTIL
PROC_CPU_TOTAL_UTIL_CUM
PROC_CPU_USER_MODE_TIME
PROC_CPU_USER_MODE_UTIL
PROC_INTEREST
PROC_INTERVAL_ALIVE
PROC_MEM_RES
PROC_MEM_VIRT
PROC_MINOR_FAULT
PROC_PRI
PROC_PROC_ID
PROC_RUN_TIME
# extract -xp -fd -Gg  -b today-1 -e today  -r report3.txt -f rxlog.txt
# extract -xp -fd -p  -b today-1 -e today  -r report4.txt -f rxlog_proc.txt
Examples of running ovpm from command line
"c:\Program Files\HP Openview\HPOV_IOPS\cgi-bin\analyzer.exe"
-GRAPHTEMPLATE: CODA "CPU Summary" -SYSTEMNAME: r204c30 -GRAPHTYPE: TSV
"c:\Program Files\HP Openview\HPOV_IOPS\cgi-bin\analyzer.exe"
-GRAPHTEMPLATE: CODA "CPU Summary" -SYSTEMNAME: r204c30
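
The extract report templates shown earlier in this section all follow the same layout: a REPORT title, FORMAT and HEADINGS lines, a DATA TYPE line, and then one metric name per line. As a convenience, a template can be generated from a list of metrics. The Python sketch below assumes only that layout; the function name is hypothetical, and it does not check that the metric names are valid for the chosen data type.

```python
def build_report(title, data_type, metrics):
    """Return the text of an extract report template.

    The layout follows the report files in this section: a four-line
    header followed by one metric name per line.
    """
    lines = [
        'REPORT "%s"' % title,
        "FORMAT ASCII",
        "HEADINGS ON",
        "DATA TYPE %s" % data_type,
    ]
    lines.extend(metrics)
    return "\n".join(lines) + "\n"


# A shortened version of report 1.
text = build_report("report 1", "GLOBAL",
                    ["DATE", "TIME", "GBL_ACTIVE_PROC", "GBL_MEM_QUEUE"])
print(text, end="")
```

Writing the returned text to a file gives a template that can be passed to extract with the -r option, as in the commands above.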

16.2.4.4 Check the OVPA Message Interface to OVO

If OVPA is installed on a managed node where OVO agents are installed, OVPA automatically sends alarms to OVO. If there is no OVO agent on the system, disable the OVO messages setup. OVPA can also send SNMP traps to NNM (agsysdb -add hostname). This is configured in the alarmgen target system database. Check the configuration with the following command:

/opt/perf/bin/agsysdb -l (on HP-UX), /usr/lpp/perf/bin/agsysdb -l (on AIX), and c:\rpmtools\bin\agsysdb -l (on Windows).

The output from the command will look similar to the following:

# /opt/perf/bin/agsysdb -l
MeasureWare alarming status:
SystemDB Version :
ITO messages : on      Last Error : none
Exec Actions : on

There is more detailed information on the use of this command in the man pages or in the OVPA User's Guide.

16.2.5 Data Source Integration (DSI)

Use the DSI component to implement user-defined data sources. For example, you may want to extract the vmstat data every 20 seconds for the User, System, and Idle statistics. The OVPA installation includes the components to check, analyze, and extract the DSI data. SPIs use DSI as a method of collecting application data.

The example in Section 16.2.5.1 demonstrates the steps required to configure a new data source that will send a message to the OVO message browser if a metric threshold is violated. A brief explanation is provided with each step.

16.2.5.1 Data Source Integration Example

The process to implement a DSI log includes the following steps:

Create the Class Specification file

# vi /tmp/vmstat.spec
CLASS VMSTATS = 10001;
METRICS
USER_CPU = 101
LABEL "USER_CPU";
SYSTEM_CPU = 102
LABEL "SYSTEM_CPU";
IDLE_CPU = 103
LABEL "IDLE_CPU";

Compile the Class Specification file, and create the logfile set (three new files in the current directory).

# sdlcomp /tmp/vmstat.spec /tmp/vmstat.log
sdlcomp
Check class specification syntax.
CLASS VMSTATS = 10001;
METRICS
USER_CPU = 101
LABEL "USER_CPU";
SYSTEM_CPU = 102
LABEL "SYSTEM_CPU";
IDLE_CPU = 103
LABEL "IDLE_CPU";
NOTE: Time stamp inserted as first metric by default.
Syntax check successful.
Update SDL vmstat_log.
Shared memory id used by vmstat_log : 9
Class VMSTATS successfully added to logfile set.
# ls vmstat.log*
vmstat.log          vmstat.log.VMSTATS
vmstat.log.desc

Create a format file.

# vi /tmp/vmstat.fmt
$numeric $numeric $numeric $numeric $numeric
$numeric $numeric $numeric $numeric $numeric
$numeric $numeric $numeric $numeric $numeric
USER_CPU SYSTEM_CPU IDLE_CPU

Note

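Each $numeric entry skips one field, so the 15 $numeric entries discard the first 15 fields of the vmstat output. The remaining three fields are logged as USER_CPU, SYSTEM_CPU, and IDLE_CPU.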
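
The positional mapping done by the format file can be pictured with a short Python sketch. This is only an illustration of the idea, not a description of how dsilog is implemented: it pairs each whitespace-separated vmstat field with the corresponding format entry, drops the fields matched by $numeric, and keeps the named ones. The function name and the choice to raise an error on a field-count mismatch are assumptions made for the example.

```python
# The format file from this section: 15 skipped fields, then three named metrics.
FORMAT = ["$numeric"] * 15 + ["USER_CPU", "SYSTEM_CPU", "IDLE_CPU"]


def apply_format(fmt, line):
    """Pair each input field with its format entry and keep the named ones."""
    fields = line.split()
    if len(fields) != len(fmt):
        # Sketch-only behavior: reject lines that do not match the format.
        raise ValueError("expected %d fields, got %d" % (len(fmt), len(fields)))
    return {name: float(value)
            for name, value in zip(fmt, fields)
            if name != "$numeric"}


# One data line of HP-UX vmstat output (18 fields).
VMSTAT_LINE = "1 0 0 230390 20390 8 4 0 0 0 0 2 407 1111 158 1 0 99"

print(apply_format(FORMAT, VMSTAT_LINE))
```

For this input line, only the last three fields (us, sy, and id) survive, which is why the class specification defines exactly three metrics.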

Table 16-1 shows the vmstat output field descriptions.

Table 16-1. vmstat Command Field Descriptions

Primary Field
    Secondary Fields

procs: Information about numbers of processes in various states.
    r     In run queue
    b     Blocked for resources (I/O, paging, and so on)
    w     Runnable or short sleeper (< 20 secs) but swapped

memory: Information about the usage of virtual and real memory. Virtual pages are considered active if they belong to processes that are running or have run in the last 20 seconds.
    avm   Active virtual pages
    free  Size of the free list

page: Information about page faults and paging activity. These are averaged each five seconds, and given in units per second.
    re    Page reclaims (without -S)
    at    Address translation faults (without -S)
    si    Processes swapped in (with -S)
    so    Processes swapped out (with -S)
    pi    Pages paged in
    po    Pages paged out
    fr    Pages freed per second
    de    Anticipated short-term memory shortfall
    sr    Pages scanned by clock algorithm, per second

faults: Trap/interrupt rate averages per second over last 5 seconds.
    in    Device interrupts per second (nonclock)
    sy    System calls per second
    cs    CPU context switch rate (switches/sec)

cpu: Breakdown of percentage usage of CPU time for the active processors.
    us    User time for normal and low priority processes
    sy    System time
    id    CPU idle

  1. Run vmstat and check the output columns, which are described in Table 16-1:

    # vmstat
     procs       memory                page                    faults          cpu
     r  b  w     avm   free  re  at  pi  po  fr  de  sr   in    sy   cs  us  sy  id
     1  0  0  230390  20390   8   4   0   0   0   0   2  407  1111  158   1   0  99

  2. Test the dsilog process:

    
    # vmstat 20|dsilog /tmp/vmstat.log VMSTATS -f /tmp/vmstat.fmt -vo
    I: 1003415064     0     0     0   10594    1913      0
    0     0     0     0     0      0    110
    211  37   4.0000  2.0000    95.0000
    I: 1003415064     0     0    0   8415   1579     0
    0   0    0    0    0     0   108
    144   32  2.0000   1.0000  96.0000
    I: 1003415084   0   0  0 10212   1593          0
    0      0      0      0       0    0        107
    157         37     0.0000     1.0000    99.0000
    interval marker
    L: 1003414800     2.0000     1.3330    96.6660
    Notes:
    I: shows incoming data
    L: actual data to be logged
    

  3. Start the dsilog logging process:

    # vmstat 20|dsilog /tmp/vmstat.log VMSTATS -f /tmp/vmstat.fmt &
    

  4. View the collected DSI data:

    extract -xp -l /var/opt/perf/vmstat_log -C VMSTATS
     detail -H -fd -b first
    

    Make the DSI a permanent data source by adding it to the perflbd.rc file:

    DATASOURCE=SCOPE LOGFILE=/var/opt/perf/datafiles/logglob
    DATASOURCE=DSI_VMSTAT LOGFILE=/tmp/vmstat_log
    

  5. Define alarms on DSI data in the /var/opt/perf/alarmdef file:

    # vi /var/opt/perf/alarmdef (partial listing)
    #######DSILOG
    alarm DSI_VMSTAT:VMSTATS:USER_CPU>30 for 10 minutes
    start
    critical alert "User CPU exceeded threshold"
    repeat every 15 minutes
    critical alert "User CPU exceeded threshold after 15 minutes"
    end
    reset alert "The User CPU Alert is over"
    

  6. Customize graphs in OV Performance Manager:

Refer to the OpenView Performance Manager Documentation for specific implementation and customization details.
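
The relationship between the I: and L: lines in the dsilog test output of step 2 can be checked by hand. The three incoming samples end with the values that the format file keeps (USER_CPU, SYSTEM_CPU, IDLE_CPU), and the L: line that follows the interval marker is consistent with a plain average of those samples. The Python sketch below reproduces that arithmetic; treating the summarization as a simple mean is an inference from the sample output, not a statement about every dsilog configuration.

```python
# (USER_CPU, SYSTEM_CPU, IDLE_CPU) from the three I: lines in step 2.
samples = [
    (4.0, 2.0, 95.0),
    (2.0, 1.0, 96.0),
    (0.0, 1.0, 99.0),
]


def interval_average(rows):
    """Average each metric column over the samples in one interval."""
    count = len(rows)
    return tuple(sum(column) / count for column in zip(*rows))


user, system, idle = interval_average(samples)
print("L: %.4f %.4f %.4f" % (user, system, idle))
```

The computed averages agree with the logged values 2.0000, 1.3330, and 96.6660 to three decimal places; the last digit differs only in how the value is rounded or truncated.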

16.2.5.2 Definition of Commands and Terms

  • DSI Provides the ability to collect, log, correlate, and summarize data from a variety of sources. Common DSI terms and definitions are provided here for reference.

  • sdlcomp Tool that creates the DSI log file set (vmstat.log, VMSTAT_log) by reading a specification file.

  • Class Specification File (ASCII) Describes the data that is collected using DSI.

  • Class Specification File CLASS Defines a group of metrics (USER_CPU, SYSTEM_CPU, and IDLE_CPU) and how they are collected. For example, the CLASS name VMSTATS is followed by a class ID that is used internally by DSI, and each entry under METRICS is assigned a unique name and number. Each metric description is terminated with a semicolon.

  • Class Specification File LABEL Identifies the set of metrics defined by the class.

  • Format File Determines which data fields appear in the final data record and excludes unnecessary information (column headings and data fields). The example format file vmstat.fmt is located in the /tmp directory along with the specification file.

  • Data Feed Process (dsilog) Runs continuously in background mode, sending application output to the DSI log file (/tmp/vmstat.log). In the vmstat example, vmstat (with its command line parameters) sends data through a UNIX pipe to the dsilog command. The dsilog command line parameters include the name of the logfile set, the CLASS name, and the format file (-f). Testing the feed with the -vo dsilog command line option sends the data only to standard output, not to the actual DSI log file.

  • Preview the data (extract) Views the data written to the DSI log file via the extract command and writes it to an ASCII output file with the name xfrdCLASS.asc.

16.2.6 OVPA Interface with Other Programs

The Database Smart Plug-In (DB SPI) is one example of a SPI that incorporates data collection capabilities and integrates with OVPM (for graphing and analysis) using the DSI features of OVPA. Installing the DB SPI inserts new entries in the parm file to define the instances of the database as a new application class.

16.2.7 OVPA Commands and Files

  • /opt/perf/bin/mwa status Checks the OVPA status.

  • /opt/perf/bin/mwa stop Stops OVPA.

  • /opt/perf/bin/midaemon -T Stops midaemon (also stops active Glance sessions; Glance is described later in this chapter).

  • /opt/perf/bin/mwa start Starts OVPA processes, including midaemon and scopeux.

  • /opt/perf/bin/perfstat -v Checks the version and status of the OVPA environment.

  • /opt/perf/bin/ttd -k Stops the transaction tracker daemon (refer to the previous section for the process description).

  • Parm Contains parameters that are used to define applications and processes.

  • Alarmdef Defines the conditions that generate alarms.

  • /var/opt/perf/perflbd.rc Contains the startup and shutdown commands for the repository servers for each data source that has been configured.

  • /var/opt/perf/status.scope Status and error log for scopeux.

The following status files contain diagnostic information from the process environment. The default file size is 1MB, and if the file grows past the limit it is renamed status.filename.old. Use these files to troubleshoot problems that may arise with the processes that generate the files:

  • /var/opt/perf/status.alarmgen

  • /var/opt/perf/status.perflbd

  • /var/opt/perf/status.rep_server

  • /var/opt/perf/status.ttd

  • /var/opt/perf/status.mi
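
Because a status file is renamed once it grows past the size limit, recent diagnostics can move into the .old copy without notice. The following Python sketch lists the status files in a directory that are close to the default 1MB limit, so they can be reviewed before they roll over. The function name, the 90 percent warning fraction, and the interpretation of 1MB as 1024 × 1024 bytes are assumptions made for this example.

```python
import os

LIMIT_BYTES = 1024 * 1024  # default status file size limit (assumed to mean 1 MiB)


def status_files_near_limit(directory, fraction=0.9):
    """Return names of status.* files at or above fraction * LIMIT_BYTES.

    Files that have already rolled over (names ending in .old) are skipped.
    """
    near = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith("status.") or name.endswith(".old"):
            continue
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) >= fraction * LIMIT_BYTES:
            near.append(name)
    return near


if __name__ == "__main__":
    # /var/opt/perf is the status file location given in this section.
    if os.path.isdir("/var/opt/perf"):
        for name in status_files_near_limit("/var/opt/perf"):
            print(name, "is close to the size limit")
```

A run with no output means that none of the active status files is near the limit.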

16.2.8 OVPA 4.x

OVPA 4.x is functionally the same as OVPA 3.x. Originally developed for the Linux platform, OVPA 4.x replaces the DCE-RPC-based components and uses OVOA (coda) and the HTTPS-based daemon (ovbbccb) for data collection and communications. The OVOA replaces the functionality of the perflbd and rep_server daemons. The perflbd.rc file is replaced by a datasources file, and the alarmgen process is replaced by the perfalarm daemon. Use the ovpa command (instead of mwa) to check the OVPA status. The major components of OVPA 4.x are shown in Figure 16-9.

Figure 16-9. The OVPA 4.x core component for data gathering is coda (OVOA).


The following status files contain diagnostic information from the process environment. The default file size is 1MB, and if the file grows past the limit it is renamed status.filename.old. Use these files to troubleshoot problems that may arise with the processes that generate the files:

/var/opt/perf/status.scope

/var/opt/perf/status.perfalarm

/var/opt/perf/status.mi

/var/opt/perf/status.ttd

/var/opt/OV/log/coda.log

Metric definitions for OVPA 4.x are available at: http://ovweb.external.hp.com/ovnsmdps/pdf/metlinux. Installation guides, release notes, user guides, and other documentation are available at the OpenView documentation web site: http://ovweb.external.hp.com/lpe/doc_serv/.

Note

OVPA 4.x may be changed, upgraded or released by HP for other platforms in the future. Check the OpenView web site for the most up to date product information.

16.2.9 Examples Directory

Example configuration files are located in the directory /opt/perf/examples. The directory includes sample configuration and alarm definition and README files.

16.2.10 Available Metrics

There are over 1000 metrics available for collection on any given system. You can see all the metrics available system-wide with a tool like Glance. OVPA collects a subset of about 500 metrics on the HP-UX platforms. The OVPA metrics are defined in the text document /opt/perf/paperdocs/mwa/C/methp.txt.