16.2 THE PERFORMANCE AGENT
OVPA (formerly called the MeasureWare Agent) captures performance, resource, and transaction data from managed servers or workstations. The agent uses minimal system resources to collect, log, summarize, and timestamp data, detect alarm conditions, and send notifications to the appropriate applications, such as OVO or NNM. It allows other programs, such as OV Performance Manager, OV Reporter, and Glance, to extract and use the collected data. OVPA uses data source integration (DSI) technology to receive, alarm on, and log data from external data sources such as applications, databases, networks, and other operating systems.
16.2.1 OVPA Installation
OVPA is distributed with OVO and requires a separate license. It is installed from the management server if you select from the menu Actions
Figure 16-6. OVPA Message Source Template Group is shown in the Message Source Template window.
Figure 16-7. OVPA Message Source Template Group contains the default templates listed in the Message Source Template window.
16.2.2 OVPA (3.x) Process Environment
- rpcd
HP-UX remote procedure call daemon, which provides the endpoint map service for a system. The rpcd program listens on UDP and TCP port 111. The endpoint map is a system-wide database, maintained by the RPC daemon, where local RPC servers register binding information associated with their interface identifiers. The endpoint map service handles RPC lookups from clients requesting compatible locally mapped servers. This technology is being phased out of the OpenView platform and replaced by newer HTTPS communications programs. Refer to Chapter 14, "Agents, Policies, and Distribution," for more information about the HTTPS-based agent.
- dced (Distributed Computing Environment daemon)
Solaris (remote procedure process on Sun platforms).
- rpcss
Windows (remote procedure process on Microsoft platforms).
- ovbbccb
HTTPS-based data communications process.
- perflbd
Reads the perflbd.rc file to obtain data source names and locations. perflbd starts a rep_server process for each data source configured in the perflbd.rc file and gives client products (such as OV Performance Manager) the data communication information they need to reach the agent. Communication with perflbd is through a TCP socket.
- rep_server
Repository server process that provides access to the data stored in the log files. Communication with the rep_server is through RPCs.
- agdbserver
Process that provides access to the alarm generator system database. The database contains information about all systems that will receive alarms from the agent. Communication with the agdbserver is through RPCs.
- alarmgen
Process that analyzes the data, then generates and sends alarm notifications to the alarm daemon in OVPM, the message interceptor in OVO, or ovtrapd in NNM.
- scopeux
Collects performance data from the operating system where OVPA is installed. After collecting the data, scopeux summarizes it and logs it in raw log files according to the data collection specification in the collection parameter (parm) file.
- midaemon
Collects and counts trace data coming from the kernel and translates it for use by OVPA and other performance programs, such as Glance, via a shared memory segment. OVPA's scopeux daemon attaches to the shared memory interface.
- DSI
Data Source Integration logging daemon.
- utility
Manages scopeux log files and analyzes or checks the log files via the repository servers and the alarmdef file.
- extract
mwa program for obtaining specific summary or detail data from the repositories.
Note: OVPA 4.x replaces the DCE-RPC-based processes and functionality with HTTPS-based communications processes. Refer to Section 16.2.8, "OVPA 4.x," for more information about OVPA 4.x.
16.2.3 OVPA Startup
The perflbd.rc file is read by the perflbd program during OVPA startup and allows the selected data to be made available for alarm processing and analysis. The default perflbd.rc file contains one entry for a data source named SCOPE that starts a repository server for the scopeux log file set.
The startup sequence for OVPA is as follows:
- Start scopeux (which starts midaemon if it is not already running).
- Start the transaction tracker (if it is not already running).
- Check for rpcd.
- Start perflbd; this starts the rep_server processes (one at a time) as requested in the perflbd.rc configuration file. Note: This can take some time if the log files defined for the data sources are large.
- After the rep_server processes are running, perflbd starts agdbserver.
- agdbserver starts alarmgen.
After alarmgen is running, connections will be accepted from external programs (such as the HP OpenView Performance Manager).
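Because each process in the startup sequence depends on the ones before it, a stalled startup can be located by finding the first process in the chain that is not running. The following is a minimal sketch of that idea, not part of OVPA. The process names come from this section (ttd is the transaction tracker daemon); the `first_missing` helper and the way the list of running processes is obtained are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Startup order as described in this section (OVPA 3.x).
# midaemon is started by scopeux, so it follows scopeux here.
STARTUP_ORDER = (
    "scopeux",
    "midaemon",
    "ttd",
    "rpcd",
    "perflbd",
    "rep_server",
    "agdbserver",
    "alarmgen",
)


def first_missing(running: Iterable[str]) -> Optional[str]:
    """Return the first process in the startup chain that is not running.

    `running` is any collection of process names, for example names
    gathered from a process listing (hypothetical input for this sketch).
    Returns None when the whole chain is up, which is the point at which
    external programs can connect.
    """
    present = set(running)
    for name in STARTUP_ORDER:
        if name not in present:
            return name
    return None


if __name__ == "__main__":
    # Example: perflbd is up but no repository server has started yet.
    print(first_missing(["scopeux", "midaemon", "ttd", "rpcd", "perflbd"]))
```

If the function reports rep_server, for instance, the perflbd.rc entries and the size of the log files they name are the first things to check.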
16.2.4 OVPA Configuration
OVPA has a set of repository servers (called rep_servers) that provide log file data to the alarm generator and other products, such as OV Performance Manager, OVO, NNM, and OV Reporter. There is one rep_server for each data source consisting of a scopeux or DSI log file set.
Configure data sources in the /var/opt/perf/perflbd.rc file. The DATASOURCE line informs the alarm generator where to find the data file; the scopeux daemon collects and summarizes performance measurements. A data source is identified with the following syntax within the perflbd.rc file:
# cat perflbd.rc
DATASOURCE=SCOPE LOGFILE=/var/opt/perf/data/image/library/english/10090_logglob
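The DATASOURCE/LOGFILE syntax is simple enough to check with a short script before restarting the agent. The sketch below reads perflbd.rc-style text and returns a mapping of data source names to log file paths. It is an illustration only: the `parse_perflbd_rc` function is hypothetical, and treating `#` as a comment marker is an assumption rather than something stated in this section.

```python
import re
from typing import Dict

# One entry per line: DATASOURCE=<name> LOGFILE=<path>
_ENTRY = re.compile(
    r"^\s*DATASOURCE\s*=\s*(\S+)\s+LOGFILE\s*=\s*(\S+)\s*$",
    re.IGNORECASE,
)


def parse_perflbd_rc(text: str) -> Dict[str, str]:
    """Map each data source name to its log file path."""
    sources: Dict[str, str] = {}
    for line in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = line.split("#", 1)[0]
        match = _ENTRY.match(line)
        if match:
            sources[match.group(1)] = match.group(2)
    return sources


if __name__ == "__main__":
    sample = (
        "DATASOURCE=SCOPE "
        "LOGFILE=/var/opt/perf/data/image/library/english/10090_logglob\n"
    )
    for name, logfile in parse_perflbd_rc(sample).items():
        print(name, "->", logfile)
```

Each key in the result corresponds to one rep_server that perflbd starts.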
16.2.4.1 Data Source Log File Types
There are several data source LOGFILE types supported. The contents of the data source files are defined here for reference:
- logglob
Measurements of global system resource utilization metrics. Global records are logged every 5 minutes.
- logappl
Measurements of processes grouped into user-defined applications.
- logproc
Measurements of selected "interesting" processes. A process is considered interesting when it first starts up, ends, or exceeds a user-defined threshold for CPU use. Process records are written every 60 seconds and every 5 minutes; the records in logproc are summarized according to the definitions in the parameter file and logged into the logappl file.
- logdev
Measurements of individual device performance for disks and volume data, summarized every 5 minutes.
- logtran
Measurements of transaction data, summarized every 5 minutes. The transaction-tracking concept is covered in Section 16.3.3 of this chapter.
- logindx
Instructions on how to access data in other log files.
- Data Source Integration
User-defined log file (definition and configuration are covered in Section 16.2.5.2 of this chapter).
16.2.4.2 OVPA Alarm Configuration Example
Contributed by Emil Velez
The example in this section demonstrates the configuration information to add to the alarmdef file in order to send performance messages to the OVO message browser if a metric threshold is violated. A brief explanation is provided with each step.
# MeasureWare format alarmdef file. DO NOT REMOVE THIS LINE!
#
# @(#) sample alarm definitions
#
# Sample alarmdef file
#
# edit any lines in this file as desired..
# First come a few sample alarms that illustrate some of the aspects of
# performance alarming.
# The following alarm, if uncommented, will go off every ten minutes:
#
#alarm GBL_CPU_TOTAL_UTIL > 0 for 10 minutes
#type = "test"
#start
# red alert "Test Alarm starting"
#repeat every 10 minutes
# yellow alert "Test Alarm continuing"
#end
# reset alert "Test Alarm ending"
#
# The following application alarm shows the use of the EXEC statement to
# execute the local action of mailing a message. Normally, if the "Other"
# application is using too much cpu, you should determine which processes
# are causing this activity and then tune your parm file so that this
# workload is bucketed into one of the application groups appropriate for
# your environment.
#alarm OTHER:APP_CPU_TOTAL_UTIL > 10 for 10 minutes
#start {
# yellow alert "Other application using more than 10 percent of the cpu"
# exec "echo 'other application using > 10% cpu' | mail root"
# }
#end
# reset alert "Other application cpu warning over"
#
# End of sample alarm section.
# Below are the primary CPU, Disk, Memory, and Network Bottleneck alarms.
# For each area, a bottleneck symptom is calculated, and the resulting
# bottleneck probability is used to define yellow or red alerts.
symptom CPU_Bottleneck type=CPU
rule GBL_CPU_TOTAL_UTIL > 75 prob 25
rule GBL_CPU_TOTAL_UTIL > 85 prob 25
rule GBL_CPU_TOTAL_UTIL > 90 prob 25
rule GBL_PRI_QUEUE > 3 prob 25
alarm CPU_Bottleneck > 50 for 5 minutes
type = "CPU"
start
if CPU_Bottleneck > 90 then
red alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
else
yellow alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
repeat every 10 minutes
if CPU_Bottleneck > 90 then
red alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
else
yellow alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
end
reset alert "End of CPU Bottleneck Alert"
symptom Disk_Bottleneck type=DISK
rule GBL_DISK_UTIL_PEAK > 50 prob GBL_DISK_UTIL_PEAK
rule GBL_DISK_SUBSYSTEM_QUEUE > 3 prob 25
alarm Disk_Bottleneck > 50 for 5 minutes
type = "Disk"
start
if Disk_Bottleneck > 90 then
red alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
else
yellow alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
repeat every 10 minutes
if Disk_Bottleneck > 90 then
red alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
else
yellow alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
end
reset alert "End of Disk Bottleneck Alert"
symptom Memory_Bottleneck type=MEMORY
rule GBL_MEM_QUEUE > 1 prob 20
rule GBL_MEM_PAGE_REQUEST_RATE > 10 prob 20
rule GBL_MEM_PAGE_REQUEST_RATE > 40 prob 20
rule GBL_MEM_PAGEOUT_RATE > 1 prob 20
rule GBL_MEM_PAGEOUT_RATE > 10 prob 35
rule GBL_MEM_SWAPOUT_RATE > 1 prob 35
rule GBL_MEM_SWAPOUT_RATE > 4 prob 50
alarm Memory_Bottleneck > 50 for 5 minutes
type = "Memory"
start
if Memory_Bottleneck > 90 then
red alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
else
yellow alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
repeat every 10 minutes
if Memory_Bottleneck > 90 then
red alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
else
yellow alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
end
reset alert "End of Memory Bottleneck Alert"
symptom Network_Bottleneck type=NETWORK
rule GBL_NFS_CALL_RATE > 100 prob 25
rule GBL_NET_COLLISION_1_MIN_RATE > 60 prob 25 # 1 per second
rule GBL_NET_COLLISION_1_MIN_RATE > 600 prob 25 # 10 per second
rule GBL_NET_COLLISION_1_MIN_RATE > 3000 prob 25 # 50 per second
rule GBL_NET_PACKET_RATE > 150 prob 10
rule GBL_NET_PACKET_RATE > 300 prob 15
rule GBL_NET_PACKET_RATE > 500 prob 25
rule GBL_NET_PACKET_RATE > 1000 prob 25
alarm Network_Bottleneck > 50 for 5 minutes
type = "Network"
start
if Network_Bottleneck > 90 then
red alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
else
yellow alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
repeat every 10 minutes
if Network_Bottleneck > 90 then
red alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
else
yellow alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
end
reset alert "End of Network Bottleneck Alert"
# The following alarm assumes that on a good network, errors are rare:
alarm GBL_NET_ERROR_1_MIN_RATE > 10
type = "Network"
start
red alert "Network error rate is greater than ten per minute"
end
reset alert "End of network error rate condition"
# Global swap space utilization alarm:
alarm GBL_SWAP_SPACE_UTIL > 95
start
red alert "Global swap space is nearly full"
end
reset alert "End of global swap space full condition"
LVOLUME loop
{
  # The outer test uses the lowest per-volume threshold (70 for /home)
  # so that every inner test below can be reached.
  if ( lv_space_util > 70 ) then
  {
    if ( lv_dirname == "/var" ) then
      if ( lv_space_util > 80 ) then
        YELLOW ALERT "/var is greater than 80%, currently at: ", lv_space_util
    if ( lv_dirname == "/opt" ) then
      if ( lv_space_util > 92 ) then
        YELLOW ALERT "/opt is greater than 92%, currently at: ", lv_space_util
    if ( lv_dirname == "/usr" ) then
      if ( lv_space_util > 90 ) then
        YELLOW ALERT "/usr is greater than 90%, currently at: ", lv_space_util
    if ( lv_dirname == "/" ) then
      if ( lv_space_util > 90 ) then
        YELLOW ALERT "/ is greater than 90%, currently at: ", lv_space_util
    if ( lv_dirname == "/home" ) then
      if ( lv_space_util > 70 ) then
        YELLOW ALERT "/home is greater than 70%, currently at: ", lv_space_util
    if ( lv_dirname == "/opt/maestro" ) then
      if ( lv_space_util > 80 ) then
        YELLOW ALERT "/opt/maestro is greater than 80%, currently at: ", lv_space_util
    if ( lv_dirname == "/var/opt/perf/datafiles" ) then
      if ( lv_space_util > 95 ) then
        YELLOW ALERT "/var/opt/perf/datafiles is greater than 95%, currently at: ", lv_space_util
  }
}
INCLUDE "/var/opt/perf/nos/nsmdnt2/alarmdef"
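The symptom blocks in the listing above can be easier to tune once the arithmetic is written out. The sketch below assumes that a symptom's value is the sum of the prob values of every rule whose condition is true, which is how the rule sets above read. It reproduces the CPU_Bottleneck rules and the red/yellow decision from the matching alarm. The function names are hypothetical, the "for 5 minutes" duration test is deliberately left out, and rules whose probability is itself a metric (as in Disk_Bottleneck) are not modeled.

```python
from typing import Dict, List, Optional, Tuple

# (metric name, threshold, probability), copied from the CPU_Bottleneck symptom.
Rule = Tuple[str, float, float]

CPU_RULES: List[Rule] = [
    ("GBL_CPU_TOTAL_UTIL", 75, 25),
    ("GBL_CPU_TOTAL_UTIL", 85, 25),
    ("GBL_CPU_TOTAL_UTIL", 90, 25),
    ("GBL_PRI_QUEUE", 3, 25),
]


def symptom_probability(rules: List[Rule], metrics: Dict[str, float]) -> float:
    """Add up the probabilities of all rules whose condition holds.

    Assumption: each true rule contributes its full prob value.
    """
    return sum(prob for name, limit, prob in rules if metrics[name] > limit)


def alert_color(probability: float) -> Optional[str]:
    """Mirror the alarm body: no alert at or below 50, red above 90, else yellow."""
    if probability <= 50:
        return None
    return "red" if probability > 90 else "yellow"


if __name__ == "__main__":
    sample = {"GBL_CPU_TOTAL_UTIL": 88, "GBL_PRI_QUEUE": 4}
    p = symptom_probability(CPU_RULES, sample)
    print(p, alert_color(p))
```

With CPU utilization at 88 percent and a priority queue of 4, three rules are true, so the symptom reaches 75 and the alarm would raise a yellow alert once the condition has persisted for the configured duration.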
16.2.4.3 Examples of MeasureWare Extractions
# vi report1
REPORT "report 1"
FORMAT ASCII
HEADINGS ON
DATA TYPE GLOBAL
DATE
TIME
GBL_ACTIVE_PROC
GBL_ALIVE_PROC
GBL_COMPLETED_PROC
GBL_CPU_CSWITCH_TIME
GBL_CPU_CSWITCH_UTIL
GBL_DISK_FS_IO
GBL_DISK_FS_IO_RATE
GBL_DISK_FS_READ
GBL_DISK_FS_READ_RATE
GBL_DISK_FS_WRITE
GBL_MEM_PAGEOUT
GBL_MEM_PAGEOUT_RATE
GBL_MEM_PAGE_REQUEST
GBL_MEM_PAGE_REQUEST_RATE
GBL_MEM_QUEUE
GBL_MEM_SWAP
# vi report2
REPORT "report 2"
FORMAT ASCII
HEADINGS ON
DATA TYPE PROCESS
DATE
TIME
YEAR
PROC_CPU_CSWITCH_TIME
PROC_CPU_CSWITCH_UTIL
PROC_CPU_INTERRUPT_TIME
PROC_CPU_INTERRUPT_UTIL
PROC_CPU_NICE_TIME
PROC_CPU_NICE_UTIL
PROC_CPU_NORMAL_TIME
PROC_CPU_NORMAL_UTIL
PROC_CPU_REALTIME_TIME
PROC_CPU_REALTIME_UTIL
PROC_CPU_SYSCALL_TIME
PROC_CPU_SYSCALL_UTIL
PROC_CPU_SYS_MODE_TIME
PROC_DISK_FS_IO
PROC_DISK_FS_IO_RATE
PROC_DISK_FS_READ
PROC_PROC_NAME
PROC_RUN_TIME
PROC_SEM_WAIT_PCT
PROC_TTY
PROC_USER_NAME
# extract -xp -fd -Gg -b today-1 -e today -r report1 -f rxlog.txt
# extract -xp -fd -p -b today-1 -e today -r report2 -f rxlog_proc.txt
REPORT "report 3"
FORMAT ASCII
HEADINGS ON
DATA TYPE GLOBAL
DATE
DATE_SECONDS
DAY
TIME
YEAR
GBL_ACTIVE_PROC
GBL_ALIVE_PROC
GBL_COMPLETED_PROC
GBL_CPU_HISTOGRAM
GBL_CPU_IDLE_TIME
GBL_CPU_IDLE_UTIL
GBL_CPU_INTERRUPT_TIME
GBL_CPU_INTERRUPT_UTIL
GBL_CPU_SYS_MODE_TIME
GBL_CPU_SYS_MODE_UTIL
GBL_CPU_TOTAL_TIME
GBL_CPU_TOTAL_UTIL
GBL_CPU_USER_MODE_TIME
GBL_CPU_USER_MODE_UTIL
GBL_DISK_CACHE_READ
GBL_DISK_CACHE_READ_RATE
GBL_DISK_HISTOGRAM
GBL_DISK_LOGL_READ
GBL_DISK_LOGL_READ_RATE
GBL_DISK_PHYS_BYTE
GBL_DISK_PHYS_BYTE_RATE
GBL_DISK_PHYS_IO
GBL_DISK_PHYS_IO_RATE
GBL_DISK_PHYS_READ
GBL_DISK_PHYS_READ_BYTE_RATE
GBL_DISK_PHYS_READ_RATE
GBL_DISK_PHYS_WRITE
GBL_DISK_PHYS_WRITE_BYTE_RATE
GBL_DISK_PHYS_WRITE_RATE
GBL_DISK_TIME_PEAK
GBL_DISK_UTIL_PEAK
GBL_FS_SPACE_UTIL_PEAK
GBL_MEM_CACHE_HIT_PCT
GBL_MEM_FREE_UTIL
GBL_MEM_PAGEOUT_RATE
GBL_MEM_PAGE_REQUEST
GBL_MEM_PAGE_REQUEST_RATE
GBL_MEM_SYS_AND_CACHE_UTIL
GBL_MEM_USER_UTIL
GBL_MEM_UTIL
GBL_NET_IN_PACKET
GBL_NET_IN_PACKET_RATE
GBL_NET_OUT_PACKET
GBL_NET_OUT_PACKET_RATE
GBL_NET_PACKET_RATE
GBL_NUM_NETWORK
GBL_PROC_RUN_TIME
GBL_PROC_SAMPLE
GBL_RUN_QUEUE
GBL_STARTED_PROC
GBL_SWAP_SPACE_UTIL
GBL_SYSCALL_RATE
GBL_WEB_CACHE_HIT_PCT
GBL_WEB_CGI_REQUEST_RATE
GBL_WEB_CONNECTION_RATE
GBL_WEB_FILES_RECEIVED_RATE
GBL_WEB_FILES_SENT_RATE
GBL_WEB_FTP_READ_BYTE_RATE
GBL_WEB_FTP_WRITE_BYTE_RATE
GBL_WEB_GET_REQUEST_RATE
GBL_WEB_GOPHER_READ_BYTE_RATE
GBL_WEB_GOPHER_WRITE_BYTE_RATE
GBL_WEB_HEAD_REQUEST_RATE
GBL_WEB_HTTP_READ_BYTE_RATE
GBL_WEB_HTTP_WRITE_BYTE_RATE
GBL_WEB_ISAPI_REQUEST_RATE
GBL_WEB_LOGON_FAILURES
GBL_WEB_NOT_FOUND_ERRORS
GBL_WEB_OTHER_REQUEST_RATE
GBL_WEB_POST_REQUEST_RATE
REPORT "report 4"
FORMAT ASCII
HEADINGS ON
DATA TYPE PROCESS
DATE
TIME
YEAR
PROC_PROC_NAME
PROC_APP_ID
PROC_CPU_SYS_MODE_TIME
PROC_CPU_SYS_MODE_UTIL
PROC_CPU_TOTAL_TIME
PROC_CPU_TOTAL_TIME_CUM
PROC_CPU_TOTAL_UTIL
PROC_CPU_TOTAL_UTIL_CUM
PROC_CPU_USER_MODE_TIME
PROC_CPU_USER_MODE_UTIL
PROC_INTEREST
PROC_INTERVAL_ALIVE
PROC_MEM_RES
PROC_MEM_VIRT
PROC_MINOR_FAULT
PROC_PRI
PROC_PROC_ID
PROC_RUN_TIME
# extract -xp -fd -Gg -b today-1 -e today -r report3.txt -f rxlog.txt
# extract -xp -fd -p -b today-1 -e today -r report4.txt -f rxlog_proc.txt
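Report template files like the four shown above share a fixed header followed by one metric name per line, so they are easy to generate and sanity-check from a script. The following is a minimal sketch under stated assumptions: the `build_report` function is hypothetical, and the rule that GLOBAL reports take GBL_ metrics while PROCESS reports take PROC_ metrics is inferred from the listings in this section rather than taken from the extract documentation.

```python
from typing import Iterable

# Fields that appear in both report types in the examples above.
COMMON_FIELDS = {"DATE", "DATE_SECONDS", "DAY", "TIME", "YEAR"}

# Assumption inferred from the sample reports: metric prefix per data type.
PREFIX_BY_TYPE = {"GLOBAL": "GBL_", "PROCESS": "PROC_"}


def build_report(title: str, data_type: str, metrics: Iterable[str]) -> str:
    """Return report template text in the layout used by the examples."""
    prefix = PREFIX_BY_TYPE[data_type]
    names = list(metrics)
    for name in names:
        if name in COMMON_FIELDS:
            continue
        # Catch metrics of the wrong class and names with damaged case.
        if not name.startswith(prefix) or name != name.upper():
            raise ValueError(f"unexpected metric for {data_type}: {name}")
    header = [
        f'REPORT "{title}"',
        "FORMAT ASCII",
        "HEADINGS ON",
        f"DATA TYPE {data_type}",
    ]
    return "\n".join(header + names) + "\n"


if __name__ == "__main__":
    print(build_report("report 1", "GLOBAL", ["DATE", "TIME", "GBL_ACTIVE_PROC"]), end="")
```

The generated text can be saved to a file and passed to extract with the -r option, as in the commands above.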
Examples of running OVPM from the command line:
"c:\Program Files\HP Openview\HPOV_IOPS\cgi-bin\analyzer.exe"
-GRAPHTEMPLATE: CODA "CPU Summary" -SYSTEMNAME: r204c30 -GRAPHTYPE: TSV
"c:\Program Files\HP Openview\HPOV_IOPS\cgi-bin\analyzer.exe"
-GRAPHTEMPLATE: CODA "CPU Summary" -SYSTEMNAME: r204c30
16.2.4.4 Check the OVPA Message Interface to OVO
If OVPA is installed on a managed node where OVO agents are installed, OVPA automatically sends alarms to OVO. If there is no OVO agent on the system, disable the OVO message setup. OVPA can also send SNMP traps to NNM (agsysdb -add hostname). This is configured in the alarmgen target system database. Check the configuration with the following command: /opt/perf/bin/agsysdb -l (on HP-UX), /usr/lpp/perf/bin/agsysdb -l (on AIX), or c:\rpmtools\bin\agsysdb -l (on Windows). There is more detailed information on the use of this command in the man pages and in the OVPA User's Guide. The output from the command will look similar to the following:
# /opt/perf/bin/agsysdb -l
MeasureWare alarming status:
SystemDB Version :
ITO messages : on Last Error : none
Exec Actions : on
16.2.5 Data Source Integration (DSI)
Use the DSI component to implement user-defined data sources. For example, you may want to extract the vmstat data every 20 seconds for the User, System, and Idle statistics. The OVPA installation includes the components to check, analyze, and extract the DSI data. SPIs use DSI as a method of collecting application data.
The example in Section 16.2.5.1 demonstrates the steps required to configure a new data source that will send a message to the OVO message browser if a metric threshold is violated. A brief explanation is provided with each step.
16.2.5.1 Data Source Integration Example
The process to implement a DSI log includes the following steps:
- Create the class specification file:
# vi /tmp/vmstat.spec
CLASS VMSTATS = 10001;
METRICS
USER_CPU = 101
LABEL "USER_CPU";
SYSTEM_CPU = 102
LABEL "SYSTEM_CPU";
IDLE_CPU = 103
LABEL "IDLE_CPU";
- Compile the class specification file and create the logfile set (three new files in the current directory):
# sdlcomp /tmp/vmstat.spec /tmp/vmstat.log
sdlcomp
Check class specification syntax.
CLASS VMSTATS = 10001;
METRICS
USER_CPU = 101
LABEL "USER_CPU";
SYSTEM_CPU = 102
LABEL "SYSTEM_CPU";
IDLE_CPU = 103
LABEL "IDLE_CPU";
NOTE: Time stamp inserted as first metric by default.
Syntax check successful.
Update SDL vmstat_log.
Shared memory id used by vmstat_log : 9
Class VMSTATS successfully added to logfile set.
# ls vmstat.log*
vmstat.log vmstat.log.VMSTATS
vmstat.log.desc
- Create a format file:
# vi /tmp/vmstat.fmt
$numeric $numeric $numeric $numeric $numeric
$numeric $numeric $numeric $numeric $numeric
$numeric $numeric $numeric $numeric $numeric
USER_CPU SYSTEM_CPU IDLE_CPU
Note: The $numeric entries discount the first 15 fields of the vmstat output, so only the last three fields are logged as USER_CPU, SYSTEM_CPU, and IDLE_CPU. Table 16-1 shows the vmstat output field descriptions.
Table 16-1. vmstat Output Field Descriptions
procs: Information about numbers of processes in various states.
  r   In run queue
  b   Blocked for resources (I/O, paging, etc.)
  w   Runnable or short sleeper (< 20 secs) but swapped
memory: Information about the usage of virtual and real memory. Virtual pages are considered active if they belong to processes that are running or have run in the last 20 seconds.
  avm   Active virtual pages
  free  Size of the free list
page: Information about page faults and paging activity. These are averaged each five seconds, and given in units per second.
  re  Page reclaims (without -S)
  at  Address translation faults (without -S)
  si  Processes swapped in (with -S)
  so  Processes swapped out (with -S)
  pi  Pages paged in
  po  Pages paged out
  fr  Pages freed per second
  de  Anticipated short term memory shortfall
  sr  Pages scanned by clock algorithm, per second
faults: Trap/interrupt rate averages per second over last 5 seconds.
  in  Device interrupts per second (nonclock)
  sy  System calls per second
  cs  CPU context switch rate (switches/sec)
cpu: Breakdown of percentage usage of CPU time for the active processors.
  us  User time for normal and low priority processes
  sy  System time
  id  CPU idle
# vmstat
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
1 0 0 230390 20390 8 4 0 0 0 0 2 407 1111158 1 0 99
- Test the dsilog process:
# vmstat 20 | dsilog /tmp/vmstat.log VMSTATS -f /tmp/vmstat.fmt -vo
I: 1003415064 0 0 0 10594 1913 0
0 0 0 0 0 0 110
211 37 4.0000 2.0000 95.0000
I: 1003415064 0 0 0 8415 1579 0
0 0 0 0 0 0 108
144 32 2.0000 1.0000 96.0000
I: 1003415084 0 0 0 10212 1593 0
0 0 0 0 0 0 107
157 37 0.0000 1.0000 99.0000
interval marker
L: 1003414800 2.0000 1.3330 96.6660
Notes:
I: shows incoming data
L: actual data to be logged
- Start the dsilog logging process:
# vmstat 20 | dsilog /tmp/vmstat.log VMSTATS -f /tmp/vmstat.fmt &
- View the collected DSI data:
# extract -xp -l /var/opt/perf/vmstat_log -C VMSTATS detail -H -fd -b first
- Make the DSI a permanent data source by adding a DATASOURCE entry to the perflbd.rc file:
DATASOURCE=SCOPE LOGFILE=/var/opt/perf/data/image/library/english/10090_logglob
DATASOURCE=DSI_VMSTAT LOGFILE=/tmp/vmstat_log
- Define alarms on DSI data in the /var/opt/perf/alarmdef file:
# vi /var/opt/perf/alarmdef (partial listing)
#######DSILOG
alarm DSI_VMSTAT:VMSTATS:USER_CPU>30 for 10 minutes
start
critical alert "User CPU exceeded threshold"
repeat every 15 minutes
critical alert "User CPU exceeded threshold after 15 minutes"
end
reset alert "The User CPU Alert is over"
- Customize graphs in OV Performance Manager:
16.2.5.2 Definition of Commands and Terms
- DSI
Provides the ability to collect, log, correlate, and summarize data from a variety of sources. Common DSI terms and definitions are provided here for reference.
- sdlcomp
Tool that creates the DSI log file set (vmstat.log, VMSTAT_log) by reading a specification file.
- Class Specification File (ASCII)
Describes the data that is collected using DSI.
- Class Specification File CLASS
Defines a group of metrics (USER_CPU, SYSTEM_CPU, and IDLE_CPU) and how they are collected. For example, the CLASS name VMSTATS is followed by a class ID that is used internally by DSI; each of the METRICS is assigned a unique name and number, and each metric description is terminated with a semicolon.
- Class Specification File LABEL
Identifies the set of metrics defined by the class.
- Format File
Determines which data fields appear in the final data record and excludes unnecessary information (column headings and data fields). The example format file vmstat.fmt is located in the /tmp directory along with the specification file.
- Data Feed Process (dsilog)
Runs continuously in background mode, sending application output to the DSI log file (/tmp/vmstat.log). In the vmstat example, vmstat (with its command-line parameters) sends data through a UNIX pipe to the dsilog command. The dsilog command-line parameters include the name of the logfile set, the CLASS name, and the format file (-f). Running dsilog with the -vo option sends the data only to standard output, not to the actual DSI log file, which makes it a convenient way to check the setup.
- Preview the data (extract)
Views the data written to the DSI log file via the extract command and writes to an ASCII output file with the name (xfrdCLASS.asc).
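The roles of the format file and the dsilog summarization step can be illustrated with the numbers from the test run in Section 16.2.5.1. The sketch below is not how dsilog is implemented. It assumes that each $numeric placeholder consumes and discards one input field, and that the logged (L:) record is the arithmetic mean of the incoming (I:) records in the interval, which is consistent with the sample output (2.0000, 1.3330, 96.6660). The function names are hypothetical, and the leading timestamps are omitted from the input lines.

```python
from statistics import mean
from typing import Dict, List

# Same layout as /tmp/vmstat.fmt: skip 15 fields, then name the last three.
FORMAT = "$numeric " * 15 + "USER_CPU SYSTEM_CPU IDLE_CPU"


def apply_format(fmt: str, line: str) -> Dict[str, float]:
    """Pick the named fields out of one whitespace-separated input line."""
    spec = fmt.split()
    values = line.split()
    if len(values) < len(spec):
        raise ValueError("input line has fewer fields than the format file")
    record: Dict[str, float] = {}
    for name, value in zip(spec, values):
        if name.startswith("$"):
            continue  # placeholder: field is read and thrown away
        record[name] = float(value)
    return record


def summarize(records: List[Dict[str, float]]) -> Dict[str, float]:
    """Assumed summarization: average each metric over the interval."""
    return {name: mean(r[name] for r in records) for name in records[0]}


if __name__ == "__main__":
    # The three I: records from the dsilog test, without their timestamps.
    incoming = [
        "0 0 0 10594 1913 0 0 0 0 0 0 0 110 211 37 4.0000 2.0000 95.0000",
        "0 0 0 8415 1579 0 0 0 0 0 0 0 108 144 32 2.0000 1.0000 96.0000",
        "0 0 0 10212 1593 0 0 0 0 0 0 0 107 157 37 0.0000 1.0000 99.0000",
    ]
    logged = summarize([apply_format(FORMAT, line) for line in incoming])
    for name, value in logged.items():
        print(f"{name} {value:.4f}")
```

A vmstat build that prints a different number of columns would need a format file with a matching number of placeholders; otherwise the wrong fields end up in the class metrics.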
16.2.6 OVPA Interface with Other Programs
The Database Smart Plug-In (DB SPI) is one example of a SPI that incorporates data collection capabilities and integrates with OVPM (for graphing and analysis) using the DSI features of OVPA. Installing the DB SPI inserts new entries in the parm file to define the instances of the database as a new application class.
16.2.7 OVPA Commands and Files
- /opt/perf/bin/mwa status
Checks the OVPA status.
- /opt/perf/bin/mwa stop
Stops OVPA.
- /opt/perf/bin/midaemon -T
Stops midaemon (also stops active Glance sessions; Glance is described later in this chapter).
- /opt/perf/bin/mwa start
Starts OVPA processes, including midaemon and scopeux.
- /opt/perf/bin/perfstat -v
Checks the version and status of the OVPA environment.
- /opt/perf/bin/ttd -k
Stops the transaction tracker daemon (refer to the previous section for process description).
- parm
Contains parameters that are used to define applications and processes.
- alarmdef
Defines the conditions that generate alarms.
- /var/opt/perf/perflbd.rc
Contains the startup and shutdown commands for the repository servers for each data source that has been configured.
- /var/opt/perf/status.scope
Status and error log for scopeux.
The following status files contain diagnostic information from the process environment. The default file size is 1MB, and if the file grows past the limit it is renamed status.filename.old. Use these files to troubleshoot problems that may arise with the processes that generate the files:
- /var/opt/perf/status.alarmgen
- /var/opt/perf/status.perflbd
- /var/opt/perf/status.rep_server
- /var/opt/perf/status.ttd
- /var/opt/perf/status.mi
16.2.8 OVPA 4.x
OVPA 4.x is functionally the same as OVPA 3.x. Originally developed for the Linux platform, OVPA 4.x replaces the DCE-RPC-based components and uses OVOA (coda) and the HTTPS-based daemon (ovbbccb) for data collection and communications. OVOA replaces the functionality of the perflbd and rep_server daemons. The perflbd.rc file is replaced by a datasources file, and the alarmgen process is replaced by the perfalarm daemon. Use the ovpa command (instead of mwa) to check the OVPA status. The major components of OVPA 4.x are shown in Figure 16-9.
Figure 16-9. The OVPA 4.x core component for data gathering is coda (OVOA).
Status and log files for OVPA 4.x:
- /var/opt/perf/status.scope
- /var/opt/perf/status.perfalarm
- /var/opt/perf/status.mi
- /var/opt/perf/status.ttd
- /var/opt/OV/log/coda.log
Metric data for OVPA 4.x is available at: http://ovweb.external.hp.com/ovnsmdps/pdf/metlinux. Installation guides, release notes, user guides, and other documentation are available at the OpenView documentation web site: http://ovweb.external.hp.com/lpe/doc_serv/.
Note: OVPA 4.x may be changed, upgraded, or released by HP for other platforms in the future. Check the OpenView web site for the most up-to-date product information.
16.2.9 Examples Directory
Example configuration files are located in the directory /opt/perf/examples. The directory includes sample configuration files, alarm definitions, and README files.
16.2.10 Available Metrics
There are over 1000 metrics available for collection on any given system. You can see all the metrics available system-wide with a tool like Glance. OVPA collects a subset of about 500 metrics on the HP-UX platforms. The OVPA metrics are defined in the text document /opt/perf/paperdocs/mwa/C/methp.txt.