This chapter provides an introduction to the design of the trace Performance Metrics Domain Agent (PMDA), explaining how to configure the agent optimally for a particular problem domain. This information supplements the functional coverage that the man pages provide for both the agent and the library interfaces.
The chapter also includes information on how to use the trace PMDA and its associated library (libpcp_trace) for instrumenting applications. On IRIX, the example programs are installed in /var/pcp/demos/trace from the pcp.sw.trace subsystem; on Linux, they are shipped in the pcp-2.3-rev package at /usr/share/pcp/demo/trace.
The pcp_trace library provides function calls for identifying sections of a program as transactions or events for examination by the trace PMDA and by a user command called pmdatrace. The pcp_trace library is described in the pmdatrace(3) man page.
The monitoring of transactions using the Performance Co-Pilot (PCP) infrastructure begins with a pmtracebegin call. Time is recorded from there to the corresponding pmtraceend call (with matching tag identifier). A transaction in progress can be cancelled by calling pmtraceabort.
A second form of program instrumentation is available with the pmtracepoint function. This is a simpler form of monitoring that exports only the number of times a particular point in a program is passed. The pmtraceobs and pmtracecount functions have similar semantics to pmtracepoint, but allow an arbitrary numeric value to be passed to the trace PMDA.
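The following is a minimal sketch of how these calls might appear in an instrumented C application. The tag names and the work being timed are invented for the example, which assumes the API header is installed as <pcp/trace.h> and the program is linked with -lpcp_trace:

    #include <stdio.h>
    #include <pcp/trace.h>

    int
    main(void)
    {
        int sts;

        /* Start timing a transaction identified by the tag "db-query" */
        if ((sts = pmtracebegin("db-query")) < 0) {
            fprintf(stderr, "pmtracebegin: %s\n", pmtraceerrstr(sts));
            return 1;
        }

        /* ... perform the work being timed ... */

        /* End the transaction; its service time goes to the trace PMDA */
        if ((sts = pmtraceend("db-query")) < 0)
            fprintf(stderr, "pmtraceend: %s\n", pmtraceerrstr(sts));

        /* Record that a point of interest in the code was passed */
        pmtracepoint("checkpoint-reached");

        /* Export an arbitrary numeric value as an observation */
        pmtraceobs("queue-length", 42.0);
        return 0;
    }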
For a complete introduction to performance tracing, refer to the Web-based PCP Tutorial, which contains the trace.html file covering this topic.
Figure 4-1 describes the general state maintained within the trace PMDA.
Applications that are linked with the libpcp_trace library make calls through the trace Application Programming Interface (API). These calls result in interprocess communication of trace data between the application and the trace PMDA. This data consists of an identification tag and the performance data associated with that particular tag. The trace PMDA aggregates the incoming information and periodically updates the exported summary information to describe activity in the recent past.
As each protocol data unit (PDU) is received, its data is stored in the current working buffer. At the same time, the global counter associated with the particular tag contained within the PDU is incremented. The working buffer contains all performance data that has arrived since the previous time interval elapsed. For additional information about the working buffer, see the discussion of simple and rolling-window periodic sampling below.
The trace PMDA employs a rolling-window periodic sampling technique. The arrival time of the data at the trace PMDA in conjunction with the length of the sampling period being maintained by the PMDA determines the recency of the data exported by the PMDA. Through the use of rolling-window sampling, the trace PMDA is able to present a more accurate representation of the available trace data at any given time than it could through use of simple periodic sampling.
The rolling-window sampling technique affects the metrics in Example 4-1:
trace.observe.rate
trace.counter.rate
trace.point.rate
trace.transact.ave_time
trace.transact.max_time
trace.transact.min_time
trace.transact.rate
The remaining metrics are either global counters, control metrics, or the last seen observation value. Section 4.3 documents all of the metrics exported by the trace PMDA in more detail.
The simple periodic sampling technique uses a single historical buffer to store the history of events that have occurred over the sampling interval. As events occur, they are recorded in the working buffer. At the end of each sampling interval, the working buffer (which at that time holds the historical data for the sampling interval just finished) is copied into the historical buffer, and the working buffer is cleared. It is ready to hold new events from the sampling interval now starting.
In contrast to simple periodic sampling with its single historical buffer, the rolling-window periodic sampling technique maintains a number of separate buffers. One buffer is marked as the current working buffer, and the remainder of the buffers hold historical data. As each event occurs, the current working buffer is updated to reflect it.
At a specified interval, the current working buffer and the accumulated data that it holds are moved into the set of historical buffers, and a new working buffer is used. The specified interval is a function of the number of historical buffers maintained.
The primary advantage of the rolling-window sampling technique is seen at the point where data is actually exported. At this point, the data has a higher probability of reflecting a more recent sampling period than the data exported using simple periodic sampling.
The data collected over each sample duration and exported using the rolling-window sampling technique provides a more up-to-date representation of the activity during the most recently completed sample duration than simple periodic sampling as shown in Figure 4-2.
The trace PMDA allows the length of the sample duration to be configured, as well as the number of historical buffers that are maintained. The rolling-window approach is implemented in the trace PMDA as a ring buffer (see Figure 4-1).
Consider the scenario where you want to know the rate of transactions over the last 10 seconds. You set the sampling rate for the trace PMDA to 10 seconds and fetch the metric trace.transact.rate. So if in the last 10 seconds, 8 transactions took place, the transaction rate would be 8/10 or 0.8 transactions per second.
In practice, the trace PMDA does not compute the rate at the moment of each fetch. Instead, it performs its calculations automatically at a subinterval of the sampling interval. Reconsider the 10-second scenario, now with a calculation subinterval of 2 seconds, as shown in Figure 4-3.
If at 13.5 seconds you request the transaction rate, you receive a value of 0.7 transactions per second. In fact, the transaction rate was 0.8, but the trace PMDA did its calculations on the sampling interval from 2 seconds to 12 seconds, not from 3.5 seconds to 13.5 seconds. For efficiency, the trace PMDA recalculates the metrics covering the last 10 seconds once every 2 seconds, so the PMDA is not driven to perform a calculation each time a fetch request is received.
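The following sketch is illustrative only (it is not the trace PMDA's actual source) and shows one way to implement this scheme for the 10-second scenario: five 2-second historical buckets plus one working bucket form a ring, the rate is recomputed only when a bucket rolls over, and a fetch simply returns the precomputed value. The names record_transaction, rollover, and transaction_rate are invented for the example.

    /* Rolling-window sampling sketch: 10-second sample duration,
     * divided into five 2-second historical buckets, plus one
     * working bucket, arranged as a ring. */
    #define NHIST   5           /* number of historical buffers */
    #define SUBINT  2.0         /* calculation subinterval, in seconds */

    static unsigned bucket[NHIST + 1];  /* NHIST historical + 1 working */
    static int      work;               /* index of the working bucket */
    static double   rate;               /* precomputed transaction rate */

    /* Called as each transaction completes: update the working bucket. */
    void record_transaction(void) { bucket[work]++; }

    /* Called every SUBINT seconds: the working bucket joins the
     * historical set, the oldest bucket is recycled as the new working
     * bucket, and the rate over the last full sample duration
     * (NHIST * SUBINT = 10 seconds) is recomputed. */
    void rollover(void)
    {
        int oldest = (work + 1) % (NHIST + 1);
        unsigned total = 0;
        int i;

        for (i = 0; i <= NHIST; i++)
            if (i != oldest)        /* sum the five most recent buckets */
                total += bucket[i];
        rate = total / (NHIST * SUBINT);

        work = oldest;              /* recycle the oldest bucket */
        bucket[work] = 0;
    }

    /* A fetch request returns the precomputed rate; no calculation
     * is driven by the fetch itself. */
    double transaction_rate(void) { return rate; }

With this arrangement, a fetch at 13.5 seconds returns the rate computed at the 12-second rollover, which covers the interval from 2 to 12 seconds.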
The trace PMDA is configurable primarily through command-line options. The list of command-line options in Table 4-1 is not exhaustive, but it identifies those options which are particularly relevant to tuning the manner in which performance data is collected.
The trace PMDA offers host-based access control. This control allows and disallows connections from instrumented applications running on specified hosts or groups of hosts. Limits on the number of connections allowed from individual hosts can also be imposed.
The interval over which metrics are to be maintained before being discarded is called the sample duration.
The data maintained for the sample duration is held in a number of internal buffers within the trace PMDA, referred to as historical buffers. The number of buffers is configurable so that the rolling-window effect can be tuned within the sample duration.
Since the data being exported by the trace.observe.value and trace.counter.count metrics are user-defined, the trace PMDA by default exports these metrics with a type of “none.” A framework is provided that allows the user to make the type more specific (for example, bytes per second) and allows the exported values to be plotted along with other performance metrics of similar units by tools like pmchart.
The libpcp_trace Application Programming Interface (API) is called from C, C++, Fortran, and Java. Each language has access to the complete set of functionality offered by libpcp_trace. In some cases, the calling conventions differ slightly between languages. This section presents an overview of each of the different tracing mechanisms offered by the API, as well as an explanation of their mappings to the actual performance metrics exported by the trace PMDA.
Paired calls to the pmtracebegin and pmtraceend API functions result in transaction data being sent to the trace PMDA with a measure of the time interval between the two calls. This interval is the transaction service time. Using the pmtraceabort call causes data for that particular transaction to be discarded. The trace PMDA exports transaction data through the following trace.transact metrics listed in Table 4-2:
trace.transact.ave_time: The average service time per transaction type, calculated over the last sample duration.
trace.transact.count: The running count for each transaction type seen since the trace PMDA started.
trace.transact.max_time: The maximum service time per transaction type within the last sample duration.
trace.transact.min_time: The minimum service time per transaction type within the last sample duration.
trace.transact.rate: The average rate at which each transaction type is completed, calculated over the last sample duration.
trace.transact.total_time: The cumulative time spent processing each transaction since the trace PMDA started running.
Point tracing allows the application programmer to export metrics related to salient events. The pmtracepoint function is most useful when start and end points are not well defined. For example, this function is useful when the code branches in such a way that a transaction cannot be clearly identified, or when processing does not follow a transactional model, or when the desired instrumentation is akin to event rates rather than event service times. This data is exported through the trace.point metrics listed in Table 4-3:
trace.point.count: Running count of point observations for each tag seen since the trace PMDA started.
trace.point.rate: The average rate at which observation points occur for each tag within the last sample duration.
The pmtraceobs and pmtracecount functions have similar semantics to pmtracepoint, but also allow an arbitrary numeric value to be passed to the trace PMDA. The most recent value for each tag is then immediately available from the PMDA. Observation data is exported through the trace.observe metrics listed in Table 4-4:
trace.observe.count: Running count of observations seen since the trace PMDA started.
trace.observe.rate: The average rate at which observations for each tag occur, calculated over the last sample duration.
trace.observe.value: The numeric value associated with the observation last seen by the trace PMDA.
Counter data is exported through the trace.counter metrics. The only difference between the trace.counter and trace.observe metrics is that the numeric value of trace.counter must be a monotonically increasing count.
The trace library is configurable through the environment variables listed in Table 4-5 and through the state flags listed in Table 4-6. Together, these provide diagnostic output and enable or disable configurable functionality within the library.
PCP_TRACE_HOST: The name of the host where the trace PMDA is running.
PCP_TRACE_TIMEOUT: The number of seconds to wait before assuming that the initial connection is not going to be made and timing out. The default is three seconds.
PCP_TRACE_REQTIMEOUT: The number of seconds to allow before timing out while awaiting acknowledgment from the trace PMDA after trace data has been sent to it. This variable has no effect in the asynchronous trace protocol (refer to Table 4-6).
PCP_TRACE_RECONNECT: A list of values representing the backoff approach that the libpcp_trace library routines take when attempting to reconnect to the trace PMDA after a connection has been lost. Each value in the list should be a positive number of seconds for the application to delay before making the next reconnection attempt. When the final value in the list is reached, that value is used for all subsequent reconnection attempts.
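As a purely hypothetical illustration, an application could set these variables before its first libpcp_trace call. The host name, timeout, and backoff values below are invented for the example, and the comma-separated format for PCP_TRACE_RECONNECT is an assumption; consult the pmdatrace(3) man page for the authoritative syntax.

    #include <stdlib.h>

    /* Hypothetical configuration; all values are illustrative only. */
    void
    configure_trace_environment(void)
    {
        setenv("PCP_TRACE_HOST", "app-server.example.com", 1);  /* trace PMDA host */
        setenv("PCP_TRACE_TIMEOUT", "5", 1);        /* 5-second connection timeout */
        setenv("PCP_TRACE_RECONNECT", "1,2,4", 1);  /* assumed comma-separated backoff */
    }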
The state flags in Table 4-6 are used to customize the operation of the libpcp_trace routines. These flags are registered through the pmtracestate call, and they can be set either individually or together.
PMTRACE_STATE_NONE: The default. No state flags have been set; the fault-tolerant, synchronous protocol is used for communicating with the trace PMDA, and no diagnostic messages are displayed by the libpcp_trace routines.
PMTRACE_STATE_API: High-level diagnostics. This flag simply displays entry into each of the API routines.
PMTRACE_STATE_COMMS: Diagnostic messages related to establishing and maintaining the communication channel between application and PMDA.
PMTRACE_STATE_PDUBUF: The full contents of the PDU buffers are dumped as PDUs are transmitted and received.
PMTRACE_STATE_NOAGENT: Interprocess communication control. If this flag is set, interprocess communication between the instrumented application and the trace PMDA is skipped. This flag is a debugging aid for applications using libpcp_trace.
PMTRACE_STATE_ASYNC: Asynchronous trace protocol. This flag enables the asynchronous trace protocol so that the application does not block awaiting acknowledgment PDUs from the trace PMDA. For the flag to be effective, it must be set before using the other libpcp_trace entry points.
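As a brief sketch, an application might register both high-level diagnostics and the asynchronous protocol before making any other tracing calls; the flag names are those listed above and are assumed to be defined in <pcp/trace.h>:

    #include <pcp/trace.h>

    void
    enable_tracing_options(void)
    {
        /* PMTRACE_STATE_ASYNC must be registered before any other
         * libpcp_trace entry point is used. */
        pmtracestate(PMTRACE_STATE_API | PMTRACE_STATE_ASYNC);
    }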
The relationship between an application, the libpcp_trace library, the trace PMDA and the rest of the PCP infrastructure is shown in Figure 4-4:
The libpcp_trace library is designed to encourage application developers (independent software vendors and end-user customers) to embed calls in their code that enable application performance data to be exported. When combined with system-level performance data, this feature allows total performance and resource demands of an application to be correlated with application activity.
For example, developers can provide application performance metrics describing computation state, problem size and parameters, or transaction rates and service times.
The libpcp_trace library approach offers a number of attractive features:
A simple API for inserting instrumentation calls into an application as shown in the following example:
    pmtracebegin("pass 1");
    ...
    pmtraceend("pass 1");
    ...
    pmtraceobs("threads", N);
Shipped source code for a stub version of the library that enables the following:
Replacement by private debugging or development versions
Flexibility through not being locked into an SGI product
Added functionality on SGI platforms, when the PCP version of the library is present
A PCP version of the library that allows numerical observations, the time between matching begin and end calls, and so on, to be shipped to a PCP agent and then exported into the PCP infrastructure. Because exporting is controlled by environment variables, the overhead is very low if the metrics are not being exported.
Once the application performance metrics are exported into the PCP framework, all of the PCP tools may be leveraged to provide performance monitoring and management, including:
Two- and three-dimensional visualization of resource demands and performance, showing concurrent system activity and application activity.
Note: On Linux, visualization tools are not provided as part of the PCP for IA-64 Linux distribution.