Performance Co-Pilot (PCP) is an open source framework and toolkit for monitoring, analyzing, and responding to details of live and historical system performance. PCP has a fully distributed, plug-in based architecture making it particularly well suited to centralized analysis of complex environments and systems. Custom performance metrics can be added using the C, C++, Perl, and Python interfaces.
This page provides quick instructions on how to install and use PCP on a set of hosts, one of which (the monitor host) will be used for monitoring and analyzing itself and the other hosts (the collector hosts).
PCP is available on all recent Linux distribution releases, including Debian/Fedora/RHEL/SUSE/Ubuntu. For other operating systems and distributions, consider installing from source.
# yum install pcp # or apt-get or dnf or zypper
# systemctl enable pmcd
# systemctl start pmcd
# systemctl enable pmlogger
# systemctl start pmlogger
Here we enable the Performance Metrics Collector Daemon (pmcd(1)) on the host, which in turn controls the various Performance Metrics Domain Agents (PMDAs) and requests metrics from them on behalf of clients. The PMDAs provide the actual data from different components (domains) in the system, for example from the Linux Kernel PMDA or the NFS Client PMDA. The default configuration includes over 1000 metrics with negligible overall overhead when queried. If no queries for metrics are sent to an agent, it doesn't do anything at all. Local PCP archive logs are also enabled on the host for convenience with pmlogger(1).
# cd /var/lib/pcp/pmdas/postgresql
# ./Install
The client tools will contact local or remote PMCDs as needed. Communication with PMCD over the network uses TCP port 44321 by default.
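Before pointing the client tools at a remote collector, it can help to confirm that the PMCD port is reachable from the monitor host. The following is a minimal sketch, not a PCP command: pmcd_reachable is a hypothetical helper name, and it assumes that bash and timeout(1) from coreutils are installed.

```shell
# Sketch: test whether a TCP connection to PMCD can be opened.
# pmcd_reachable is a hypothetical helper, not part of PCP.
pmcd_reachable() {
    host=$1
    port=${2:-44321}    # default PMCD port named in the text above
    # bash's /dev/tcp pseudo-device opens a TCP connection; timeout(1)
    # bounds the wait so a filtered port does not hang the check.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

A call such as pmcd_reachable acme.com && pcp -h acme.com then only queries the host when the port answers. A successful connection shows that something is listening on the port, not that it is a healthy PMCD.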
The following additional packages can be optionally installed on the monitoring host to extend the set of monitoring tools from the base pcp package.
# yum install pcp-doc pcp-gui pcp-system-tools # or apt-get or dnf or zypper
To enable centralized archive log collection on the monitoring host, its pmlogger is configured to fetch performance metrics from collector hosts. Add each collector host to the pmlogger configuration file /etc/pcp/pmlogger/control and then restart the pmlogger service on the monitoring host.
# echo acme.com n n PCP_LOG_DIR/pmlogger/acme.com -r -T24h10m -c config.acme.com >> /etc/pcp/pmlogger/control
# systemctl restart pmlogger
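With more than a handful of collector hosts, typing one echo per host gets error-prone. The sketch below prints one control entry per host using the same field layout as the acme.com entry above. The function names control_entry and control_entries are hypothetical, and the options (-r -T24h10m -c config.<host>) are copied from that example rather than chosen for any particular site.

```shell
# Sketch: print one pmlogger control entry per collector host.
# control_entry and control_entries are hypothetical helpers; the field
# layout is copied from the acme.com example in this guide.
control_entry() {
    printf '%s n n PCP_LOG_DIR/pmlogger/%s -r -T24h10m -c config.%s\n' "$1" "$1" "$1"
}

control_entries() {
    for host in "$@"; do
        control_entry "$host"
    done
}
```

Review the output of control_entries acme1.com acme2.com first, then append it to /etc/pcp/pmlogger/control as root and restart pmlogger as shown above.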
The health of the remote log collector is checked every half hour. You can also run /usr/libexec/pcp/bin/pmlogger_check -V -C (on Fedora/RHEL) or /usr/lib/pcp/bin/pmlogger_check -V -C (on Debian/Ubuntu) manually to do a health check.
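Because the location of pmlogger_check differs between distributions, a script that runs on mixed hosts has to look in both places. A minimal sketch follows. The name find_pmlogger_check is hypothetical, only the two directories named above are searched, and the optional root-prefix argument exists purely so the function can be exercised without PCP installed.

```shell
# Sketch: locate pmlogger_check in either packaging layout.
# find_pmlogger_check is a hypothetical helper; the optional first
# argument is a root prefix, normally left empty.
find_pmlogger_check() {
    root=${1:-}
    for dir in /usr/libexec/pcp/bin /usr/lib/pcp/bin; do
        if [ -x "$root$dir/pmlogger_check" ]; then
            printf '%s\n' "$root$dir/pmlogger_check"
            return 0
        fi
    done
    return 1    # not found in either location
}
```

Typical use would be check=$(find_pmlogger_check) && "$check" -V -C, which runs the health check only when the script was found.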
Note that a default configuration file (config.acme.com above) will be generated if it does not already exist. This generation step is optional, since a custom configuration can be provided for each host instead; see the pmlogconf(1) manual page for details.
In dynamic environments, manually configuring every host is not feasible, perhaps even impossible. The discovery service (pmfind(1)) can be used to auto-discover and auto-configure new collector hosts and containers for logging and/or rule inference.
# systemctl enable pmfind
# systemctl restart pmfind
$ pmfind -s pmcd
Basic health check for running services, network connectivity between hosts, and enabled PMDAs can be done simply as follows.
$ pcp -h munch
Performance Co-Pilot configuration on munch:
platform: SunOS munch 5.11 oi_151a8 i86pc
hardware: 4 cpus, 3 disks, 4087MB RAM
timezone: EST-10
services: pmcd pmproxy
pmcd: Version 5.0.0-1, 3 agents
pmda: pmcd mmv solaris
pmie: /var/log/pcp/pmie/munch/pmie.log
$ pcp -a /var/log/pcp/pmlogger/smash/20190909
Performance Co-Pilot configuration on smash:
archive: /var/log/pcp/pmlogger/smash/20190909
platform: Linux smash 2.6.32-279.46.1.el6.x86_64 #1 SMP Mon May 19 16:16:00 EDT 2014 x86_64
hardware: 8 cpus, 2 disks, 1 node, 23960MB RAM
timezone: EST-10
services: pmcd pmproxy
pmcd: Version 5.0.0-1, 8 agents
pmda: pmcd proc xfs linux mmv nvidia dmcache postgresql
pmlogger: primary logger: /var/log/pcp/pmlogger/smash/20190909.00.10
pmie: /var/log/pcp/pmie/smash/pmie.log
PCP comes with a wide range of command line utilities for accessing live performance metrics via PMCDs or historical data using archive logs. The following examples illustrate some of the most useful use cases; see the manual page of each command for additional information. In the examples below, -h <host> can be used to query a remote host; the default is the local host. Shell completion support for Bash and especially for Zsh allows completing available metrics, metricsets (with pmrep), and available command line options.
$ pminfo -t
$ pminfo -dfmtT disk.partitions.read
$ pmval -t 2sec -f 3 disk.partitions.write
$ pmdumptext -Xlimu -t 2sec 'kernel.all.load[1]' mem.util.used disk.partitions.write -h acme.com
$ pmrep -p -b GB -t 2sec -o csv kernel.all.sysfork mem.util.free mem.util.used
$ pcp atop
$ pcp atopsar
$ pmstat -t 2sec -h acme1.com -h acme2.com
$ pmiostat -t 2sec
$ pmchart -t 2sec -h acme1.com -h acme2.com
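The pmstat and pmchart examples above pass -h once per host. When the host list lives in a variable or a file, the repeated options can be built by a small helper. This is only a sketch: host_opts is a hypothetical function, not part of PCP, and it assumes host names contain no whitespace.

```shell
# Sketch: turn a host list into repeated -h options.
# host_opts is a hypothetical helper, not part of PCP.
host_opts() {
    opts=
    for host in "$@"; do
        # add a separating space only when opts is already non-empty
        opts="$opts${opts:+ }-h $host"
    done
    printf '%s\n' "$opts"
}
```

For example, pmstat -t 2sec $(host_opts acme1.com acme2.com) expands to the pmstat command shown above. The command substitution is left unquoted on purpose so that the shell splits it into separate arguments.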
PCP archive logs are located under /var/log/pcp/pmlogger/hostname, and the archive names indicate the time they cover. Archives are self-contained and machine- and version-independent, so they can be transferred to any machine for offline analysis.
$ pmdumplog -L acme.com/20140902
$ pcp -a acme.com/20140902
$ pminfo -a acme.com/20140902
$ pminfo -df mem.freemem -a acme.com/20140902
$ pmval -f 3 disk.partitions.write -a acme.com/20140902
$ pmval -d -t 2sec -f 3 disk.partitions.write -S @09:00 -T @10:00 -a acme.com/20140902
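The archive examples above use paths relative to /var/log/pcp/pmlogger. A script run from another directory needs the full path, which can be assembled from the host name and the day. The sketch below assumes archives are named by date in YYYYMMDD form, as in the smash/20190909 and acme.com/20140902 examples. Actual archive names may carry further suffixes, so treat the result as a base name to check rather than a guaranteed file. archive_for is a hypothetical helper.

```shell
# Sketch: build the archive base name for a host and day.
# archive_for is a hypothetical helper; the YYYYMMDD naming is an
# assumption taken from the example paths in this guide.
archive_for() {
    host=$1
    day=${2:-$(date +%Y%m%d)}    # default to today's date
    printf '/var/log/pcp/pmlogger/%s/%s\n' "$host" "$day"
}
```

A call such as pcp -a "$(archive_for acme.com 20140902)" then matches the second example above regardless of the current directory.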