PCP Quick Reference Guide

Introduction

Performance Co-Pilot (PCP) is an open source framework and toolkit for monitoring, analyzing, and responding to details of live and historical system performance. PCP has a fully distributed, plug-in based architecture making it particularly well suited to centralized analysis of complex environments and systems. Custom performance metrics can be added using the C, C++, Perl, and Python interfaces.

This page provides quick instructions on how to install and use PCP on a set of hosts, one of which (the monitor host) is used to monitor and analyze itself and the other hosts (the collector hosts).

Installation

PCP is available on all recent Linux distribution releases, including Debian, Fedora, RHEL, SUSE, and Ubuntu. For other operating systems and distributions, consider installing from source.

Installing Collector Hosts


   To install basic PCP tools and services and enable collecting performance data on systemd based distributions, run:

# yum install pcp # or apt-get or dnf or zypper
# systemctl enable pmcd
# systemctl start pmcd
# systemctl enable pmlogger
# systemctl start pmlogger

These commands enable the Performance Metrics Collector Daemon (pmcd(1)) on the host. On behalf of clients, pmcd controls and requests metrics from the various Performance Metrics Domain Agents (PMDAs). The PMDAs provide the actual data from different components (domains) in the system, for example the Linux Kernel PMDA or the NFS Client PMDA. The default configuration includes over 1000 metrics with negligible overall overhead when queried. An agent that receives no queries for metrics does nothing at all. Local PCP archive logs are also enabled on the host for convenience with pmlogger(1).
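
The enable and start steps can also be folded into a small loop. The following is a minimal sketch rather than anything shipped with PCP: the function name enable_pcp_services is made up for illustration, and it assumes a systemd-based host and a root shell.

```shell
# Hypothetical helper (not part of PCP): enable and start each named
# service with systemctl, stopping at the first failure.
enable_pcp_services() {
    for svc in "$@"; do
        systemctl enable "$svc" || return 1
        systemctl start "$svc" || return 1
    done
}
```

Once defined in a root shell, it would be called as enable_pcp_services pmcd pmlogger.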

   To enable PMDAs which are not enabled by default, for example the PostgreSQL database PMDA, run the corresponding Install script:

# cd /var/lib/pcp/pmdas/postgresql
# ./Install
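
When several PMDAs need enabling, the same two steps can be repeated in a loop. This is a rough sketch under stated assumptions: install_pmdas and the PMDA_DIR override are hypothetical names introduced here, the default directory is the one shown above, and some Install scripts may prompt for input, in which case running them one at a time by hand is simpler.

```shell
# Hypothetical helper (not part of PCP): run the Install script of each
# named PMDA. PMDA_DIR is an illustrative override; it defaults to the
# directory used in the example above.
install_pmdas() {
    base=${PMDA_DIR:-/var/lib/pcp/pmdas}
    for name in "$@"; do
        if [ -x "$base/$name/Install" ]; then
            # Subshell keeps the caller's working directory unchanged.
            (cd "$base/$name" && ./Install) || return 1
        else
            echo "no Install script for PMDA: $name" >&2
            return 1
        fi
    done
}
```

For example, install_pmdas postgresql run as root would be equivalent to the two commands above.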

The client tools contact local or remote PMCDs as needed. Communication with PMCD over the network uses TCP port 44321 by default.

Installing Monitor Host

The following additional packages can be optionally installed on the monitoring host to extend the set of monitoring tools from the base pcp package.

   Install various system monitoring tools, graphical analysis tools, and documentation:

# yum install pcp-doc pcp-gui pcp-system-tools # or apt-get or dnf or zypper

To enable centralized archive log collection on the monitoring host, its pmlogger is configured to fetch performance metrics from collector hosts. Add each collector host to the pmlogger configuration file /etc/pcp/pmlogger/control and then restart the pmlogger service on the monitoring host.

   Enable recording of metrics from remote host acme.com:

# echo acme.com n n PCP_LOG_DIR/pmlogger/acme.com -r -T24h10m -c config.acme.com >> /etc/pcp/pmlogger/control

# systemctl restart pmlogger

The health of the remote log collectors is checked every half hour. You can also run /usr/libexec/pcp/bin/pmlogger_check -V -C (on Fedora/RHEL) or /usr/lib/pcp/bin/pmlogger_check -V -C (on Debian/Ubuntu) manually to perform a health check.

Note that a default configuration file (config.acme.com above) will be generated if it does not already exist. Relying on the generated file is optional, since a custom configuration can be provided for each host instead. See the pmlogconf(1) manual page for details.
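
With more than a handful of collector hosts, the control file entries can be generated rather than typed. A minimal sketch, assuming every host should use the same fields as the acme.com entry above; the function name pmlogger_control_lines is hypothetical and not part of PCP.

```shell
# Hypothetical helper (not part of PCP): print one pmlogger control line
# per collector host, copying the fields of the acme.com example.
pmlogger_control_lines() {
    for host in "$@"; do
        printf '%s n n PCP_LOG_DIR/pmlogger/%s -r -T24h10m -c config.%s\n' \
            "$host" "$host" "$host"
    done
}
```

The output can be reviewed first and then appended to /etc/pcp/pmlogger/control as root, followed by a restart of the pmlogger service as shown earlier.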

Dynamic Host Discovery

In dynamic environments, manually configuring every host is not feasible and may even be impossible. The discovery service (pmfind(1)) can be used to auto-discover and auto-configure new collector hosts and containers for logging and/or rule inference.

   To enable the pmfind service and begin monitoring discovered metric sources, run:

# systemctl enable pmfind
# systemctl restart pmfind

    Discover use of the PCP pmcd service on the local network:

$ pmfind -s pmcd

Installation Health Check

A basic health check of running services, network connectivity between hosts, and enabled PMDAs can be done as follows.

   Check PCP services live on the remote host munch and retrospectively from a local archive for the host smash:

$ pcp -h munch
Performance Co-Pilot configuration on munch:
  platform: SunOS munch 5.11 oi_151a8 i86pc
  hardware: 4 cpus, 3 disks, 4087MB RAM
  timezone: EST-10
  services: pmcd pmproxy
      pmcd: Version 5.0.0-1, 3 agents
      pmda: pmcd mmv solaris
      pmie: /var/log/pcp/pmie/munch/pmie.log

$ pcp -a /var/log/pcp/pmlogger/smash/20190909
Performance Co-Pilot configuration on smash:
   archive: /var/log/pcp/pmlogger/smash/20190909
  platform: Linux smash 2.6.32-279.46.1.el6.x86_64 #1 SMP Mon May 19 16:16:00 EDT 2014 x86_64
  hardware: 8 cpus, 2 disks, 1 node, 23960MB RAM
  timezone: EST-10
  services: pmcd pmproxy
      pmcd: Version 5.0.0-1, 8 agents
      pmda: pmcd proc xfs linux mmv nvidia dmcache postgresql
  pmlogger: primary logger: /var/log/pcp/pmlogger/smash/20190909.00.10
      pmie: /var/log/pcp/pmie/smash/pmie.log
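
The same check can be run across a list of hosts. This is a sketch only: check_pcp_hosts is a hypothetical name, and the code assumes that pcp -h exits with a non-zero status when it cannot reach pmcd on the given host.

```shell
# Hypothetical helper (not part of PCP): run "pcp -h" against each host
# and report which ones answered. Assumes pcp exits non-zero when pmcd
# cannot be reached. Returns non-zero if any host failed.
check_pcp_hosts() {
    failed=0
    for host in "$@"; do
        if pcp -h "$host" >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: unreachable"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_pcp_hosts munch smash prints one status line per host; the full pcp output is discarded, so rerun pcp -h by hand on any host that needs a closer look.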

System Level Performance Monitoring

PCP comes with a wide range of command line utilities for accessing live performance metrics via PMCDs or historical data using archive logs. The following examples illustrate some of the most useful use cases; see the manual page of each command for additional information. In the examples below, -h <host> can be used to query a remote host; the default is the local host. Shell completion support for Bash and especially for Zsh allows completing available metrics, metricsets (with pmrep), and command line options.

Monitoring Live Performance Metrics


    Display all the enabled performance metrics on a host with a short description:

$ pminfo -t

    Display detailed information about a performance metric and its current values:

$ pminfo -dfmtT disk.partitions.read

    Monitor live disk write operations per partition with two second interval using fixed point notation (use -i instance to report only certain instances and -r for raw values):

$ pmval -t 2sec -f 3 disk.partitions.write

    Monitor live CPU load, memory usage, and disk write operations per partition with two second interval using fixed width columns on the remote host acme.com:

$ pmdumptext -Xlimu -t 2sec 'kernel.all.load[1]' mem.util.used disk.partitions.write -h acme.com

    Monitor live process creation rate and free/used memory with two second interval printing timestamps and using GBs for output values in CSV format:

$ pmrep -p -b GB -t 2sec -o csv kernel.all.sysfork mem.util.free mem.util.used

    Monitor system metrics in a top-like window:

$ pcp atop

    Monitor system metrics in a sar-like (System Activity Report) manner:

$ pcp atopsar

    Monitor system metrics in a sar-like fashion with two second interval from two different hosts:

$ pmstat -t 2sec -h acme1.com -h acme2.com

    Monitor system metrics in an iostat-like fashion with two second interval:

$ pmiostat -t 2sec

    Monitor performance metrics with a GUI application with two second default interval from two different hosts. Use File->New Chart to select metrics to be included in a new view and use File->Open View to use a predefined view:

$ pmchart -t 2sec -h acme1.com -h acme2.com

Retrospective Performance Analysis

PCP archive logs are located under /var/log/pcp/pmlogger/hostname, and the archive names indicate the time they cover. Archives are self-contained and independent of machine and version, so they can be transferred to any machine for offline analysis.

    Check the host, timezone and the time period an archive covers:

$ pmdumplog -L acme.com/20140902

    Check PCP configuration at the time when an archive was created:

$ pcp -a acme.com/20140902

    Display all enabled performance metrics at the time when an archive was created:

$ pminfo -a acme.com/20140902

    Display detailed information about a performance metric at the time when an archive was created:

$ pminfo -df mem.freemem -a acme.com/20140902

    Dump past disk write operations per partition in an archive using fixed point notation (use -i instance to report only certain instances and -r for raw values):

$ pmval -f 3 disk.partitions.write -a acme.com/20140902

    Replay past disk write operations per partition in an archive with two second interval using fixed point notation between 9 AM and 10 AM (use full dates with syntax like @"2014-08-20 14:00:00"):

$ pmval -d -t 2sec -f 3 disk.partitions.write -S @09:00 -T @10:00 -a acme.com/20140902
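
To look at the same time window for several metrics, the pmval invocation can be wrapped in a loop. A minimal sketch: pmval_window is a hypothetical name, the options are the ones used in the command above, and -d is left out so the values are dumped immediately rather than replayed with real-time delays.

```shell
# Hypothetical helper (not part of PCP): dump a time window of an archive
# for several metrics, one pmval run per metric.
# usage: pmval_window ARCHIVE START END METRIC...
pmval_window() {
    archive=$1 start=$2 end=$3
    shift 3
    for metric in "$@"; do
        pmval -t 2sec -f 3 -S "@$start" -T "@$end" -a "$archive" "$metric" || return 1
    done
}
```

For example, pmval_window acme.com/20140902 09:00 10:00 disk.partitions.write disk.partitions.read dumps both metrics between 9 AM and 10 AM.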