Zabbix Template (and Simple Script) for RAPL Power Measurement

Intel’s RAPL (Running Average Power Limit) technology allows you to measure the power usage of your CPU, including some sub-components such as the memory controller. It even works on some AMD CPUs. Measuring it requires some kind of scripting or automation, since the underlying counters report the cumulative energy in microjoules consumed by the component since boot (wrapping around when the counter reaches its maximum), rather than an instantaneous wattage. Power is therefore the difference between two readings divided by the time between them. I have both a simple script with minimal dependencies, and a Zabbix template for RAPL measurements.

Simple Script

#!/bin/bash

# File containing the cumulative energy usage in microjoules
ENERGY_FILE="/sys/class/powercap/$1/energy_uj"

# Function to read energy from the file
read_energy() {
  cat "$ENERGY_FILE"
}

# Check if the energy file exists (this also catches a missing argument)
if [[ ! -f "$ENERGY_FILE" ]]; then
  echo "Error: Energy file $ENERGY_FILE does not exist." >&2
  exit 1
fi

NAME=$(cat "/sys/class/powercap/$1/name")

# Initial energy value
previous_energy=$(read_energy)

# Start continuous monitoring
echo "Monitoring '$NAME' power usage. Press Ctrl+C to stop."
while true; do
  # Wait for 1 second
  sleep 1

  # Read the current energy value
  current_energy=$(read_energy)

  # Calculate energy difference in microjoules
  energy_diff=$((current_energy - previous_energy))

  # Handle counter wraparound: energy_uj wraps at max_energy_range_uj.
  # (2**64 cannot be used here; it overflows bash arithmetic and evaluates to 0.)
  if [[ $energy_diff -lt 0 ]]; then
    max_value=$(cat "/sys/class/powercap/$1/max_energy_range_uj")
    energy_diff=$((energy_diff + max_value))
  fi

  # Convert energy difference to joules
  energy_diff_joules=$(echo "$energy_diff / 1000000" | bc -l)

  # Power in watts = energy (joules) / time (seconds)
  power_watts=$(echo "$energy_diff_joules / 1" | bc -l)

  # Print the power usage
  printf "Power usage: %.6f W\n" "$power_watts"

  # Update previous energy value
  previous_energy=$current_energy
done

This script takes a RAPL name (typically intel-rapl:0 for the CPU and intel-rapl:0:0 for the IMC, intel-rapl:1 for a second CPU, and so on) as the only parameter.

# ./cpupower.sh intel-rapl:0
Monitoring 'package-0' power usage. Press Ctrl+C to stop.
Power usage: 48.899594 W
Power usage: 48.769285 W
Power usage: 49.942499 W
Power usage: 49.125667 W
Power usage: 51.212699 W
Power usage: 50.415276 W
Power usage: 51.224784 W
^C
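
If you are not sure which RAPL names exist on a host, something like the following can list them along with their human-readable names. This is a minimal sketch: the function name list_rapl_domains and its optional base-directory argument are made up for illustration, and it assumes the usual /sys/class/powercap layout.

```shell
#!/bin/bash

# list_rapl_domains [BASE_DIR]
# Print each RAPL directory name and the contents of its "name" file,
# separated by a tab. BASE_DIR defaults to /sys/class/powercap; the
# argument is a hypothetical convenience so the function can be pointed
# at a test directory.
list_rapl_domains() {
  local base="${1:-/sys/class/powercap}"
  local dir
  for dir in "$base"/intel-rapl:*; do
    # Skip the unexpanded glob and any entry without a readable name file
    [[ -r "$dir/name" ]] || continue
    printf '%s\t%s\n' "$(basename "$dir")" "$(cat "$dir/name")"
  done
}

# Example call (uncomment to run against the real sysfs tree):
# list_rapl_domains
```

The first column is what you would pass to the script above as its parameter.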

Zabbix Template

This template will auto-discover any RAPL items on the host and take measurements every 60 seconds. Fortunately, because the RAPL values are cumulative energy counters, you don’t “miss” short spikes entirely: they are averaged out over the 60-second interval. It is also configured to reject values below 0W or above 2kW to filter out invalid data, such as when the RAPL counter resets after a reboot.

zabbix_export:
  version: '7.0'
  template_groups:
    - uuid: 449d04f6170e4bdea0501d60281f2da0
      name: 'Linux servers'
  templates:
    - uuid: ff6e0c061e2f48ee8c438b5ddb1a29c2
      template: 'Template RAPL Power Measurement'
      name: 'Template RAPL Power Measurement'
      groups:
        - name: 'Linux servers'
      discovery_rules:
        - uuid: 93f4c7f7e219412a838e26bad7543a18
          name: 'RAPL Domains Discovery'
          key: 'vfs.dir.get["/sys/class/powercap/","^intel-rapl:.+$",,,,0]'
          delay: 30s
          item_prototypes:
            - uuid: 216f8576482a4e2ab872dffd06b3918d
              name: 'RAPL {#RAPL_DIR}'
              key: 'vfs.file.contents[/sys/class/powercap/{#RAPL_DIR}/energy_uj]'
              value_type: FLOAT
              trends: '0'
              preprocessing:
                - type: CHANGE_PER_SECOND
                  parameters:
                    - ''
                - type: MULTIPLIER
                  parameters:
                    - '1.0E-6'
                - type: IN_RANGE
                  parameters:
                    - '0'
                    - '2000'
          lld_macro_paths:
            - lld_macro: '{#RAPL_DIR}'
              path: $.basename
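
To get a feel for what the preprocessing chain does to the raw counter, the arithmetic can be reproduced by hand. The sketch below only approximates the three steps (change per second, multiply by 1.0E-6, range check from 0 to 2000); the helper name rapl_watts is hypothetical, and Zabbix's own handling of rejected values may differ in detail.

```shell
#!/bin/bash

# rapl_watts PREV_UJ CUR_UJ SECONDS
# Hypothetical helper: converts two energy_uj readings taken SECONDS
# apart into average watts, and fails (exit status 1, no output) when
# the result falls outside the 0..2000 W range used in the template.
rapl_watts() {
  awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
    watts = (cur - prev) / secs * 1e-6   # uJ per second -> watts
    if (watts < 0 || watts > 2000) exit 1
    printf "%.3f\n", watts
  }'
}

# Example calls (uncomment to run):
# rapl_watts 1000000000 4000000000 60   # 3000 J over 60 s
# rapl_watts 0 3150000000 60            # 59 s at 50 W plus 1 s at 200 W
```

The second example illustrates the averaging: 59 seconds at 50 W plus a one-second spike at 200 W adds up to 3150 J, which is reported as 52.5 W for the whole minute.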

From my testing, the template seems to require at least version 6.x of the Zabbix agent; it did not work on a 5.x host. Note that the energy_uj file is most likely not readable by your Zabbix user. I used this Ansible playbook to correct the permissions on my Debian systems (replace physical with your host group, and zabbix with your Zabbix agent user account name):

- name: Configure RAPL access for Zabbix
  hosts: physical
  become: yes
  tasks:
    - name: Ensure 'rapl' group exists
      group:
        name: rapl
        state: present
        system: true

    - name: Add 'zabbix' user to 'rapl' group
      user:
        name: zabbix
        groups: rapl
        append: yes

    - name: Create udev rule to run chgrp on RAPL devices
      copy:
        dest: /etc/udev/rules.d/99-rapl.rules
        content: 'SUBSYSTEM=="powercap", KERNEL=="intel-rapl:*", ACTION=="add", RUN+="/bin/chgrp rapl /sys/class/powercap/%k/energy_uj", RUN+="/bin/chmod 440 /sys/class/powercap/%k/energy_uj"'
        owner: root
        group: root
        mode: '0644'

    - name: Reload udev rules
      command: udevadm control --reload

    - name: Trigger udev rules
      command: udevadm trigger --subsystem-match=powercap --action=add

    - name: Restart Zabbix agent
      service:
        name: zabbix-agent
        state: restarted
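
After the playbook has run, it may be worth confirming that the counters are actually readable by the agent user. The following is a small sketch of such a check; the function name check_rapl_readable and its optional base-directory argument are made up for illustration.

```shell
#!/bin/bash

# check_rapl_readable [BASE_DIR]
# Report whether each energy_uj file is readable by the current user.
# Returns 0 only if at least one file was found and all were readable.
# BASE_DIR defaults to /sys/class/powercap (hypothetical parameter,
# mainly useful for testing).
check_rapl_readable() {
  local base="${1:-/sys/class/powercap}"
  local f found=0 status=0
  for f in "$base"/intel-rapl:*/energy_uj; do
    [[ -e "$f" ]] || continue          # skip the unexpanded glob
    found=1
    if [[ -r "$f" ]]; then
      echo "OK $f"
    else
      echo "DENIED $f"
      status=1
    fi
  done
  if [[ $found -eq 0 ]]; then
    echo "No RAPL energy files found under $base" >&2
    return 1
  fi
  return "$status"
}

# Example call (uncomment to run):
# check_rapl_readable
```

Saved to a file with the last line uncommented (check_rapl.sh is a made-up name), it can be run as the agent user, for example with sudo -u zabbix bash check_rapl.sh. Running it as root is not a useful test, since root can read the files regardless of the group.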
