KB450105 – Ceph Real Time Monitoring

Last modified: July 19, 2021

Ceph can send performance metrics to a Prometheus endpoint.

Prometheus stores the metrics in a time series database, and Grafana queries and displays that data.

For more information on how Ceph reports its performance data, click here.

This method uses Ansible playbooks to install and configure Prometheus, node_exporter and Grafana.

  1. Add entries to inventory host file for node exporter, prometheus and grafana
    1. [dashboards]
  2. Run “ansible-playbook prometheus.yml”
    1. After completion open port 9090 in the firewall
    2. Verify you can reach prometheus UI via http://NODEIP:9090
  3. Run “ansible-playbook grafana.yml”
    1. Default login is “admin:admin”. This can be changed by editing the “grafana_security” variable in the “grafana.yml” playbook.
    2. Update “org_name” in “grafana.yml” to Main.Org to allow grafana to be embedded into the Ceph Dashboard.
    3. After completion open port 3000 in firewall
    4. Verify you can reach grafana via http://NODEIP:3000
  4. SSH into node hosting Prometheus and Grafana. Set up provisioning to import ceph dashboard into grafana
    1. Download Ceph Dashboards:
      1. yum install -y http://download.ceph.com/rpm-nautilus/el7/noarch/ceph-grafana-dashboards-14.2.1-0.el7.noarch.rpm
    2. Create provisioning yml file to import dashboards
      1. cd /etc/grafana/provisioning/dashboards
      2. curl -LO http://images.45drives.com/ceph/monitoring/grafana/provisioning/ceph-dashboard.yml
    3. Give grafana permission: chown root:grafana -R /etc/grafana ; chmod 775 /etc/grafana
    4. Install extra grafana-plugins
      1. grafana-cli plugins install vonage-status-panel
      2. grafana-cli plugins install grafana-piechart-panel
    5. Restart grafana-server
      1. systemctl restart grafana-server
  5. View dashboards. http://metric-server:3000
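
The firewall steps above (ports 9090 and 3000) can be sketched as a small helper. This assumes firewalld is the active firewall, which the article does not state; the function name open_port and the DRY_RUN switch are illustrative only.

```shell
#!/bin/sh
# Sketch: open a TCP port permanently with firewalld, then reload the rules.
# Assumption: firewalld is in use (the article does not name a firewall).
# Set DRY_RUN=1 to print the commands instead of running them.
open_port() {
    port="$1"
    ${DRY_RUN:+echo} firewall-cmd --permanent --add-port="${port}/tcp" &&
    ${DRY_RUN:+echo} firewall-cmd --reload
}

# Usage (as root):
#   open_port 9090   # Prometheus
#   open_port 3000   # Grafana
```

Running it with DRY_RUN=1 first shows the exact commands before any firewall rule is changed.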


Install Prometheus to /opt

wget https://github.com/prometheus/prometheus/releases/download/v2.2.1/prometheus-2.2.1.linux-amd64.tar.gz
tar -zxvf prometheus-2.2.1.linux-amd64.tar.gz -C /opt/
ln -s /opt/prometheus-2.2.1.linux-amd64 /opt/prometheus

Create a systemd unit file

vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server

[Service]
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml

Start the service, and verify you can reach the Prometheus web UI at http://HOST:9090

systemctl daemon-reload
systemctl start prometheus

Install Grafana and its dependencies using an RPM. Get the most up-to-date RPM here.

yum install initscripts fontconfig https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.0.1-1.x86_64.rpm
systemctl daemon-reload
systemctl enable grafana-server
systemctl start grafana-server

Verify you can reach grafana webUI. http://HOST:3000

Download and run the node_exporter install script on each node you want to monitor

wget http://images.45drives.com/ceph/install-node_exporter.sh
sh install-node_exporter.sh

Verify node_exporter is working by pointing your browser to http://HOST:9100/metrics

Enable the Prometheus module from a node in the Ceph cluster

ceph mgr module enable prometheus
ceph mgr module ls

Verify you can reach the metrics page. http://HOST:9283/metrics
Note: Although any mgr node can host the module, only the currently active mgr is reachable on this port.
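
Beyond opening the metrics pages in a browser, the two endpoints above can be checked from a shell. The sketch below counts the metric samples in a response; the helper name count_samples is illustrative, and HOST is the same placeholder used in the article.

```shell
#!/bin/sh
# Count metric samples in Prometheus text exposition format read from stdin.
# Comment lines (# HELP / # TYPE) and blank lines are not counted.
count_samples() {
    grep -v '^#' | grep -c .
}

# Usage:
#   curl -s http://HOST:9100/metrics | count_samples   # node_exporter
#   curl -s http://HOST:9283/metrics | count_samples   # ceph mgr prometheus module
```

A count of 0 suggests the exporter answered but returned no metrics, or that the request never reached it.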

Put the following in the “prometheus.yml” file

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node'
    file_sd_configs:
      - files:
        - node_targets.yml
  - job_name: 'ceph'
    honor_labels: true
    file_sd_configs:
      - files:
        - ceph_targets.yml

Create node_targets.yml.

Sample file is here http://images.45drives.com/ceph/monitoring/files/node_targets.yml

Create ceph_targets.yml

Sample file is here http://images.45drives.com/ceph/monitoring/files/ceph_targets.yml
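
As a rough illustration of what a targets file can contain, the sketch below prints a single target group in the YAML form accepted by Prometheus file-based service discovery. The linked sample files may be laid out differently, and the function name make_targets and the hostnames are hypothetical.

```shell
#!/bin/sh
# Print one file_sd target group in YAML: one "host:port" entry per host.
# Usage: make_targets PORT HOST [HOST...]
make_targets() {
    port="$1"
    shift
    echo "- targets:"
    for host in "$@"; do
        printf '    - "%s:%s"\n' "$host" "$port"
    done
}

# Hypothetical hostnames, for illustration only:
#   make_targets 9100 ceph-node1 ceph-node2 > node_targets.yml
#   make_targets 9283 ceph-node1 ceph-node2 > ceph_targets.yml
```

Port 9100 is the node_exporter port and 9283 is the Ceph mgr module port, as used in the verification steps earlier in this article.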

Restart Prometheus service

systemctl restart prometheus

Verify that prometheus can query ceph/node statistics
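
One way to do this from a shell is to send an instant query to the Prometheus HTTP API. The sketch below is not part of the original procedure: the helper name prom_query and the DRY_RUN switch are illustrative, and port 9090 is the port used earlier in this article.

```shell
#!/bin/sh
# Run an instant query against the Prometheus HTTP API (/api/v1/query).
# Set DRY_RUN=1 to print the curl command instead of running it.
prom_query() {
    host="$1"
    expr="$2"
    ${DRY_RUN:+echo} curl -sG "http://${host}:9090/api/v1/query" \
        --data-urlencode "query=${expr}"
}

# Usage: prom_query HOST 'up'
```

The built-in up metric reports 1 for every target whose last scrape succeeded, so the JSON result should list the targets of both the 'node' and 'ceph' jobs.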

By default, Prometheus is not tuned for viewing metrics over short time periods.

Decrease the scrape_interval value in the “prometheus.yml” file to get better resolution (the “ms” suffix is valid for specifying milliseconds). For the change to take effect, you must restart Prometheus.

systemctl restart prometheus
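
The edit described above can also be scripted. The following is a minimal sketch, assuming the scrape_interval key sits on its own line as in the configuration shown earlier; the function name set_scrape_interval is illustrative.

```shell
#!/bin/sh
# Replace the scrape_interval value in a Prometheus config file.
# Usage: set_scrape_interval FILE VALUE   (for example VALUE=5s)
set_scrape_interval() {
    file="$1"
    value="$2"
    tmp="${file}.tmp"
    # Keep the key and its indentation; swap only the value.
    sed "s/^\( *scrape_interval: *\).*/\1${value}/" "$file" > "$tmp" &&
    mv "$tmp" "$file"
}

# Usage:
#   set_scrape_interval /opt/prometheus/prometheus.yml 5s
#   systemctl restart prometheus
```

Note that this rewrites every scrape_interval line in the file, including any per-job overrides, so review the result before restarting.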


The rate() and irate() functions show how fast a counter metric is changing. The rate() function averages the per-second rate over the whole time window, while irate() uses only the last two samples in the window, which makes irate() better than rate() for graphing volatile metrics.
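
As a hedged illustration, the sketch below builds both expressions for the same counter over a one-minute window. The metric name ceph_osd_op_r is an assumption about what the Ceph mgr module exports in your cluster, and the helper name range_expr is illustrative.

```shell
#!/bin/sh
# Build a PromQL expression of the form FUNC(METRIC[WINDOW]).
range_expr() {
    printf '%s(%s[%s])\n' "$1" "$2" "$3"
}

# ceph_osd_op_r is an assumed counter name; substitute any counter you scrape.
range_expr rate  ceph_osd_op_r 1m   # prints rate(ceph_osd_op_r[1m])
range_expr irate ceph_osd_op_r 1m   # prints irate(ceph_osd_op_r[1m])
```

Either expression can be pasted into the Prometheus expression browser or used as a Grafana panel query.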


As a rule of thumb, the time window given in square brackets (for example, [1m]) should be a multiple of the scrape_interval. A window that spans several scrapes still contains enough samples when a scrape is missed.

For more information about Prometheus functions, click here.
