Ceph can send performance metrics to a Prometheus endpoint.
Prometheus stores the metrics in a time series database, and Grafana queries and displays the Prometheus data.
For more information on how Ceph reports its performance data, see the Ceph manager Prometheus module documentation.
Install Monitoring Stack Via Ansible (recommended)
This method will install and configure prometheus, node_exporter and grafana.
- Add entries to the inventory hosts file for node_exporter, Prometheus and Grafana:
  [dashboards]
  metric-server

  [nodes:children]
  osds
  mons
  rgws
  clients
- Run “ansible-playbook prometheus.yml”
- After completion open port 9090 in the firewall
- Verify you can reach prometheus UI via http://NODEIP:9090
- Run “ansible-playbook grafana.yml”
- The default login is “admin:admin”. This can be changed by editing the “grafana_security” variable in the “grafana.yml” playbook.
- Update “org_name” in “grafana.yml” to Main.Org to allow grafana to be embedded into the Ceph Dashboard.
- After completion open port 3000 in firewall
- Verify you can reach grafana via http://NODEIP:3000
- SSH into the node hosting Prometheus and Grafana, and set up provisioning to import the Ceph dashboards into Grafana:
- Download Ceph Dashboards:
  yum install -y http://download.ceph.com/rpm-nautilus/el7/noarch/ceph-grafana-dashboards-14.2.1-0.el7.noarch.rpm
- Create provisioning yml file to import dashboards:
  cd /etc/grafana/provisioning/dashboards
  curl -LO http://images.45drives.com/ceph/monitoring/grafana/provisioning/ceph-dashboard.yml
- Give the grafana group access to the configuration directory:
  chown root:grafana -R /etc/grafana ; chmod 775 /etc/grafana
- Install extra grafana-plugins:
  grafana-cli plugins install vonage-status-panel
  grafana-cli plugins install grafana-piechart-panel
- Restart grafana-server:
  systemctl restart grafana-server
- View the dashboards at http://metric-server:3000
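The steps above ask for ports 9090 and 3000 to be opened but do not name a firewall tool. As a minimal sketch, assuming the metric server runs firewalld (common on EL7, which the RPM URLs here suggest), the ports could be opened as shown below. The open_tcp_port function and the DRY_RUN variable are hypothetical helpers for this example and are not part of the playbooks.

```shell
# Sketch: open a TCP port permanently with firewalld (assumed firewall tool).
# open_tcp_port and DRY_RUN are hypothetical names used only in this example.
open_tcp_port() {
    local port="$1"
    local cmd
    for cmd in "firewall-cmd --permanent --add-port=${port}/tcp" \
               "firewall-cmd --reload"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"      # only print what would be run
        else
            $cmd             # needs root privileges
        fi
    done
}

# Usage (as root on the metric server):
#   open_tcp_port 9090   # Prometheus
#   open_tcp_port 3000   # Grafana
```

Running with DRY_RUN=1 first prints the firewall-cmd calls without changing the firewall.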
Install Prometheus Server (Manual)
Install to /opt
wget https://github.com/prometheus/prometheus/releases/download/v2.2.1/prometheus-2.2.1.linux-amd64.tar.gz
tar -zxvf prometheus-2.2.1.linux-amd64.tar.gz -C /opt/
ln -s /opt/prometheus-2.2.1.linux-amd64 /opt/prometheus
Create systemd script
vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=root
Restart=on-failure
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml

[Install]
WantedBy=multi-user.target
Start the service and verify you can reach the Prometheus web UI at http://HOST:9090
systemctl daemon-reload
systemctl start prometheus
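Reachability can also be checked from the command line instead of a browser. The sketch below polls Prometheus's /-/healthy endpoint with curl until it answers. The wait_for_http helper, its retry count and its delay are assumptions made for illustration.

```shell
# Sketch: poll an HTTP endpoint until it answers with a success status.
# wait_for_http is a hypothetical helper; arguments: URL, attempts, delay (s).
wait_for_http() {
    local url="$1" attempts="${2:-10}" delay="${3:-2}" i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl return non-zero on HTTP error codes
        if curl -fsS --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (replace HOST with the Prometheus node):
#   wait_for_http http://HOST:9090/-/healthy && echo "Prometheus is up"
```

The same helper should work for Grafana by pointing it at http://HOST:3000/api/health.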
Install Grafana Server (Manual)
Install Grafana and its dependencies from an RPM. Check the Grafana download page for the most up-to-date RPM.
yum install initscripts fontconfig https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.0.1-1.x86_64.rpm
systemctl daemon-reload
systemctl enable grafana-server
systemctl start grafana-server
Verify you can reach the Grafana web UI at http://HOST:3000
Install node_exporter on Each Node (Manual)
Download and run the node_exporter install script on each node you want to monitor
wget http://images.45drives.com/ceph/install-node_exporter.sh
sh install-node_exporter.sh
Verify node_exporter is working by pointing your browser to http://HOST:9100/metrics
Enable Prometheus Plugin on Ceph Cluster (Manual)
Enable the module from a node in the Ceph cluster
ceph mgr module enable prometheus
ceph mgr module ls
Verify you can reach the metrics page. http://HOST:9283/metrics
Note: Although all mgr nodes can host this, only the active mgr at the time is reachable.
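Because only the active mgr answers, it helps to ask the cluster which URL is currently being served. The sketch below filters the JSON printed by `ceph mgr services` for the prometheus entry. The extract_prometheus_url helper is hypothetical, and its sed pattern assumes the usual "name": "url" layout of that output.

```shell
# Sketch: pull the prometheus module URL out of `ceph mgr services` output.
# extract_prometheus_url is a hypothetical helper that reads JSON on stdin.
extract_prometheus_url() {
    sed -n 's/.*"prometheus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (on a node with an admin keyring):
#   ceph mgr services | extract_prometheus_url
```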
Configure Prometheus Scrape Config (Manual)
Put the following in the “prometheus.yml” file
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node'
    file_sd_configs:
      - files:
          - node_targets.yml
  - job_name: 'ceph'
    honor_labels: true
    file_sd_configs:
      - files:
          - ceph_targets.yml
Create node_targets.yml.
Sample file is here http://images.45drives.com/ceph/monitoring/files/node_targets.yml
Create ceph_targets.yml
Sample file is here http://images.45drives.com/ceph/monitoring/files/ceph_targets.yml
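The linked sample files are not reproduced here. As a rough sketch of the file_sd format they need to follow, the commands below write both target files. All host names (osd1, mon1, mgr1 and so on) are placeholders, PROM_DIR is a hypothetical variable, and the fixed instance label on the Ceph job is an assumption borrowed from the Ceph documentation's suggestion for keeping series stable across mgr failover.

```shell
# Sketch: write file_sd target files for the 'node' and 'ceph' jobs.
# Host names are placeholders; PROM_DIR is a hypothetical variable that
# should point at the directory holding prometheus.yml.
PROM_DIR="${PROM_DIR:-.}"

# One node_exporter target (port 9100) per monitored node
cat > "${PROM_DIR}/node_targets.yml" <<'EOF'
- targets:
    - 'osd1:9100'
    - 'osd2:9100'
    - 'mon1:9100'
EOF

# Every mgr node (port 9283); only the active one serves metrics
cat > "${PROM_DIR}/ceph_targets.yml" <<'EOF'
- targets:
    - 'mgr1:9283'
    - 'mgr2:9283'
  labels:
    instance: 'ceph_cluster'
EOF
```

With the relative file names used in the scrape config, the files presumably belong next to prometheus.yml, for example with PROM_DIR=/opt/prometheus.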
Restart Prometheus service
systemctl restart prometheus
Verify that prometheus can query ceph/node statistics
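One way to do this check without the web UI is to run an instant query through the Prometheus HTTP API. The sketch below queries the built-in `up` metric, which should be 1 for each target that was scraped successfully. The prom_query helper is a hypothetical name used for illustration.

```shell
# Sketch: run an instant query against the Prometheus HTTP API.
# prom_query is a hypothetical helper; arguments: base URL, PromQL expression.
prom_query() {
    curl -fsS --max-time 5 --get \
         --data-urlencode "query=$2" \
         "$1/api/v1/query"
}

# Usage (replace HOST with the Prometheus node):
#   prom_query http://HOST:9090 'up{job="node"}'
#   prom_query http://HOST:9090 'up{job="ceph"}'
```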
Prometheus Data over Short Time Window
By default, Prometheus is not tuned for viewing metrics over short time periods.
Decrease the scrape_interval value in the “prometheus.yml” file to get better resolution (the “ms” suffix is valid for specifying milliseconds). For the change to take effect, you must restart Prometheus.
systemctl restart prometheus
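The edit described above can be sketched as a small sed wrapper, to be run before the restart. The set_scrape_interval helper is hypothetical. It rewrites every line that begins with scrape_interval:, which in the configuration shown earlier is only the global setting.

```shell
# Sketch: change scrape_interval in a Prometheus config file.
# set_scrape_interval is a hypothetical helper; arguments: file, new value.
set_scrape_interval() {
    local file="$1" value="$2"
    # Keeps a .bak copy; rewrites each line starting with "scrape_interval:"
    sed -i.bak "s/^\([[:space:]]*scrape_interval:\).*/\1 ${value}/" "$file"
}

# Usage:
#   set_scrape_interval /opt/prometheus/prometheus.yml 5s
#   systemctl restart prometheus
```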
Note:
The rate() and irate() functions are good tools for viewing a metric’s rate of change. rate() returns the per-second average over the whole time window, while irate() uses only the last two samples in the window, which makes it the better choice for graphing volatile metrics.
irate(http_requests_total{job="node"}[10s])
As a rule of thumb, the time parameter (above in square brackets) should be a multiple of the scrape_interval. This helps in dealing with missed scrapes.
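The rule of thumb can be turned into a small helper that derives the window from the scrape interval. This is only a sketch: irate_query is a hypothetical name, and the default multiple of 2 is an arbitrary choice that leaves room for the two samples irate() needs.

```shell
# Sketch: build an irate() expression whose window is a whole multiple
# of the scrape interval. irate_query is a hypothetical helper.
# Arguments: selector, scrape interval in seconds, multiple (default 2).
irate_query() {
    local selector="$1" interval="$2" multiple="${3:-2}"
    echo "irate(${selector}[$((interval * multiple))s])"
}

# With a 5s scrape interval and a multiple of 2 this prints:
#   irate(http_requests_total{job="node"}[10s])
irate_query 'http_requests_total{job="node"}' 5 2
```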
For more information about Prometheus functions, see the query functions page of the Prometheus documentation.