KB045239 - Expanding a Ceph Cluster with Drives
Posted on October 20, 2020 by Rob MacQueen
Purpose: To expand a Ceph cluster with additional hard drives, increasing its usable capacity.
Prerequisites:
- A running Ceph cluster on CentOS 7 with Ceph Nautilus
- Unformatted hard drives available to add to the cluster
- Access to the CLI of each OSD node in the cluster
Steps:
- Insert the unformatted hard drives into each OSD node as desired and take note of the slot IDs you use.
- Order and placement do not matter.
- Ensure the drives are fully inserted.
- It is best practice to keep the capacity of each server even, if possible.
- Access each OSD node.
- Ensure the disks are inserted and showing up unpartitioned; you can use the 45Drives tool /opt/ctools/lsdev or lsblk.
- Ensure the disks you are trying to add match the physical slots where you inserted them.
- On each node, take note of the Linux device name of each new disk you wish to add to the cluster (e.g. /dev/sdh, /dev/sdi).
- Run a report for each Linux device to ensure it is free of any partitions. This will also test for errors:
# ceph-volume lvm batch --report /dev/sdh
- Then run the command with all of the drives to be added included, to confirm it works as a batch:
# ceph-volume lvm batch --report /dev/sdh /dev/sdi /dev/sdj
- If no errors appear, run the command without the --report flag:
# ceph-volume lvm batch /dev/sdh /dev/sdi /dev/sdj
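The per-device and combined dry-run reports above can be scripted so no drive is skipped. A minimal sketch, assuming placeholder device names (/dev/sdh, /dev/sdi, /dev/sdj); the leading `echo` prints each command instead of executing it, so remove it once you have verified the list against a live node:

```shell
#!/bin/sh
# Placeholder device names -- substitute the disks noted for this node.
DEVICES="/dev/sdh /dev/sdi /dev/sdj"

# Print the dry-run report command for each device, then for the full set.
# Drop the leading 'echo' on each line to actually run ceph-volume.
for dev in $DEVICES; do
    echo ceph-volume lvm batch --report "$dev"
done
echo ceph-volume lvm batch --report $DEVICES
```

Only after every report comes back clean should the final command be re-run without --report to create the OSDs.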
Checklist for Completing Cluster Expansion:
- Run ceph -s to display the status of the cluster:
# ceph -s
- If no issues appear and everything is running smoothly, run ceph health detail and confirm all data has been redistributed and all backfills have completed:
# ceph health detail
- In the Ceph Dashboard, check the OSD usage to ensure data is evenly distributed across drives.
- If data isn't evenly distributed, check the Ceph balancer status and confirm that the mode is "crush-compat" and active is "true":
# ceph balancer status
- In the Ceph Dashboard, check the PGs on each OSD; they should all be between 100 and 150.
- In the Ceph Dashboard, check the Normal Distribution in the OSD overall performance tab.
- Lastly, if the Dashboard is unavailable, run ceph osd df and check the PGs:
# ceph osd df
- Return OSD backfill to its default value: in the Dashboard, navigate to Cluster-->Configuration, search "backfill", and set osd_max_backfills to 1.
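The PG-per-OSD range in the checklist can also be estimated by hand: each PG is stored on as many OSDs as the replication size, so the average PG copies per OSD is pg_num times replication size divided by the OSD count. A sketch with hypothetical values (substitute your own pools' pg_num, replication size, and OSD count):

```shell
#!/bin/sh
# Hypothetical example values -- replace with your cluster's own numbers.
PG_NUM=512      # total pg_num summed across all pools
SIZE=3          # replication size
NUM_OSDS=12     # OSD count after the expansion

# Average PG copies per OSD; should land roughly in the 100-150 range.
echo $(( PG_NUM * SIZE / NUM_OSDS ))
```

With these sample numbers the result is 128, inside the target range; if your own figure falls well outside it, revisit pg_num on the affected pools.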
- Use ceph -s before adding the new drives to record the storage capacity and number of OSDs.
- Use ceph -s afterwards to verify that the OSD count and storage capacity have increased.
- Use lsblk to verify that the drives have a Ceph volume on them.
- If there are already volumes present on the drives, be sure to run wipedev on them first. Before running wipedev, ensure you are targeting the correct drive and that there is no critical data on the device.
# /opt/tools/wipedev -a
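Before wiping, a quick guard can catch the most common mistake, wiping a disk that is still in use. A minimal sketch (the script and the default device name are illustrative assumptions, not part of the 45Drives toolset): it refuses to continue if the target device or any of its partitions appears in /proc/mounts.

```shell
#!/bin/sh
# Illustrative pre-wipe guard -- not part of the 45Drives tools.
# Usage: ./prewipe-check.sh /dev/sdh   (device name is a placeholder)
dev="${1:-/dev/sdh}"

# Refuse if the device, or a partition on it, is currently mounted.
if grep -q "^$dev" /proc/mounts; then
    echo "refusing: $dev has a mounted filesystem" >&2
    exit 1
fi
echo "no mounted filesystems found on $dev"
# Only after this passes (and a manual double-check of the slot/serial)
# should the real wipe be run with wipedev.
```

This does not replace a manual check of the slot ID and drive serial, but it blocks the worst case of wiping a mounted system disk.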