- This article describes the process of repairing inconsistent PGs/damaged objects on your Ceph Cluster.
- Ceph Cluster
- Experiencing HEALTH_ERR state with damaged objects
- PGs that are inconsistent
Identifying damaged PGs
- Running ceph -s shows that the cluster has an inconsistent PG and possible data damage.
  cluster:
    id:     cec9ca98-b59f-4d91-8ddd-43802195c735
    health: HEALTH_ERR
            1 scrub errors
            Possible data damage: 1 pg inconsistent

  data:
    pools:   10 pools, 1120 pgs
    objects: 29.66 k objects, 99 GiB
    usage:   320 GiB used, 7.7 TiB / 8.0 TiB avail
    pgs:     1119 active+clean
             1    active+clean+inconsistent
To find the damaged PG, run ceph health detail.
ceph health detail
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 13.6 is active+clean+inconsistent, acting [4,18,14]
As seen in the output above, the PG with the damaged object is 13.6, and its acting set is OSDs 4, 18, and 14.
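When more than one PG is affected, reading the IDs out of the output by eye gets tedious. The following is a minimal sketch that assumes the line format shown above (pg <id> is <state>, acting [...]); the function name list_inconsistent_pgs is hypothetical and not part of Ceph.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): read `ceph health detail` text on
# stdin and print the ID of every PG whose state includes "inconsistent".
# Assumes lines of the form:
#   pg 13.6 is active+clean+inconsistent, acting [4,18,14]
list_inconsistent_pgs() {
    awk '$1 == "pg" && $3 == "is" && $4 ~ /inconsistent/ { print $2 }'
}

# Usage on a live cluster:
#   ceph health detail | list_inconsistent_pgs
```

To see which objects inside a PG failed the scrub, rados list-inconsistent-obj 13.6 --format=json-pretty reports them, provided the scrub results are still current.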
Repairing Inconsistent PGs
We can now repair the PG by running ceph pg repair <PG ID>, using the ID found in the previous step.
ceph pg repair 13.6
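If several PGs need repair, the same command can be issued once per ID. The wrapper below is a sketch under stated assumptions: repair_pgs and the CEPH environment variable are hypothetical names introduced here for illustration, and the ID check only enforces the <pool>.<hex> shape of PG IDs such as 13.6.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Ceph): run `ceph pg repair` for each PG ID
# passed as an argument. Set CEPH=echo for a dry run that only prints the
# arguments that would be handed to the ceph binary.
repair_pgs() {
    local ceph_cmd="${CEPH:-ceph}"
    local pgid
    for pgid in "$@"; do
        # PG IDs look like <pool>.<hex>, e.g. 13.6 or 2.1f
        if ! printf '%s\n' "$pgid" | grep -Eq '^[0-9]+\.[0-9a-f]+$'; then
            echo "skipping malformed PG ID: $pgid" >&2
            continue
        fi
        "$ceph_cmd" pg repair "$pgid"
    done
}

# Usage:
#   repair_pgs 13.6             # runs: ceph pg repair 13.6
#   CEPH=echo repair_pgs 13.6   # dry run
```

Repairing one PG at a time and checking the result before moving on keeps the extra scrub load on the cluster easier to reason about.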
- Confirm that the PG repair has begun, either in the Ceph Dashboard or in a terminal with watch ceph -s. While the repair runs, the PG state includes the repair flag.
  data:
    pools:   10 pools, 1120 pgs
    objects: 29.66 k objects, 99 GiB
    usage:   320 GiB used, 7.7 TiB / 8.0 TiB avail
    pgs:     1119 active+clean
             1    active+clean+scrubbing+deep+inconsistent+repair
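For scripted use, the same check can be made by looking for the repair flag in the PG state strings. This is a rough sketch that assumes the '+'-separated state format shown above; repair_in_progress is a hypothetical helper name, and the match is purely textual.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): read `ceph -s` text on stdin and
# succeed if any '+'-separated PG state contains the "repair" flag, e.g.
#   active+clean+scrubbing+deep+inconsistent+repair
repair_in_progress() {
    grep -Eq '(^|[[:space:]+])repair([[:space:]+]|$)'
}

# Usage: poll until no PG reports the repair flag any more.
#   while ceph -s | repair_in_progress; do sleep 10; done
```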
- If the repair succeeds, the cluster returns to a healthy state.
- Once the PG has been repaired, run the following two commands to check whether the cluster is in a healthy state.
ceph -s
ceph health detail
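The health check can also be turned into an exit status for use in scripts. The sketch below relies on the health output beginning with the overall status (HEALTH_OK, HEALTH_WARN, or HEALTH_ERR); cluster_is_healthy is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): read `ceph health` or
# `ceph health detail` output on stdin and succeed only when the first
# line starts with HEALTH_OK.
cluster_is_healthy() {
    local first_line
    IFS= read -r first_line || true
    case "$first_line" in
        HEALTH_OK*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   if ceph health detail | cluster_is_healthy; then
#       echo "cluster is healthy"
#   else
#       echo "cluster still reports problems"
#   fi
```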
- If the process above fails to clear the HEALTH_ERR state, you may have to fix the damaged objects manually. See here for more details.