Scope/Description
- This article describes how to repair inconsistent placement groups (PGs) and damaged objects on a Ceph cluster.
Prerequisites
- A Ceph cluster
- The cluster is in a HEALTH_ERR state with damaged objects
- One or more PGs are reported as inconsistent
Steps
Identifying damaged PGs
- We can see with ceph -s that we have an inconsistent PG and possible data damage.
cluster:
id: cec9ca98-b59f-4d91-8ddd-43802195c735
health: HEALTH_ERR
1 scrub errors
Possible data damage: 1 pg inconsistent
data:
pools: 10 pools, 1120 pgs
objects: 29.66 k objects, 99 GiB
usage: 320 GiB used, 7.7 TiB / 8.0 TiB avail
pgs: 1119 active+clean
1 active+clean+inconsistent
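- On recent Ceph releases the inconsistent PGs can also be listed directly, either with ceph pg ls or with the rados tool on a per-pool basis (the pool name rbd below is only an example; substitute your own pool):
ceph pg ls inconsistent
rados list-inconsistent-pg rbd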
- To find the damaged PG, we can run ceph health detail.
ceph health detail
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 13.6 is active+clean+inconsistent, acting [4,18,14]
- As seen in the output above, the PG with the damaged object is 13.6.
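- To see which objects inside the PG are inconsistent, and why, rados can dump the scrub findings for that PG. Note that this relies on data from a recent scrub, so it may report that no scrub information is available:
rados list-inconsistent-obj 13.6 --format=json-pretty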
Repairing Inconsistent PGs
- We can now repair the PG by running ceph pg repair <pg-id>.
ceph pg repair 13.6
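- ceph pg repair schedules a deep scrub of the PG that also repairs the errors it finds. If you want to re-verify the PG once the repair has finished, you can trigger another deep scrub manually:
ceph pg deep-scrub 13.6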
- Confirm that the PG repair has begun, either in the Ceph Dashboard or in a terminal with watch ceph -s.
data:
pools: 10 pools, 1120 pgs
objects: 29.66 k objects, 99 GiB
usage: 320 GiB used, 7.7 TiB / 8.0 TiB avail
pgs: 1119 active+clean
1 active+clean+scrubbing+deep+inconsistent+repair
- If the repair succeeds, the cluster should return to a healthy state.
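- Rather than watching the output by hand, a small shell loop can poll the cluster until it reports healthy again. This is a simple sketch; adjust the polling interval as needed:
# Poll cluster health every 10 seconds until it reports HEALTH_OK
while ! ceph health | grep -q HEALTH_OK; do
    sleep 10
done
echo "cluster is healthy again"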
Verification
- Once the PG has been repaired, run the following two commands to confirm that the cluster is back in a healthy state.
ceph -s
ceph health detail
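- On a healthy cluster, ceph -s should show all PGs as active+clean, and ceph health detail should print nothing beyond:
HEALTH_OK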