Scope/Description
This article details how to clear the HEALTH_WARN state that a Ceph cluster reports when it detects a large omap object.
Prerequisites
A configured Ceph cluster running CephFS
Steps
- In some cases, the Ceph health reporter will warn that ‘large omap objects’ were found within a pool. The warning appears in the output of either of the following commands:
ceph -s
ceph health detail
- Here is an example of that output:
# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'cephfs_metadata'
    Search the cluster log for 'Large omap object found' for more details.
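The health message suggests searching the cluster log for 'Large omap object found'. A minimal sketch of that search follows. The function name `find_large_omap_entries` is hypothetical, and the default path `/var/log/ceph/ceph.log` is an assumption that may not match your deployment (containerized clusters often keep the log elsewhere), so pass the real path as the first argument if it differs.

```shell
# Print every cluster-log line that mentions a large omap object.
# $1: path to the cluster log. The default below is an assumed location.
find_large_omap_entries() {
    grep -F 'Large omap object found' "${1:-/var/log/ceph/ceph.log}"
}
```

Run it on a monitor host, for example `find_large_omap_entries /var/log/ceph/ceph.log`. The function returns a non-zero status when no matching line exists.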
- To resolve this issue, follow the steps below:
First, find the placement group (PG) that holds the large omap object. Use the following command, replacing [POOL NAME] with the pool named in the warning:
for i in `ceph pg ls-by-pool [POOL NAME] | tail -n +2 | head -n -2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query | grep num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
This command prints the ID of each placement group that is affected by large omap objects. For example: pg 2.26
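The one-liner above filters with `grep ": 1"`, which would miss a PG that reports two or more large omap objects. The sketch below splits the same pipeline into two functions and keeps every PG whose count is greater than zero. Both function names are hypothetical, and the wrapper assumes the `ceph` CLI is on the PATH and that `ceph pg ls-by-pool` prints the same header and footer lines that the original command trims.

```shell
# Read "PGID: COUNT" lines on stdin and print the PGID of each line
# whose COUNT is greater than zero. A trailing comma on COUNT is tolerated.
filter_large_omap_pgs() {
    awk -F': ' '$2 + 0 > 0 { print $1 }'
}

# Hypothetical wrapper around the original one-liner.
# $1: pool name, for example cephfs_metadata.
large_omap_pgs_in_pool() {
    for pg in $(ceph pg ls-by-pool "$1" | tail -n +2 | head -n -2 | awk '{print $1}'); do
        printf '%s: ' "$pg"
        ceph pg "$pg" query | grep num_large_omap_objects | head -1 | awk '{print $2}'
    done | filter_large_omap_pgs
}
```

Calling `large_omap_pgs_in_pool cephfs_metadata` would then print one PG ID per line, ready to pass to `ceph pg map`.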
- Next, determine which OSDs the PG is living on. We can do this by using the following command:
# ceph pg map [PG #]
The output from this command should be similar to the following:
# ceph pg map 2.26
osdmap e8768 pg 2.26 (2.26) -> up [29,94,37] acting [29,94,37]
- Now that we’ve identified which OSDs are hosting the large omap objects, we need to run a deep scrub on them.
ceph osd deep-scrub osd.29
ceph osd deep-scrub osd.37
ceph osd deep-scrub osd.94
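Copying the OSD IDs by hand is error-prone when several PGs are affected. As a rough sketch, the acting set can be parsed out of the `ceph pg map` output and turned into one deep-scrub command per OSD. The function names `acting_osds` and `deep_scrub_commands` are hypothetical, and the parsing assumes the `acting [a,b,c]` format shown in the example output above.

```shell
# Read `ceph pg map` output on stdin and print the acting-set OSD IDs,
# one per line.
acting_osds() {
    sed -n 's/.*acting \[\([0-9,]*\)\].*/\1/p' | tr ',' '\n'
}

# Print (do not run) one deep-scrub command per OSD in the acting set.
deep_scrub_commands() {
    acting_osds | while read -r id; do
        echo "ceph osd deep-scrub osd.$id"
    done
}
```

For example, `ceph pg map 2.26 | deep_scrub_commands` prints the commands for review. Printing rather than executing is a deliberate choice here, so the list can be checked before it is piped to `sh`.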
Verification
Once the scrub completes on the OSDs, the cluster should be in a healthy state again.
Run ceph -s or ceph health detail to confirm this:
ceph -s
  cluster:
    id:     170b5370-2d51-4348-b6ef-79e627967474
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-osd1,ceph-osd2,ceph-osd3 (age 8h)
    mgr: ceph-osd1(active, since 8h), standbys: ceph-osd2, ceph-osd3
    mds: cephfs:1 {0=ceph-fsgw1=up:active} 1 up:standby-replay
    osd: 105 osds: 105 up (since 8h), 105 in (since 24h)

  data:
    pools:   3 pools, 4224 pgs
    objects: 53.10M objects, 26 TiB
    usage:   82 TiB used, 1.2 PiB / 1.3 PiB avail
    pgs:     4221 active+clean
             2    active+clean+scrubbing+deep
             1    active+clean+scrubbing
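If the check needs to run from a script, one possible approach is to test the health output for the LARGE_OMAP_OBJECTS code shown in the warning earlier. This is only a sketch, and the function name `has_large_omap_warning` is hypothetical.

```shell
# Read health output on stdin. Exit status 0 means the large omap
# warning is still present, non-zero means it is absent.
has_large_omap_warning() {
    grep -q 'LARGE_OMAP_OBJECTS'
}
```

A typical call would be `if ceph health detail | has_large_omap_warning; then echo "still flagged"; fi`. Deep scrubs can take a while, so the warning may persist until the scrubbing PGs return to active+clean.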
Troubleshooting