Scope/Description
This article details the process of clearing a HEALTH_WARN status on a Ceph cluster caused by a large omap object.
Prerequisites
A configured Ceph cluster running CephFS
Steps
- In some cases, Ceph's health checks will start reporting that large omap objects have been found within a pool. This warning is displayed when running either of the following commands:
ceph -s
ceph health detail
- Here is an example of the ceph health detail output:
# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'cephfs_metadata'
    Search the cluster log for 'Large omap object found' for more details.
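The warning also suggests searching the cluster log for the offending object. On a monitor node, and assuming the default cluster log location, that search can be done with something like:
grep 'Large omap object found' /var/log/ceph/ceph.log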
- To resolve this issue, follow the steps below:
First, find the placement group (PG) that the large omap objects reside in. Use the following command to identify it:
for i in `ceph pg ls-by-pool [POOL NAME] | tail -n +2 | head -n -2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query | grep num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
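For reference, the same loop written out in a more readable form is shown below. This is only a sketch: it assumes the same ceph pg ls-by-pool column layout and the same num_large_omap_objects field in the PG query output that the one-liner above relies on, and the pool name is a placeholder.

# Placeholder pool name - substitute your own
POOL=cephfs_metadata
for pg in $(ceph pg ls-by-pool "$POOL" | tail -n +2 | head -n -2 | awk '{print $1}'); do
    # Pull the num_large_omap_objects counter out of the PG query output
    count=$(ceph pg "$pg" query | grep num_large_omap_objects | head -1 | awk '{print $2}' | tr -d ',')
    # Print only PGs that report at least one large omap object
    if [ -n "$count" ] && [ "$count" -gt 0 ] 2>/dev/null; then
        echo "$pg: $count"
    fi
done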
Either form will output the ID of the placement group affected by the large omap objects, for example: 2.26
- Next, determine which OSDs the PG resides on. This can be done with the following command:
# ceph pg map [PG #]
The output from this command should look similar to the following:
# ceph pg map 2.26
osdmap e8768 pg 2.26 (2.26) -> up [29,94,37] acting [29,94,37]
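If you want the acting OSD IDs on their own (one per line), a simple sed/tr pipeline over the same command works. This assumes the textual output format shown in the example above:
ceph pg map 2.26 | sed 's/.*acting \[\(.*\)\]/\1/' | tr ',' '\n'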
- Now that we’ve identified which OSDs are hosting the large omap objects, we need to run a deep scrub on each of them:
ceph osd deep-scrub osd.29
ceph osd deep-scrub osd.37
ceph osd deep-scrub osd.94
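The same three commands can also be issued in a loop, which is convenient when the acting set is larger. The OSD IDs below are the ones from the example acting set above:

# Deep scrub every OSD in the acting set found in the previous step
for osd in 29 37 94; do
    ceph osd deep-scrub "osd.$osd"
done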
Verification
Once the deep scrubs complete on the OSDs, the cluster should return to a healthy state.
Run ceph -s or ceph health detail to confirm this:
ceph -s
  cluster:
    id:     170b5370-2d51-4348-b6ef-79e627967474
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-osd1,ceph-osd2,ceph-osd3 (age 8h)
    mgr: ceph-osd1(active, since 8h), standbys: ceph-osd2, ceph-osd3
    mds: cephfs:1 {0=ceph-fsgw1=up:active} 1 up:standby-replay
    osd: 105 osds: 105 up (since 8h), 105 in (since 24h)

  data:
    pools:   3 pools, 4224 pgs
    objects: 53.10M objects, 26 TiB
    usage:   82 TiB used, 1.2 PiB / 1.3 PiB avail
    pgs:     4221 active+clean
             2    active+clean+scrubbing+deep
             1    active+clean+scrubbing
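Deep scrubs on large OSDs can take some time. One way to keep an eye on progress until the scrubbing PGs drain and health returns to HEALTH_OK is a simple watch loop; the grep pattern just picks the health and scrubbing lines out of the ceph -s output:
watch -n 30 "ceph -s | grep -E 'health:|scrubbing'"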