Disk.indices is higher than disk.used in cat allocation

When I run the following command:

GET _cat/allocation?v&s=disk.indices&h=shards,disk.indices,disk.used,disk.available,disk.total,disk.percent

it shows the following output:

shards disk.indices disk.used disk.total disk.percent
   160        1.4tb     1.4tb      1.7tb           86
   160        1.4tb     1.4tb      1.7tb           87
   160        1.5tb     1.5tb      1.7tb           89
   160        1.5tb     1.5tb      1.7tb           90
   480        7.7tb     3.7tb       20tb           18
   480        7.7tb     3.9tb       20tb           19

Can anyone help me understand why disk.indices in the last two rows exceeds disk.used?

Are you using searchable snapshots on those nodes? Just wondering if there is a difference, as they also hold more shards than the others.

Hey Alexander,

No, I'm not making use of searchable snapshots. The ELK version is 7.10.2.

It's pretty puzzling as to how disk.indices can be greater than disk.used.

This indeed looks like a bug then. Can you open an issue at Issues · elastic/elasticsearch · GitHub, please?

Sure. Will do. The documentation states that

This metric double-counts disk space for hard-linked files, such as those created when shrinking, splitting, or cloning an index.

I ran GET _tasks?detailed=true and don't see any reindex or other such tasks. The only tasks listed seem to be of type cluster:monitor/tasks/.

disk.indices is the total of the store sizes of each index (i.e. the sum of the sizes of the individual files), whereas disk.used is whatever the OS reports as the used space on the underlying filesystem. In particular that means that disk.indices will double-count any hard-linked files whereas disk.used won't.
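
To see the effect outside Elasticsearch, here is a minimal Python sketch (POSIX-only, since it relies on st_blocks; the file names are made up for illustration). It hard-links a file and then compares the sum of per-file sizes, which is what a disk.indices-style measurement adds up, with the blocks actually allocated on disk, which is what a disk.used-style measurement reflects:

import os
import tempfile

# Hard links make the sum of per-file sizes exceed the space actually
# allocated on the filesystem, because the same blocks get counted twice.
with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "segment.dat")
    with open(original, "wb") as f:
        f.write(b"x" * 10_000_000)  # ~10 MB of data

    # Second directory entry pointing at the same inode and data blocks.
    os.link(original, os.path.join(d, "segment_copy.dat"))

    # disk.indices-style view: add up the apparent size of every file name.
    apparent = sum(os.stat(os.path.join(d, name)).st_size for name in os.listdir(d))

    # disk.used-style view: blocks actually allocated, counted once per inode.
    allocated = os.stat(original).st_blocks * 512

    print("sum of file sizes:", apparent)   # ~20,000,000 bytes
    print("blocks on disk:   ", allocated)  # ~10,000,000 bytes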

Thanks @DavidTurner. Would you be able to shed more light on hard-linked files here? How would they get created? AFAIK this is a managed cluster, so it's unlikely that anyone could manually create hard links.

Is there a possibility that, say, a shrink was triggered and then aborted mid-way, leaving hard-linked files behind?

Yeah things like shrink or split or clone are the usual answer.

@DavidTurner, do you still consider this a bug from the user's perspective, given the confusing numbers being shown, despite the logical explanation of that behaviour?

Not really, these numbers measure different things and are both important as they are defined today. disk.indices is a good measure of the size of the overall dataset and is insensitive to things like hard-linking, whereas disk.used tells us how much real disk space we're actually using right now, accounting for hard-linked files, filesystem overhead (e.g. rounding up small files to a whole block), non-file space usage (e.g. directory entries), and other data also stored on the same filesystem.

The docs do spell out the difference.
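
If you want to put a number on the gap per node, something along these lines works against the same cat API. This is only a rough sketch: it assumes the cluster is reachable at http://localhost:9200 without authentication and that the JSON field names match the column headers, so adjust the URL and credentials for a managed deployment.

import requests

# Report the per-node difference between disk.indices and disk.used.
resp = requests.get(
    "http://localhost:9200/_cat/allocation",
    params={"format": "json", "bytes": "b", "h": "node,disk.indices,disk.used"},
)
resp.raise_for_status()

for row in resp.json():
    if row.get("disk.indices") is None:  # skip the UNASSIGNED row, if present
        continue
    gap = int(row["disk.indices"]) - int(row["disk.used"])
    print(f"{row['node']}: disk.indices - disk.used = {gap} bytes")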

Hi David,

Thanks for the clarification. Makes sense. But I've a few questions:

Is that why it double-counts the hard-linked files?

I wouldn't be surprised if disk.used > disk.indices, but in this case disk.indices has been greater than disk.used for the last 6 consecutive days (and probably longer). If there were a reindex / cloning / shrinking task going on, it would be listed under GET _tasks?detailed=true. Correct?

But the output of GET _tasks?detailed=true just shows that all tasks are of type cluster:monitor/tasks/.

How do we find the root cause of disk.indices > disk.used in a managed ES cluster where we cannot SSH in? Any thoughts?

Yes.

Reindex yes, but that's not relevant. Cloning and shrinking are pretty quick, but will keep using hard-linked files for arbitrarily long, so you probably wouldn't see anything about them in the tasks.
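
One way to spot those without shell access is via the index settings: an index created by shrink, split, or clone records its source under index.resize.source.*, so something like GET */_settings?filter_path=*.settings.index.resize should list any such indices, and those are the ones that may still be sharing hard-linked files with their source.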

Maybe this is the fundamental point: is there actually a problem here? Do you need to investigate?

Thanks David. I thought it was a problem until you clarified it in a previous post. I was looking into the huge disk space occupied by this cluster, and while disk.used showed 30TB, disk.indices reported 37.2TB. So I was puzzled about which one to rely on, since a 7.2TB difference is a huge number.

So I suppose I can rest assured that this cluster uses 30TB and not 37.2TB. Correct?

Correct, disk.used is the actual current disk usage.
