Cat API indices understanding

Hi,

I used the cat API to get information about our indices: GET http://localhost:9200/_cat/indices?v.

We changed our mapping, and this impacts the indices' store sizes, so I am trying to figure out the size increase.

I am not sure I understand the output precisely.

Are the provided sizes the size of the data or the size on disk?

Is the primary store size the mean size of one primary or the size of all primaries?

I guess that the primary store size is the size of all primaries, and that the store size should be equal to (1 + number of replicas) * (primary store size). Is that right?

But in fact this is not the case: the store size is almost equal to (1 + number of replicas) * (primary store size), but not exactly.

Last point: to evaluate the mean size of one document, is it possible to compute (primary store size) / (number of docs + number of deleted docs)?

I understood that deleted docs are still in the indices but are no longer accessible.

Thanks for any help.

B. Granier

Hi @granier,

You should see two columns: store.size and pri.store.size. store.size is the total size (primaries and replicas), whereas pri.store.size is the size of primaries alone. This is storage (disk) size.

Notice that you can get help on the column headers using:

GET _cat/indices?help
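
For instance, if you mainly care about the store sizes, something along these lines (h selects the columns and bytes=b reports exact byte counts; both are standard cat API parameters) narrows the output to the relevant columns:

GET _cat/indices?v&h=index,pri,rep,docs.count,docs.deleted,pri.store.size,store.size&bytes=b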

Notice that replicas and primaries are not necessarily identically sized, so it is expected that your formula does not give the exact number. There are multiple reasons for this, since each replica/primary in many ways works independently, maintaining its own Lucene index for the shard. For instance, indexing does not necessarily happen in the same order, and merging may kick in at different times.
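
If you want to see this effect directly, a request like the following (using the _cat/shards endpoint) lists each primary and replica shard with its own size, so you can see how far apart they are:

GET _cat/shards?v&h=index,shard,prirep,docs,store&s=index,shard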

I am not sure what exactly you want out of the average size calculation. You can do the average as you describe it, and it will then give you the average size over current docs and deletions. Notice that deletions occur for two reasons: updates and deletes. If you want to use this for forecasting storage use, it might make sense to use the more conservative (size / number of docs) instead, so that you do not underestimate, but it does depend on whether you expect many updates/deletes or not.
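
Written out with the _cat/indices column names, the two estimates are roughly:

  average size including deletions = pri.store.size / (docs.count + docs.deleted)
  conservative average for forecasting = pri.store.size / docs.count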

Hi Henning,

Thanks for your explanations, which are clear and precise.

My first goal is to understand how index sizes grow after mapping changes.

But due to our process, the computation is not so easy:

  • the number of indexed documents changed.
  • to be able to change the mapping, we delete all indices and index all documents again, so the number of deleted documents changed.

To have an estimate of the growth, I try to compute the size of one document...

Hi @granier,

The way Lucene stores docs is complicated, and I think trying to compare one data set with the old mappings and deletions in it against another data set with the new mappings and no deletions will be hard and likely error-prone.

I think a better approach is to pick a relevant subset of data, index it into an index using the old mapping, and index the same subset of data into another index using the new mapping. Then compare the sizes of those two indices. The data set has to have a "good" size, i.e. not be unrealistically small and not so large that the exercise takes too long. You could let your target index have only one primary shard and target 50GB for the experiment. You can use reindex to copy the data.
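
A minimal sketch of that experiment, assuming a hypothetical source index my-index and a test index test-old-mapping (you would also put the old mappings in the index creation body), could look like this:

PUT test-old-mapping
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
}

POST _reindex
{
  "source": { "index": "my-index" },
  "dest": { "index": "test-old-mapping" }
}

Repeat the same reindex into a test index created with the new mapping, then compare the pri.store.size of the two test indices with _cat/indices. On recent versions you can also cap the number of documents copied with a top-level max_docs parameter in the reindex body.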

Hi,

Thanks for this last comment. Using reindex is not so easy for us due to disk size and storage management.

But we can investigate this approach to improve our processes; good idea.

Thanks for support.

B. Granier
