Primary store size showing wrong info while using _cat/indices

I suspect _cat/indices is showing me incorrect information about store size.

I have two indices - giitran_with_message and giitran_without_message.

Both contain almost the same documents; each one has 1000 docs. The only difference is that giitran_with_message has one extra field: message.

[screenshot of the _cat/indices response]

As you can see from the _cat/indices response, the primary store size of the index with the extra field (giitran_with_message) is smaller than that of the index without the message field, even though the total store size looks correct.

Normally the primary store size should be half the store size, since we have one replica. That holds for giitran_with_message (store size = 505.1kb, primary store size = 252.5kb) but not for giitran_without_message (store size = 426.8kb, primary store size = 281.2kb). Any thoughts?
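For reference, something like the following _cat/indices request pulls just the relevant columns (assuming a local cluster and the index names above; the column list is only one reasonable choice):

```
# List both indices with doc count, total store size and primary store size
curl -s "localhost:9200/_cat/indices/giitran_*?v&h=index,pri,rep,docs.count,store.size,pri.store.size"
```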

In short, when indices are very small (which yours are), the store sizes can look, or actually be, a little inconsistent (segments are still being created, etc.). Index 100k or 1M messages and I suspect you will see what you expect to see. In fact, as an index grows, its actual size will saw-tooth a bit: segments keep getting created, merged, and so on.
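If you want to see this happening, one way (a sketch, assuming a local cluster and this thread's index names) is to look at the segments behind each shard copy; primaries and replicas can hold different numbers and sizes of segments, which is what makes the totals look odd:

```
# Show the segments backing each primary (p) and replica (r) shard copy
curl -s "localhost:9200/_cat/segments/giitran_*?v&h=index,shard,prirep,segment,docs.count,size"
```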

One other thought: if your test documents all contain the same message, the storage per doc (when you divide it out) will probably be less than with random messages; how much less depends on the actual data.

Yeah, the test documents are actually the same, so all 1000 docs have the same data apart from _id.

Will try with 100k docs and compare.

I have also noticed that when they are very small, it helps to run a _search against the index before looking at the sizes. I am not sure why, but it seems to make sure the latest data is reflected. I don't notice the same behavior once the indices are larger.
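A minimal sketch of that, assuming a local cluster and this thread's index names (an explicit _refresh would force the same thing more directly):

```
# Run a trivial search so the latest data is reflected, then check the sizes
curl -s "localhost:9200/giitran_with_message/_search?size=0"
curl -s "localhost:9200/_cat/indices/giitran_*?v&h=index,docs.count,store.size,pri.store.size"
```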

Yes, and even 100K docs is still very small in Elasticsearch terms; that will be 25-28MB, and we often work with indices that are 20-50GB+ (even much larger). Elasticsearch works great with smaller data sets, but where it gets amazing is with larger data sets.

Curious: are you trying to calculate storage for a larger data set? If so, I would try to get to ~10% of the prod dataset before you try to calculate storage.

When comparing sizes, always make sure you first force-merge down to a single segment, as the indices may have merged differently.
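For example, a sketch using this thread's index names against a local cluster:

```
# Merge each index down to a single segment so the on-disk sizes are comparable
curl -s -X POST "localhost:9200/giitran_with_message/_forcemerge?max_num_segments=1"
curl -s -X POST "localhost:9200/giitran_without_message/_forcemerge?max_num_segments=1"
```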


Yes, I have also observed that the _cat API is a little slow to update and that doing a search against the index refreshes it.

I am actually evaluating how much space I would save if I removed the message field from my index. Currently our index size per day is 60GB, so I was thinking of removing the message field since we have the individual fields that make up the message.
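If the cluster is on 7.15 or later, the analyze index disk usage API can report how much of the index the message field actually occupies, which may be quicker than reindexing without it. A sketch, assuming a local cluster and this thread's index name:

```
# Report per-field disk usage; the entry for "message" estimates what dropping that field would save
curl -s -X POST "localhost:9200/giitran_with_message/_disk_usage?run_expensive_tasks=true&pretty"
```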

I will take a look at forcemerge.
