Elasticsearch behavior when you stop indexing the biggest part of your messages

tvdruenen · December 9, 2019, 11:56am

Hi Elastic fanatics,

Over the last couple of years I was able to find everything I wanted to know in either the official documentation or this forum. For the first time, I've stumbled upon something I'm not sure about and can't find anywhere. I really hope someone can help me out here. Let's go:

Use case
We don't want some really large fields (99% of a 2000 KB message) to be indexed. How does this affect the storage and memory requirements of Elasticsearch? We still keep all the data in _source, but documents should only be findable through a few keyword fields instead of through everything.
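A minimal sketch of what such a mapping could look like. It is written as a Python dict that mirrors the JSON body of a `PUT /<index>` request in the 7.x typeless mapping format. The field names `message_id`, `sender` and `payload` and the helper `build_mapping` are hypothetical. `"index": false` is a real mapping parameter: the field gets no inverted-index entries, but its value is still kept in `_source` and returned with search hits.

```python
import json

def build_mapping(searchable_fields, unindexed_fields):
    """Hypothetical helper: small keyword fields stay searchable,
    large text fields are kept in _source but not indexed."""
    properties = {}
    for name in searchable_fields:
        # keyword fields are indexed as-is and support exact-match lookups
        properties[name] = {"type": "keyword"}
    for name in unindexed_fields:
        # "index": False -> no inverted-index entries for this field;
        # the original value is still available from _source
        properties[name] = {"type": "text", "index": False}
    return {"mappings": {"properties": properties}}

body = build_mapping(["message_id", "sender"], ["payload"])
print(json.dumps(body, indent=2))  # JSON body for PUT /<index>
```

For object fields, `"enabled": false` goes a step further and skips parsing the contents entirely. In both cases the full message still lands in `_source`, so the stored-fields part of the disk footprint does not shrink.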

With our current setup, where all incoming data is indexed, we know exactly how much data and how many shards a node can handle while the cluster remains happy. We're not sure what will happen after this change.

Assumption 1
When we don't index 99% of the characters in our messages but still store the _source, our shards can hold more messages and grow larger before they become unstable, so we can store more data per node.
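To put rough numbers on Assumption 1, here is a back-of-the-envelope split of a single 2000 KB message, assuming exactly 99% of it goes into non-indexed fields. The helper `split_message_kb` is hypothetical. These are raw input sizes only: the real on-disk footprint depends on `_source` compression and on how many distinct terms the analyzer produces, so this illustrates the ratio, not actual shard sizes.

```python
def split_message_kb(message_kb, unindexed_fraction):
    """Hypothetical helper: raw (uncompressed) KB per message that is fed
    to the inverted index vs. KB that is only kept in _source."""
    indexed = message_kb * (1 - unindexed_fraction)
    stored_only = message_kb * unindexed_fraction
    return indexed, stored_only

indexed, stored_only = split_message_kb(2000, 0.99)
# Note: all 2000 KB still go into _source; only ~20 KB are analyzed and indexed.
print(f"indexed ~{indexed:.0f} KB, stored-only ~{stored_only:.0f} KB per message")
```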

Assumption 2
The inverted inverted index in RAM affects the happiness of a node and is derived from the inverted index on disk. If the inverted inverted index (RAM) gets too big, the node becomes unstable.

Question 1
Are my assumptions correct?

If so...

Question 2
How do I find or calculate the inverted index size? Is it the combination of the .doc and .pos files on disk?
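One way to approach Question 2 is to group a shard's Lucene files by extension. This sketch assumes Lucene's default postings format, where the inverted index is spread over the term dictionary (`.tim`), the terms index (`.tip`), postings (`.doc`), positions (`.pos`) and payloads/offsets (`.pay`), while `_source` lives in the stored-fields files (`.fdt`, `.fdx`). Small segments are often packed into compound `.cfs` files, which hides their breakdown, so the numbers are most meaningful on larger merged segments. The helper names are hypothetical, and the path to a shard's `index` directory depends on version and data path settings.

```python
import os
from collections import defaultdict

# Lucene file extensions that make up the inverted index (postings format)
INVERTED_INDEX_EXTS = {".tim", ".tip", ".doc", ".pos", ".pay"}
# Stored fields, where _source is kept
STORED_FIELDS_EXTS = {".fdt", ".fdx"}

def size_by_extension(index_dir):
    """Sum file sizes under a shard's Lucene index directory, grouped by extension."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            ext = os.path.splitext(name)[1]  # "" for e.g. segments_N, write.lock
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)

def summarize(totals):
    """Collapse per-extension byte totals into the categories discussed above."""
    return {
        "inverted_index": sum(v for k, v in totals.items() if k in INVERTED_INDEX_EXTS),
        "stored_fields": sum(v for k, v in totals.items() if k in STORED_FIELDS_EXTS),
        # compound files mix all structures and cannot be split this way
        "compound_files": totals.get(".cfs", 0),
    }
```

Running `summarize(size_by_extension(path))` on a shard of each index (fully indexed vs. mostly `"index": false`) gives a like-for-like comparison of the two on disk.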

Question 3
How do I find or calculate the size of the inverted inverted index that Elasticsearch keeps in memory? I would like to compare an index where all data is indexed with an index where most of the data isn't indexed.
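For Question 3, Elasticsearch 7.x reports per-segment heap usage in the index stats API (`GET /<index>/_stats/segments`), including `terms_memory_in_bytes`. Below is a sketch that pulls those figures out of the parsed JSON response for two indices so they can be compared; the helper and index names are hypothetical. It reads the `primaries` section, so replica copies, which also use heap on their own nodes, are not counted. Later releases moved most of the terms index off-heap, so these counters may be reported as zero or be absent there; the helper treats missing keys as 0.

```python
def terms_memory_by_index(stats_response):
    """Extract per-index segment memory figures from a parsed
    GET /<index>/_stats/segments response (7.x field names)."""
    result = {}
    for index_name, index_stats in stats_response.get("indices", {}).items():
        segments = index_stats.get("primaries", {}).get("segments", {})
        result[index_name] = {
            "segment_count": segments.get("count", 0),
            "terms_memory_in_bytes": segments.get("terms_memory_in_bytes", 0),
            "memory_in_bytes": segments.get("memory_in_bytes", 0),
        }
    return result

def terms_memory_ratio(stats_response, full_index, partial_index):
    """How many times more terms memory the fully indexed index uses
    than the mostly unindexed one (inf if the latter reports 0)."""
    per_index = terms_memory_by_index(stats_response)
    full = per_index[full_index]["terms_memory_in_bytes"]
    partial = per_index[partial_index]["terms_memory_in_bytes"]
    return full / partial if partial else float("inf")
```

Both indices can be fetched in one call, e.g. `GET /full-index,partial-index/_stats/segments`, and the parsed body passed to `terms_memory_ratio`.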

If not...

Question 4
If assumption 1 is wrong, i.e. you can't store more messages or data in a shard when 99% of the characters aren't indexed, can we at least get away with less RAM?

If my assumptions are totally wrong and my questions don't make sense, please advise on what the right direction would be.


Related topics

Topic	Category	Replies	Views	Activity
Fitting Inverted list in memory	Elasticsearch	5	332	July 6, 2017
Search over large documents	Elasticsearch	5	343	March 11, 2019
RAM and Data size	Elasticsearch	6	2185	August 23, 2020
Understanding implications of `index: false` with `type: keyword`	Elasticsearch	4	2314	March 16, 2021
ElasticSearch index size peculiarity	Elasticsearch	2	661	July 6, 2017