About Doc Values feature and out of memory during term aggregation

linlma · July 22, 2015, 5:48am

Hello Elastic Experts,

I am reading articles about out-of-memory errors during aggregation (for example, the articles below), and I do not quite understand why running out of memory is an issue when we do a filter plus a terms aggregation. My feeling is that after the filter the remaining data set is small, so a terms aggregation on it should not use much memory. It looks like the out-of-memory errors are caused by Elasticsearch loading unnecessary data into memory during the terms aggregation. If so, what is the unnecessary data, and how do doc values help here? Thanks.

thanks in advance,
Lin

warkolm · July 23, 2015, 7:23am

Even with a filter you can still try to pull too much data into memory.

Are you having issues?

linlma · July 23, 2015, 7:55am

Hi Mark,

Suppose I index all students in one index and filter by a single student ID (student IDs are unique). Do you mean that even with such a simple filter, Elasticsearch needs to load much more data into memory? If so, what is that extra data?

regards,
Lin

warkolm · July 23, 2015, 8:02am

No.

I mean that if you have millions or billions of records, even after filtering those down you can still have a lot of data, which can overwhelm the heap.

linlma · July 23, 2015, 11:31pm

Thanks Mark,

But how do doc values help here, if we still have millions or billions of records after the filter? In my use case there are not many records left after the filter, but I am doing a terms aggregation on the results, and the term vocabulary size may be around 100k. Your advice is highly appreciated.

regards,
Lin

warkolm · July 23, 2015, 11:37pm

Doc values are fielddata but on disk, so they take that data off the heap, which gives you more space to do things like this.

linlma · July 23, 2015, 11:51pm

Thanks Mark,

If we want to do a terms aggregation, do we need to store the field we aggregate on (the term field in the index) on disk as doc values? Or do we need to store some other field's fielddata on disk as doc values?

regards,
Lin

warkolm · July 23, 2015, 11:57pm

Take a look at https://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html and https://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html, it might explain it better.

linlma · July 24, 2015, 3:55am

Hi Mark,

Actually my question comes from the links.

For the terms aggregation scenario, what is the difference between storing the term field as doc values and not doing so?

BTW, you shared the same document twice. Did you mean to share two different ones?

regards,
Lin

warkolm · July 24, 2015, 3:56am

Doc values live on disk.
Field data exists in heap.

The former is more efficient.

linlma · July 24, 2015, 3:57am

Thanks Mark,

But for the terms aggregation scenario, do you mean storing the term field's fielddata on disk?

regards,
Lin

warkolm · July 24, 2015, 4:11am

Yes that is correct.
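
As a rough illustration of what "putting the term field on disk" could look like, here is a minimal Python sketch that only builds the JSON bodies and prints them. It assumes the Elasticsearch 1.x-era syntax that matches the date of this thread; the `student` type and the `major` and `student_id` fields are hypothetical names, not taken from the posts above.

```python
import json

# Hypothetical mapping: "student" type with a "major" field used in the
# terms aggregation. Assumes 1.x-era mapping syntax, where doc values are
# switched on per field and string fields must be not_analyzed to use them.
mapping = {
    "mappings": {
        "student": {
            "properties": {
                "major": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": True,  # keep this field's column on disk
                }
            }
        }
    }
}

# Hypothetical filter + terms aggregation request against that field.
terms_request = {
    "size": 0,
    "query": {"filtered": {"filter": {"term": {"student_id": "42"}}}},
    "aggs": {"majors": {"terms": {"field": "major"}}},
}

print(json.dumps(mapping, indent=2))
print(json.dumps(terms_request, indent=2))
```

The setting goes on the field named in the aggregation (`major` here), not on the field used in the filter.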

linlma · July 24, 2015, 4:15am

Thanks Mark,

The document you referred to says, "While in-memory fielddata has to be built on the fly at search time by uninverting the inverted index". What does uninverting the inverted index mean? Could you show an example of what it means in the context of a terms aggregation?

regards,
Lin
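
To make the quoted sentence concrete, here is a minimal Python sketch of the idea, not Elasticsearch's actual implementation. It assumes a toy single-valued field and hypothetical data: the inverted index maps each term to the documents containing it, and "uninverting" turns that into a per-document column that a terms aggregation can read.

```python
from collections import defaultdict

# Hypothetical toy inverted index for one field: term -> doc IDs.
inverted_index = {
    "biology": [0, 3],
    "math": [1, 2, 4],
}

def uninvert(inverted, num_docs):
    """Build a doc -> term column from a term -> docs index."""
    column = [None] * num_docs
    for term, doc_ids in inverted.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column

def terms_aggregation(column, matching_doc_ids):
    """Count terms for the documents that passed the filter."""
    counts = defaultdict(int)
    for doc_id in matching_doc_ids:
        term = column[doc_id]
        if term is not None:
            counts[term] += 1
    return dict(counts)

# The column covers every document, whatever the filter later matches.
column = uninvert(inverted_index, num_docs=5)
print(column)
print(terms_aggregation(column, matching_doc_ids=[1, 3]))
```

In this sketch the column has an entry for all five documents even though the filter matched only two. This mirrors why in-memory fielddata can grow with the size of the index rather than with the size of the filtered result. With doc values, an equivalent column is written to disk at index time instead of being built on the heap at search time.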
