This is not clarified anywhere, so this description of the field cache's memory usage should help everyone.
- Estimated field cache size (in bytes) for a single Lucene segment, for the
following field types:
=> numbers (including datetime formats):
48 (Java structures for the docs list) + 4 * max_doc_id * max_array_size
+
8 (Java structures for the unique-terms list) + unique_terms_count * 4
=> strings:
48 (Java structures for the docs list) + 4 * max_doc_id * max_array_size
+
8 (Java structures for the unique-terms list) + unique_terms_count * (4 +
string_size_in_bytes)
where:
max_doc_id - the highest Lucene doc id + 1 (in the corresponding segment)
string_size_in_bytes(max) = 4 * string_len (UTF-8)
max_array_size - the maximum number of elements (across all documents in the
segment) in a multivalued field.
- Since the field cache is per segment, the unique-terms array is kept per
segment too.
Suppose you use a multivalued field as tags. Even if you have only 1
document with e.g. 10 elements in tags and the rest of the documents have 1
element in tags (still within a single Lucene segment), the field cache
still uses a two-dimensional array for the document list with Y-size = 10,
so it takes the same amount of memory as if all the documents had 10 values
in tags.
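To see how badly one outlier document inflates the docs-list term, compare a segment where every document has 1 tag against the same segment where a single document has 10 tags (the segment size is illustrative):

```python
max_doc_id = 1_000_000  # illustrative segment size

# Every document has exactly 1 tag:
uniform = 48 + 4 * max_doc_id * 1

# One document has 10 tags, so max_array_size jumps to 10 for the
# whole segment, even though 999,999 docs still hold a single tag:
one_outlier = 48 + 4 * max_doc_id * 10

print(uniform, one_outlier)  # 4000048 40000048 -> roughly 10x the memory
```

This is exactly the amplification the thread warns about: one wide document sets max_array_size for the entire segment.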
So one thing is the unique terms - this can be estimated very simply. But
the second thing is the array with document pointers - this can be very
heavy. I strongly do NOT recommend using facets on multivalued fields; in
this case use nested documents - then each element of the field is a
separate document and the situation does not occur.
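A hedged sketch of what that switch looks like in the mapping. The index and field names here are made up; the `nested` type itself is a real Elasticsearch mapping feature, and `"string"` is the field type of that era:

```python
# Multivalued version: all tag values live on one document, so the field
# cache pays for max_array_size on every document in the segment.
multivalued_mapping = {
    "properties": {
        "tags": {"type": "string"}
    }
}

# Nested version: each tags element becomes its own hidden Lucene document
# with a single-valued field, so max_array_size stays at 1.
nested_mapping = {
    "properties": {
        "tags": {
            "type": "nested",
            "properties": {
                "value": {"type": "string"}
            }
        }
    }
}
```

The trade-off is that nested documents require nested queries/facets to address, but the field cache no longer scales with the widest document.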
In my case, optimizing multivalued fields and switching to nested
documents reduced field cache usage from about 17GB to about 2GB.
Remember that this cache is estimated per segment. Each shard consists of
10-20 segments (with default ES settings). Each segment's max size (by
default) is 5GB, and the merge policy keeps a few big segments (up to 5GB)
while most segments stay small (it depends on shard size, of course).
You can check segment sizes with GET localhost:9200/_segments.
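The _segments response can be summarized with a few lines of Python. The sample below is abridged and hypothetical; a real response from GET localhost:9200/_segments carries many more fields per segment:

```python
# Abridged, hypothetical shape of a GET localhost:9200/_segments response.
sample = {
    "indices": {
        "myindex": {
            "shards": {
                "0": [{
                    "segments": {
                        "_0": {"num_docs": 900_000, "size_in_bytes": 4_800_000_000},
                        "_1": {"num_docs": 40_000, "size_in_bytes": 210_000_000},
                        "_2": {"num_docs": 1_200, "size_in_bytes": 6_500_000},
                    }
                }]
            }
        }
    }
}

def segment_sizes(resp):
    """Yield (index, shard, segment_name, size_in_bytes) for every segment."""
    for index, idx_data in resp["indices"].items():
        for shard, copies in idx_data["shards"].items():
            for copy in copies:
                for name, seg in copy["segments"].items():
                    yield index, shard, name, seg["size_in_bytes"]

for row in segment_sizes(sample):
    print(row)
```

Notice the pattern the thread describes: one big segment near the 5GB ceiling and a tail of much smaller ones.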
I hope this solves your problems with the field cache exploding. It
solved mine.
Best regards.
Marcin Dojwa
2013/4/24 jieren jieren@klout.com
Thanks for the fast answer!
On Wednesday, April 24, 2013 11:32:38 AM UTC-7, David Pilato wrote:
Unique ones.
So faceting on a few unique values will scale really easily.
But if you facet on a comment field for example, it will load (too) many
terms in memory.
HTH
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 24 Apr 2013, at 20:20, jieren jie...@klout.com wrote:
Hi everyone
I am still a bit unclear on how terms facets load values into memory.
What people have said is that it loads all the values into memory. Does
that mean it loads all the unique values of the field into memory, or the
values of the field per document?
For example
Suppose I have documents:
{
"id" : "1",
"tags" : ["foo", "bar"]
}
{
"id" : "2",
"tags" : ["foo", "bar"]
}
Will "foo" and "bar" be loaded once or twice into memory?
Thank you!
Jieren
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.