This is not clarified anywhere, so this description of the field cache's memory usage should help everyone.
- Estimated field cache size (in bytes) for a single Lucene segment, for the
following field types:
=> numbers (including datetime formats):
48 (Java structures for the docs list) + 4 * max_doc_id * max_array_size
+
8 (Java structures for the unique-terms list) + unique_terms_count * 4
=> strings:
48 (Java structures for the docs list) + 4 * max_doc_id * max_array_size
+
8 (Java structures for the unique-terms list) + unique_terms_count * (4 +
string_size_in_bytes)
where:
max_doc_id - the highest Lucene doc id + 1 (in the corresponding segment)
string_size_in_bytes(max) = 4 * string_len (UTF-8)
max_array_size - the maximum number of elements (across all documents in the
segment) in a multivalued field.
- Since the field cache is per segment, the unique-terms array is kept per
segment too.
Suppose you use a multivalued field as tags. Even if you have only 1
document with e.g. 10 elements in tags and the rest of the documents have 1
element in tags (still within a single Lucene segment), the field cache
still uses a two-dimensional array for the document list with Y-size = 10,
so it takes the same amount of memory as if all the documents had 10 values
in tags.
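To see how badly one outlier document inflates the docs-list term, compare a segment where every document has 1 tag against the same segment where a single document has 10 tags (the segment size is illustrative):

```python
max_doc_id = 1_000_000  # illustrative segment size

# Every document has exactly 1 tag:
uniform = 48 + 4 * max_doc_id * 1

# One document has 10 tags, so max_array_size jumps to 10 for the
# whole segment, even though 999,999 docs still hold a single tag:
one_outlier = 48 + 4 * max_doc_id * 10

print(uniform, one_outlier)  # 4000048 40000048 -> roughly 10x the memory
```

This is exactly the amplification the thread warns about: one wide document sets max_array_size for the entire segment.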
So one thing is the unique terms - this can be estimated very simply. But
the second thing is the array with document pointers - this can be very
heavy. I strongly do NOT recommend using facets on multivalued fields; in
this case use nested documents - then each element of the field is a
separate document and the situation does not occur.
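A hedged sketch of what that switch looks like in the mapping. The index and field names here are made up; the `nested` type itself is a real Elasticsearch mapping feature, and `"string"` is the field type of that era:

```python
# Multivalued version: all tag values live on one document, so the field
# cache pays for max_array_size on every document in the segment.
multivalued_mapping = {
    "properties": {
        "tags": {"type": "string"}
    }
}

# Nested version: each tags element becomes its own hidden Lucene document
# with a single-valued field, so max_array_size stays at 1.
nested_mapping = {
    "properties": {
        "tags": {
            "type": "nested",
            "properties": {
                "value": {"type": "string"}
            }
        }
    }
}
```

The trade-off is that nested documents require nested queries/facets to address, but the field cache no longer scales with the widest document.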
In my case, optimizing multivalued fields and switching to nested
documents reduced field cache usage from about 17GB to about 2GB.
Remember that this cache is estimated per segment. Each shard consists of
10-20 segments (with default ES settings). Each segment's max size (by
default) is 5GB, and the merge policy keeps a few big segments (up to 5GB)
while most segments stay small (it depends on shard size, of course).
You can check segment sizes with GET localhost:9200/_segments.
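The _segments response can be summarized with a few lines of Python. The sample below is abridged and hypothetical; a real response from GET localhost:9200/_segments carries many more fields per segment:

```python
# Abridged, hypothetical shape of a GET localhost:9200/_segments response.
sample = {
    "indices": {
        "myindex": {
            "shards": {
                "0": [{
                    "segments": {
                        "_0": {"num_docs": 900_000, "size_in_bytes": 4_800_000_000},
                        "_1": {"num_docs": 40_000, "size_in_bytes": 210_000_000},
                        "_2": {"num_docs": 1_200, "size_in_bytes": 6_500_000},
                    }
                }]
            }
        }
    }
}

def segment_sizes(resp):
    """Yield (index, shard, segment_name, size_in_bytes) for every segment."""
    for index, idx_data in resp["indices"].items():
        for shard, copies in idx_data["shards"].items():
            for copy in copies:
                for name, seg in copy["segments"].items():
                    yield index, shard, name, seg["size_in_bytes"]

for row in segment_sizes(sample):
    print(row)
```

Notice the pattern the thread describes: one big segment near the 5GB ceiling and a tail of much smaller ones.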
I hope this solves your problems with the field cache exploding. It
solved mine.
Best regards.
Marcin Dojwa
2013/4/24 jieren jieren@klout.com
Thanks for the fast answer!
On Wednesday, April 24, 2013 11:32:38 AM UTC-7, David Pilato wrote:
Unique ones.
So faceting on a few unique values will scale really easily.
But if you facet on a comment field for example, it will load (too) many
terms in memory.
HTH
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 24 Apr 2013, at 20:20, jieren jie...@klout.com wrote:
Hi everyone
I am still a bit unclear on how terms facets load values into memory.
What people have said is that it loads all the values into memory. Does
that mean it loads all the unique values of the field into memory, or the
values of the field per document?
For example
Suppose I have documents:
{
"id" : "1",
"tags" : ["foo", "bar"]
}
{
"id" : "2",
"tags" : ["foo", "bar"]
}
Will "foo" and "bar" be loaded once or twice into memory?
Thank you!
Jieren
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.