I'm trying to debug OutOfMemory errors that keep happening in our
elasticsearch cluster. I've been monitoring statistics once a minute
before the error occurs, and the problem seems to come from the field
data cache. I've seen other discussions about the field data cache
being needed for facet and sort searches, but this cluster is only
being used for indexing right now. While monitoring the statistics I
see the query count go up every time the problem happens, so I was
wondering whether elasticsearch queries itself in order to warm the
field data cache. Does it do this, or does it wait until a real query
comes in?
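For the minute-by-minute monitoring, the field cache numbers can be pulled out of the node stats response with a small script like the sketch below. The JSON key names here (`cache.field_size_in_bytes`, `field_evictions`) are assumptions that vary across elasticsearch versions, so check them against the actual output of `GET /_nodes/stats?indices=true` on your cluster:

```python
import json

# Illustrative node-stats response. The "cache" key names are assumptions
# (they differ between elasticsearch versions); verify against the real
# output of GET /_nodes/stats?indices=true before relying on them.
sample = json.loads("""
{
  "nodes": {
    "abc123": {
      "name": "node-1",
      "indices": {
        "cache": {
          "field_size_in_bytes": 2147483648,
          "field_evictions": 0
        }
      }
    }
  }
}
""")

for node_id, node in sample["nodes"].items():
    cache = node["indices"]["cache"]
    mb = cache["field_size_in_bytes"] / (1024 * 1024)
    print(f"{node['name']}: field cache {mb:.0f} MB, "
          f"{cache['field_evictions']} evictions")
```

Running something like this once a minute and logging the output lines gives a timeline of field cache growth to line up against the OutOfMemory errors.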
Today I have a million documents and 4 million document tags. When I
facet across all documents, I get an OutOfMemory error. I started the
JVM with 4GB.
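For a rough sense of scale: faceting loads every value of the tags field into the field data cache, so a back-of-envelope estimate looks something like the sketch below. All per-item sizes here are illustrative assumptions, not measured numbers:

```python
# Back-of-envelope estimate of field data needed to facet on a tags
# field. The per-item sizes are illustrative assumptions; measure with
# the node stats API for real numbers.
num_docs = 1_000_000
num_tag_values = 4_000_000     # total tag occurrences across the docs
avg_tag_bytes = 40             # assumed avg in-memory size of one tag string
ordinal_bytes = 4              # assumed bytes per loaded value reference

terms_bytes = num_tag_values * avg_tag_bytes     # the tag strings themselves
ordinals_bytes = num_tag_values * ordinal_bytes  # per-occurrence references

total_mb = (terms_bytes + ordinals_bytes) / (1024 * 1024)
print(f"~{total_mb:.0f} MB just for the tags field data")
```

If the estimate comes out well under the heap size, the extra cost is likely Java object overhead on top of the raw values, which these assumed sizes do not capture.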
I added "index.cache.field.type: soft" to our elastic config and this
has fixed the problem. Search performance isn't any worse. We don't
use facet search over the majority of our documents and we only sort
on two fields. My theory is that the resident cache is putting more
than the two sort fields in memory, or that all of the values in the
two sort fields take more memory than we have to accommodate. The
fields we're sorting on are an id field and a date field. The id field
has only a dozen unique values across 100 million records. I want to
try to calculate how much memory the fields must be taking. I could
probably determine this by starting with an empty index, adding 10
documents, then executing a query with sorting that will find all of
them. Then I could check the statistics to see how much memory is
allocated to the field cache.
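As a starting point before the empty-index experiment, a back-of-envelope calculation suggests the two sort fields alone are substantial at 100 million documents. The per-entry sizes below are assumptions (the real layout depends on the elasticsearch version), so this is only a sketch:

```python
# Rough field data estimate for sorting 100M docs on an id field
# (~a dozen unique values) and a date field. Per-entry sizes are
# assumptions; verify against the field cache statistics.
num_docs = 100_000_000

# id field: one ordinal per doc pointing at ~12 unique values,
# so the unique values themselves are negligible.
id_ordinal_bytes = 4                 # assumed 4-byte ordinal per doc
id_total = num_docs * id_ordinal_bytes

# date field: dates are typically held as one long (8 bytes) per doc.
date_total = num_docs * 8

gb = (id_total + date_total) / (1024 ** 3)
print(f"id: {id_total / 1024**3:.1f} GB, date: {date_total / 1024**3:.1f} GB, "
      f"total ~{gb:.1f} GB")
```

Even under these conservative assumptions the two fields come to roughly a gigabyte, a big chunk of a 4GB heap before any Java object overhead, which would be consistent with the soft cache fixing the problem.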