On Monday, February 11, 2013 3:24:46 PM UTC-5, Mike wrote:
I understand that when you facet or sort on a field, Elasticsearch loads
all of the possible values of that field into the field cache. This can get
huge, and it is a common cause of heap OOM errors, especially on fields
with high cardinality. My question is: are the values put into the field
cache limited to those in the records matched by the query, or does
Elasticsearch load every unique value of the field regardless of the
filters? If filters do limit them, which ones? And are there only certain
kinds of queries that limit them as well? I read in another post that only
the constant_score query limits the records used in the cache.
I know that QUERY A, FILTER A, FILTER B, and FILTER C will all limit the
records that my hugeFacet uses when returning results. However:

1. Will all 4 of those also limit what is added to the field cache? If
not, which ones will and which won't? I believe I read somewhere that a
facet_filter (FILTER C) limits what is put into the field cache, but that
post said nothing about the other filters.

2. Can any query be used as QUERY A to decrease the number of field values
that are examined and added to the field cache, or is constant_score the
only one that will? If that's the case, does it mean that nothing in my
entire "filtered" query (QUERY A and FILTER A) will limit what's put into
the field cache?
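The actual request body those labels refer to isn't included in the thread,
so here is a hypothetical sketch of the shape being described. Every field
name and every individual query/filter below is a placeholder I made up,
not the poster's real values:

```python
# Hypothetical reconstruction of the search body under discussion.
# "QUERY A", "FILTER A/B/C", and all field names are placeholders.
body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "foo"}},       # QUERY A
            "filter": {"term": {"status": "active"}},   # FILTER A
        }
    },
    "filter": {"term": {"type": "article"}},            # FILTER B (top-level filter)
    "facets": {
        "hugeFacet": {
            "terms": {"field": "hugeField"},
            "facet_filter": {"term": {"lang": "en"}},   # FILTER C
        }
    },
}
```

The point of the layout: QUERY A and FILTER A live inside the "filtered"
query, FILTER B is the top-level filter, and FILTER C is attached directly
to the facet as a facet_filter.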
"So if you need to account for filters when you run facets, you need to
either wrap them in the constant_score query or use facet_filter."
Maybe I misunderstood. Any clarification on my 2 questions above would be
greatly appreciated.
For sorting, Elasticsearch sorts all the documents matching your query and
filters, then gives you back the top X items. Any document that is excluded
by either your query or your filter never enters the sort, so it doesn't
touch the caches.
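As a concrete illustration of that point, a sorted search only ranks the
matching documents. The query, field names, and sort key here are
illustrative, not taken from the thread:

```python
# Sketch: sorting applies only to documents matched by the query/filter;
# excluded documents never enter the sorted result set.
# "category", "books", and "price" are made-up names.
body = {
    "query": {"term": {"category": "books"}},
    "sort": [{"price": {"order": "asc"}}],
    "size": 10,  # the "top X items" handed back
}
```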
For faceting, filters don't matter: facets are computed on the query
results. If you need to restrict the set of documents a facet is computed
over, you have to use facet_filter. In that case the caches are used for
all the documents the facet runs on (the query results, minus whatever the
facet_filter takes out).
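A minimal sketch of the two options in the quoted advice, assuming a single
term filter that should also constrain the facet (the "status" filter and
field names are illustrative):

```python
# Option 1: repeat the filter as a facet_filter so facet counts respect it.
status_filter = {"term": {"status": "active"}}
body = {
    "query": {"match_all": {}},
    "filter": status_filter,                  # narrows the hits returned
    "facets": {
        "tags": {
            "terms": {"field": "tags"},
            "facet_filter": status_filter,    # narrows the facet counts too
        }
    },
}

# Option 2: wrap the filter in a constant_score query, so the facet
# (which runs on query results) sees it automatically.
body_cs = {
    "query": {"constant_score": {"filter": status_filter}},
    "facets": {"tags": {"terms": {"field": "tags"}}},
}
```

Either way, the facet only ever looks at documents that passed the filter;
a top-level filter alone does not have that effect.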