Limiting the Field Cache with Filters on Documents

I understand that when you facet or sort on a field, Elasticsearch loads all
of the possible values of that field into the field cache. This can get
huge, and it is a common cause of heap OOM errors, especially on fields with
high cardinality. My question is: are the values put into the field cache
limited to the documents returned by the query, or does Elasticsearch load
all unique values of a field into the field cache regardless of the
filters? If they are limited, which filters limit them? Do only certain
kinds of queries limit them as well? I read another post that said only the
constant_score query limits the records used in the cache.

Example:
{
  "query": {
    "filtered": {
      "query": { ...QUERY A... },
      "filter": { ...FILTER A... }
    }
  },
  "filter": { ...FILTER B... },
  "facets": {
    "hugeFacet": {
      "terms": {
        "field": "fieldWithGBofUniqueValues"
      },
      "facet_filter": { ...FILTER C... }
    }
  }
}
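For experimenting, a request with this shape can also be assembled programmatically. This is a minimal Python sketch, assuming the pre-1.0 request syntax shown above. The term filters and the field names `status`, `type` and `country` are hypothetical stand-ins for the QUERY/FILTER placeholders, and `build_request` is an illustrative helper, not part of any Elasticsearch client:

```python
import json


def build_request(query_a, filter_a, filter_b, filter_c, facet_field):
    """Assemble a body with the same shape as the example request.

    build_request is a hypothetical helper; it only builds a dict.
    """
    return {
        "query": {
            "filtered": {
                "query": query_a,    # QUERY A
                "filter": filter_a,  # FILTER A
            }
        },
        "filter": filter_b,  # FILTER B (top-level filter)
        "facets": {
            "hugeFacet": {
                "terms": {"field": facet_field},
                "facet_filter": filter_c,  # FILTER C
            }
        },
    }


# Hypothetical stand-ins for the placeholders; the field names are made up.
body = build_request(
    query_a={"match_all": {}},
    filter_a={"term": {"status": "active"}},
    filter_b={"term": {"type": "post"}},
    filter_c={"term": {"country": "us"}},
    facet_field="fieldWithGBofUniqueValues",
)
print(json.dumps(body, indent=2))
```

The helper only restates the structure; which of the four parts actually narrows the documents seen by the facet is decided by how Elasticsearch executes the request.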

I know that QUERY A, FILTER A, FILTER B, and FILTER C will limit the
records that my hugeFacet will use when returning results. However:

  1. Will all 4 of those also limit what is added into the field cache?
     If not, which ones will and won't? I believe I read somewhere that
     facet_filter (FILTER C) limits what is put into the field cache, but
     that post didn't say anything about the other filters.
  2. Can any query be used in QUERY A to decrease the number of field
     values that are looked at and added into the field cache, or is
     "constant_score" the only one that will? If so, does that mean nothing
     in my entire "filtered" query (QUERY A and FILTER A) will limit what's
     put into the field cache?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Found the post:
https://groups.google.com/d/topic/elasticsearch/ArOHQIKiMKE/discussion

"So if you need to account for filters when you run facets, you need to
either wrap them in the constant_score query or use facet_filter."
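One way to act on that advice is to move a top-level filter into the query itself, so that the facets see it too. This is a minimal Python sketch, assuming the pre-1.0 `filtered` and `constant_score` query syntax. `fold_filter_into_query` is a hypothetical helper that only rewrites the request dict, and using `filtered` when a scoring query is already present is a swapped-in choice that keeps that query's scoring:

```python
import copy


def fold_filter_into_query(body):
    """Return a copy of `body` with its top-level "filter" moved into the query.

    Hypothetical helper: it rewrites a request dict and never talks to a server.
    """
    new_body = copy.deepcopy(body)
    top_filter = new_body.pop("filter", None)
    if top_filter is None:
        return new_body  # nothing to fold
    if "query" in new_body:
        # Keep the existing query (and its scoring) and narrow its matches.
        new_body["query"] = {
            "filtered": {"query": new_body["query"], "filter": top_filter}
        }
    else:
        # No query to keep: constant_score turns the filter into a query.
        new_body["query"] = {"constant_score": {"filter": top_filter}}
    return new_body


request = {
    "query": {"match_all": {}},
    "filter": {"term": {"type": "post"}},  # hypothetical field and value
    "facets": {"tags": {"terms": {"field": "tags"}}},  # hypothetical facet
}
folded = fold_filter_into_query(request)
```

After folding, the filter narrows the query results, which is the set the facets are computed on, so the hits and the facet counts are based on the same documents. The original `request` is left unchanged because the helper works on a deep copy.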

Maybe I misunderstood. Any clarification on my 2 questions above would be
greatly appreciated.

On Monday, February 11, 2013 3:24:46 PM UTC-5, Mike wrote:
[quoted original question trimmed; it is identical to the first message above]

Hello Mike,

For sorting, Elasticsearch sorts all the documents matching your query and
filters, then gives you the top X items back. Any document that's filtered
out by either your query or your filter won't touch the caches.
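The sorting behaviour described above (sort only the matching documents, then return the top X) can be modelled in a few lines. This is a toy in-memory Python sketch, not Elasticsearch code: the documents, the `price` and `status` fields and the `matches` predicate are hypothetical, and the model says nothing about how the field cache is actually populated:

```python
import heapq


def top_sorted(docs, matches, sort_field, size):
    """Toy model: keep only documents that pass the query and filters,
    then return the `size` documents with the smallest `sort_field`."""
    candidates = [doc for doc in docs if matches(doc)]
    return heapq.nsmallest(size, candidates, key=lambda doc: doc[sort_field])


# Hypothetical documents; "price" and "status" are made-up fields.
docs = [
    {"id": 1, "price": 30, "status": "active"},
    {"id": 2, "price": 10, "status": "deleted"},
    {"id": 3, "price": 20, "status": "active"},
    {"id": 4, "price": 5, "status": "active"},
]
top = top_sorted(docs, lambda doc: doc["status"] == "active", "price", 2)
```

Document 2 has the second-lowest price but is filtered out before sorting, so the result is documents 4 and 3.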

For faceting, the top-level filter doesn't matter: facets are computed on
the query results. If you need to filter the results on which facets are
computed, you need to use facet_filter. In that case, the caches are used
by all the documents on which faceting is done (the query results, minus
whatever facet_filter takes out).
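The scopes described above can be summarised in a toy Python model: the query (QUERY A plus FILTER A) selects the query results, the top-level filter (FILTER B) only trims the returned hits, and facet_filter (FILTER C) trims the documents the facet counts. The predicates, documents and the `type`, `country` and `tag` fields are hypothetical, and the model does not simulate the field cache itself:

```python
from collections import Counter


def run_search(docs, query, top_filter, facet_filter, facet_field):
    """Toy model of which documents each part of the request applies to."""
    query_results = [doc for doc in docs if query(doc)]  # QUERY A + FILTER A
    hits = [doc for doc in query_results if top_filter(doc)]  # FILTER B
    faceted = [doc for doc in query_results if facet_filter(doc)]  # FILTER C
    counts = Counter(doc[facet_field] for doc in faceted)
    return hits, counts


# Hypothetical documents; "type", "country" and "tag" are made-up fields.
docs = [
    {"id": 1, "type": "post", "country": "us", "tag": "a"},
    {"id": 2, "type": "page", "country": "us", "tag": "b"},
    {"id": 3, "type": "post", "country": "de", "tag": "a"},
    {"id": 4, "type": "post", "country": "us", "tag": "a"},
]
hits, counts = run_search(
    docs,
    query=lambda doc: doc["id"] != 4,  # stands in for QUERY A + FILTER A
    top_filter=lambda doc: doc["type"] == "post",
    facet_filter=lambda doc: doc["country"] == "us",
    facet_field="tag",
)
```

Document 2 is dropped from the hits by the top-level filter but is still counted in the facet, while document 3 is returned as a hit but excluded from the facet by facet_filter.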

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

On Wed, Feb 13, 2013 at 5:50 PM, Mike mnilsson2323@gmail.com wrote:
[quoted messages trimmed; they are identical to the messages above]