On Monday, February 11, 2013 3:24:46 PM UTC-5, Mike wrote:
I understand that when you facet or sort on a field, Elasticsearch loads
all of the possible values of that field into the field cache. This can get
huge, and it is a common cause of heap OOM errors, especially on fields
with high cardinality. My question is: are the values put into the field
cache limited to those in the records matched by the query, or does
Elasticsearch load every unique value of the field regardless of the
filters? If filters do limit them, which ones? And are there only certain
kinds of queries that limit them as well? I read in another post that only
the constant_score query limits the records used in the cache.
I know that QUERY A, FILTER A, FILTER B, and FILTER C will all limit the
records that my hugeFacet uses when returning results. However:

1. Will all 4 of those also limit what is added to the field cache? If
not, which ones will and which won't? I believe I read somewhere that a
facet_filter (FILTER C) limits what is put into the field cache, but that
post said nothing about the other filters.

2. Can any query be used as QUERY A to decrease the number of field values
that are examined and added to the field cache, or is constant_score the
only one that will? If that's the case, does it mean that nothing in my
entire "filtered" query (QUERY A and FILTER A) will limit what's put into
the field cache?
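The actual request body those labels refer to isn't included in the thread,
so here is a hypothetical sketch of the shape being described. Every field
name and every individual query/filter below is a placeholder I made up,
not the poster's real values:

```python
# Hypothetical reconstruction of the search body under discussion.
# "QUERY A", "FILTER A/B/C", and all field names are placeholders.
body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "foo"}},       # QUERY A
            "filter": {"term": {"status": "active"}},   # FILTER A
        }
    },
    "filter": {"term": {"type": "article"}},            # FILTER B (top-level filter)
    "facets": {
        "hugeFacet": {
            "terms": {"field": "hugeField"},
            "facet_filter": {"term": {"lang": "en"}},   # FILTER C
        }
    },
}
```

The point of the layout: QUERY A and FILTER A live inside the "filtered"
query, FILTER B is the top-level filter, and FILTER C is attached directly
to the facet as a facet_filter.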
"So if you need to account for filters when you run facets, you need to
either wrap them in the constant_score query or use facet_filter."
Maybe I misunderstood. Any clarification on my 2 questions above would be
greatly appreciated.
For sorting, Elasticsearch sorts all the documents matching your query and
filters, then gives you back the top X items. Any document that is excluded
by either your query or your filter never enters the sort, so it doesn't
touch the caches.
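As a concrete illustration of that point, a sorted search only ranks the
matching documents. The query, field names, and sort key here are
illustrative, not taken from the thread:

```python
# Sketch: sorting applies only to documents matched by the query/filter;
# excluded documents never enter the sorted result set.
# "category", "books", and "price" are made-up names.
body = {
    "query": {"term": {"category": "books"}},
    "sort": [{"price": {"order": "asc"}}],
    "size": 10,  # the "top X items" handed back
}
```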
For faceting, filters don't matter: facets are computed on the query
results. If you need to restrict the set of documents a facet is computed
over, you have to use facet_filter. In that case the caches are used for
all the documents the facet runs on (the query results, minus whatever the
facet_filter takes out).
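A minimal sketch of the two options in the quoted advice, assuming a single
term filter that should also constrain the facet (the "status" filter and
field names are illustrative):

```python
# Option 1: repeat the filter as a facet_filter so facet counts respect it.
status_filter = {"term": {"status": "active"}}
body = {
    "query": {"match_all": {}},
    "filter": status_filter,                  # narrows the hits returned
    "facets": {
        "tags": {
            "terms": {"field": "tags"},
            "facet_filter": status_filter,    # narrows the facet counts too
        }
    },
}

# Option 2: wrap the filter in a constant_score query, so the facet
# (which runs on query results) sees it automatically.
body_cs = {
    "query": {"constant_score": {"filter": status_filter}},
    "facets": {"tags": {"terms": {"field": "tags"}}},
}
```

Either way, the facet only ever looks at documents that passed the filter;
a top-level filter alone does not have that effect.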