Hi David, I'm not sure that's helped, sadly.
I've tried making the query as restrictive as possible, by putting the same
filters in both the main query and the facet:
{
  "query": {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "and" : {
          "filters": [
            {
              "range" : {
                "pubDate" : {
                  "from" : "2010-12-31",
                  "to" : "2011-01-01"
                }
              }
            },
            {
              "exists" : { "field" : "foo" }
            },
            {
              "term" : { "bar" : "XXX" }
            },
            {
              "prefix" : { "baz" : "a" }
            }
          ]
        }
      }
    }
  },
  "facets" : {
    "published" : {
      "date_histogram" : {
        "field" : "pubDate",
        "interval" : "month"
      },
      "facet_filter" : {
        "and" : {
          "filters": [
            {
              "range" : {
                "pubDate" : {
                  "from" : "2010-12-31",
                  "to" : "2011-01-01"
                }
              }
            },
            {
              "exists" : { "field" : "foo" }
            },
            {
              "term" : { "bar" : "XXX" }
            },
            {
              "prefix" : { "baz" : "a" }
            }
          ]
        }
      }
    }
  }
}
But even this, covering only a single day, bombs out with an out-of-memory
error in a JVM with a 1G heap. There are only 63388 documents in that day,
so there's no reason it should. (I know this because a count query without
a facet over the same date range returns instantly.)
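For anyone repeating this experiment, the duplicated filter list in the request above can be generated from a single definition, so the query and the facet filter cannot drift apart. This is only a minimal sketch in Python: the helper name build_faceted_request is hypothetical, and it only builds the request body, it does not send it anywhere.

```python
import copy
import json


def build_faceted_request(filters, facet_name, field, interval):
    """Build a body that applies the same `and` filter to the query and the facet.

    Hypothetical helper for illustration; it mirrors the request shown above.
    """
    and_filter = {"and": {"filters": filters}}
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": and_filter,
            }
        },
        "facets": {
            facet_name: {
                "date_histogram": {"field": field, "interval": interval},
                # Deep copy so a later edit to one filter does not silently change the other.
                "facet_filter": copy.deepcopy(and_filter),
            }
        },
    }


filters = [
    {"range": {"pubDate": {"from": "2010-12-31", "to": "2011-01-01"}}},
    {"exists": {"field": "foo"}},
    {"term": {"bar": "XXX"}},
    {"prefix": {"baz": "a"}},
]
body = build_faceted_request(filters, "published", "pubDate", "month")
print(json.dumps(body, indent=2))
```

The printed JSON should have the same structure as the hand-written request, which makes it easier to vary the date range while keeping both filters identical.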
On Friday, 15 June 2012 12:24:07 UTC+1, David Pilato wrote:
Hi Andrew,
You have to filter the facet with the same filters you are already using
in your query.
So putting your range filter in as a facet filter should help.
Facet Filter
All facets can be configured with an additional filter (explained in the
Query DSL http://www.elasticsearch.org/guide/reference/query-dsl section),
which will reduce the documents they use for computing results. An
example with a term filter:
{
  "facets" : {
    "<FACET NAME>" : {
      "<FACET TYPE>" : {
        ...
      },
      "facet_filter" : {
        "term" : { "user" : "kimchy"}
      }
    }
  }
}
Note that this is different from a facet of the filter type
(http://www.elasticsearch.org/guide/reference/api/search/facets/filter-facet.html).
See also whether scope could help:
Scope
As we have already mentioned, facet computation is restricted to the scope
of the current query, called main, by default. Facets can be computed
within the global scope as well, in which case it will return values
computed across all documents in the index:
{
  "facets" : {
    "<FACET NAME>" : {
      "<FACET TYPE>" : { ... },
      "global" : true
    }
  }
}
There's one important distinction to keep in mind. While search queries
restrict both the returned documents and facet counts, search filters
restrict only returned documents, but not facet counts.
If you need to restrict both the documents and facets, and you’re not
willing or able to use a query, you may use a facet filter.
HTH
David.
On 15 June 2012 at 13:06, Andrew Clegg andrew.clegg@gmail.com wrote:
Hi,
I'm trying to run a date facet over a subset of a long time series (very
many values), and it keeps OOMing. But when I remove the facet clause from
the query, I get an overall result instantly.
This suggests to me that even with the filter in place, ES is trying to
load all the distinct values of the field. Is that correct? If so, is there
any way round it?
The query looks like this:
{
  "query": {
    "filtered" : {
      "query" : {
        "range" : {
          "pubDate" : {
            "from" : "2010-10-01",
            "to" : "2011-01-01"
          }
        }
      },
      "filter" : {
        "and" : {
          "filters": [
            {
              "exists" : { "field" : "foo" }
            },
            {
              "term" : { "bar" : "somestring" }
            },
            {
              "prefix" : { "baz" : "a" }
            }
          ]
        }
      }
    }
  },
  "facets" : {
    "published" : {
      "date_histogram" : {
        "field" : "pubDate",
        "interval" : "month"
      }
    }
  }
}
Like that, I get:
[2012-06-15 11:54:48,335][WARN ][index.cache.field.data.soft] [Centurion]
[search_criteria] loading field [event.pubDate] caused out of memory
failure
But when I take out the facet, no problems.
This is with search_type=count by the way, as I don't care about the
actual hits.
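For reference, search_type=count goes on the request URL as a query-string parameter rather than in the JSON body. A minimal Python sketch of building that URL follows; the host and port are assumptions, the index name search_criteria is read off the log line above, and the helper name count_search_url is hypothetical.

```python
from urllib.parse import urlencode


def count_search_url(base_url, index):
    """Return the _search URL for `index` with search_type=count appended."""
    query = urlencode({"search_type": "count"})
    return f"{base_url.rstrip('/')}/{index}/_search?{query}"


# Assumed local node; adjust host, port and index name to your own cluster.
url = count_search_url("http://localhost:9200", "search_criteria")
print(url)
```

The JSON request body is then sent to that URL unchanged.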
Thanks,
Andrew.
--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet