Massive perf difference with filter versus filtered query

I'm seeing a major performance difference depending on whether I wrap my filter in a query. I don't understand, because the docs say to use filters for exact matching.

This query takes about 800ms, even after repeated executions (so caches are hot):
{
  "filter": { "term": { "ProjectId": 4191152 } },
  "from": 0, "size": 50,
  "sort": [], "facets": {}
}

But slapping a filtered query around it makes it take 5ms on repeated executions:

{
  "query": {
    "filtered": {
      "filter": { "term": { "ProjectId": 4191152 } }
    }
  },
  "from": 0, "size": 50,
  "sort": [], "facets": {}
}

What am I misunderstanding? I've got 80M documents, 30 of which match this query. The only thing I can guess is that when there's no "query" element at the root, Elasticsearch retrieves every document and then applies my filter, whereas with a query it uses some indexed approach.

-Michael

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/BLUPR07MB674B6F4B405F739E034FB1AD4330%40BLUPR07MB674.namprd07.prod.outlook.com.
For more options, visit https://groups.google.com/d/optout.

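Because the first one is a post filter (BTW, we renamed the top-level "filter" to "post_filter"), so it is applied after the search, on the result set. With no query in the request, that search matches every document, so the filter has to be checked against all of them.
In the second one the filter is applied first, and then the query is run.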

I guess this is the difference here.
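
To make that difference concrete, here is a toy Python model of the two execution orders. It is only a sketch under stated assumptions: it does not reproduce Elasticsearch internals, and the corpus size, the `index` dictionary and both function names are made up for illustration.

```python
from collections import defaultdict

# Toy corpus (hypothetical): 100,000 docs, 30 of which belong to the project.
TARGET = 4191152
docs = [{"id": i, "ProjectId": TARGET if i % 3334 == 0 else i}
        for i in range(100_000)]

# Inverted index built once, at "index time": term -> list of doc ids.
index = defaultdict(list)
for d in docs:
    index[d["ProjectId"]].append(d["id"])


def post_filter_search(project_id):
    """Match every doc first (implicit match_all), then test the filter on each hit."""
    visited, hits = 0, []
    for d in docs:
        visited += 1
        if d["ProjectId"] == project_id:
            hits.append(d["id"])
    return hits, visited


def filtered_search(project_id):
    """Resolve the filter through the index first, then touch only matching docs."""
    hits = list(index.get(project_id, []))
    return hits, len(hits)


slow_hits, slow_visited = post_filter_search(TARGET)
fast_hits, fast_visited = filtered_search(TARGET)
print(len(slow_hits), slow_visited, fast_visited)
```

Both functions return the same hits; what changes is how many documents have to be looked at to get them, which would be consistent with the 800ms versus 5ms gap reported above.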

I would use the second one every time, unless you need to compute aggregations on the full dataset instead of on the filtered result set.
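
As a rough illustration of that trade-off, the two request bodies might be built like this. This is a sketch assuming Elasticsearch 1.x syntax (`post_filter`, `aggs`, `filtered`); the aggregation name `by_status` and the use of the `Status` field are assumptions for the example, not something taken from the original requests.

```python
import json


def post_filter_request(project_id):
    """Aggregations see every doc the query matched; only the hits are narrowed."""
    return {
        "query": {"match_all": {}},
        # Computed over everything the query matched (here: all docs).
        "aggs": {"by_status": {"terms": {"field": "Status"}}},
        # Applied to the hits only, after the query has run.
        "post_filter": {"term": {"ProjectId": project_id}},
        "from": 0,
        "size": 50,
    }


def filtered_request(project_id):
    """The filter narrows both the hits and what the aggregations see."""
    return {
        "query": {"filtered": {"filter": {"term": {"ProjectId": project_id}}}},
        "aggs": {"by_status": {"terms": {"field": "Status"}}},
        "from": 0,
        "size": 50,
    }


print(json.dumps(post_filter_request(4191152), indent=2))
print(json.dumps(filtered_request(4191152), indent=2))
```

If the per-status counts should cover only the one project, the second form is the one to pick, and it is also the fast one.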

My 2 cents

David

We had a similar issue: we needed the filters to apply only to the result of the query we'd made, so we used filtered queries and performance improved about 10x. On the other hand, in my experience fields like ProjectId should be in the query section, not in a filter (of course, it depends on the number of projects available).
Have you tried that?
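
For reference, moving ProjectId into the query section as suggested here would look roughly like the following. This is a minimal sketch assuming Elasticsearch 1.x term query syntax; the function name is hypothetical. Unlike a filter, a query also computes relevance scores, so whether this is faster would need measuring.

```python
def term_query_request(project_id):
    """Request body with ProjectId matched by a term query instead of a filter."""
    return {
        "query": {"term": {"ProjectId": project_id}},
        "from": 0,
        "size": 50,
    }


body = term_query_request(4191152)
print(body)
```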

I haven’t played with it much. I tried to put everything into filters as the docs suggest, except for the full-text search. Some of my queries don’t need full-text search at all; they simply need to match a few exact terms (ProjectId, Status, UserId, date range).
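
A request of that exact-match-only kind could be sketched like this, assuming Elasticsearch 1.x syntax for the `filtered` query and the `bool`, `term` and `range` filters. The field name `Date`, the function name and the argument values are assumptions for illustration; only ProjectId, Status and UserId are named in the thread.

```python
def exact_match_request(project_id, status, user_id, date_from, date_to):
    """Filtered query with no full-text part: every clause is an exact filter."""
    filters = [
        {"term": {"ProjectId": project_id}},
        {"term": {"Status": status}},
        {"term": {"UserId": user_id}},
        # "Date" is a hypothetical field name for the date-range condition.
        {"range": {"Date": {"gte": date_from, "lt": date_to}}},
    ]
    return {
        "query": {"filtered": {"filter": {"bool": {"must": filters}}}},
        "from": 0,
        "size": 50,
    }


request = exact_match_request(4191152, "open", 42, "2015-01-01", "2015-02-01")
print(request)
```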

I suppose this is an area that the query planner could improve, if it could realise one way is better than the other but produces equivalent results.

Anyways, sticking everything in filtered queries fixed it all, so, hey, win! Maybe the docs should have a small warning note ;).
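
The fix can be applied mechanically to existing request bodies. Here is a small sketch of such a rewrite; the helper name `wrap_top_level_filter` is hypothetical, and it assumes the only thing to change is moving a root-level "filter" (or "post_filter") into a filtered query.

```python
import copy


def wrap_top_level_filter(body):
    """Return a copy of body with a root-level filter moved into a filtered query."""
    body = copy.deepcopy(body)  # leave the caller's dict untouched
    flt = None
    for key in ("filter", "post_filter"):
        if key in body:
            flt = body.pop(key)
            break
    if flt is None:
        return body  # nothing to rewrite
    filtered = {"filter": flt}
    if "query" in body:
        filtered["query"] = body["query"]  # keep an existing query inside the wrapper
    body["query"] = {"filtered": filtered}
    return body


original = {
    "filter": {"term": {"ProjectId": 4191152}},
    "from": 0, "size": 50,
    "sort": [], "facets": {},
}
rewritten = wrap_top_level_filter(original)
print(rewritten)
```

One caveat: after the rewrite, any facets or aggregations in the request are computed on the filtered set rather than on everything the query matched, so this is only a drop-in change when that is what you want.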

-Michael

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On Behalf Of Matías Waisgold
Sent: Wednesday, January 28, 2015 06:41
To: elasticsearch@googlegroups.com
Subject: Re: Massive perf difference with filter versus filtered query

We had a similar issue: we needed the filters to apply only to the result of the query we'd made, so we used filtered queries and performance improved about 10x. On the other hand, in my experience fields like ProjectId should be in the query section, not in a filter (of course, it depends on the number of projects available).
Have you tried that?
