Filtering *before* a query


(Shawn O'Banion) #1

Hello,

This question relates to the order of execution of filters and queries.

I have two types of search criteria:

  1. A "terms" query with a few hundred terms (this is obviously a very
    expensive query).
  2. A "geo_bounding_box" filter (this should be fast and greatly reduce
    the result set).

Ideally, I would like to, first, use the "geo_bounding_box" filter to
reduce the result set, and then query the filtered documents with the
"terms" query.

Unfortunately, the search time does not appear to be affected by the size
of the geographic boundary (i.e., the number of documents that match the
filter). This makes me believe that it is executing the "terms" query on
the entire index before filtering.

I've tried the following different queries:

Query #1 (average search time: 2.2s)

{
    'query' : {
        'terms' : { 'text' : [...] }
    },
    'filter' : {
        'geo_bounding_box' : { ... }
    }
}

From what I have read, a filter will execute before a query if it is
nested in a 'filtered' query (source: https://groups.google.com/forum/#!searchin/elasticsearch/filter$20query$20execution$20order/elasticsearch/aL9fzeyTBKE/8vVKuJaNvWUJ).
However, in my case, the average search time for Query #2 actually *doubles*
for some reason:

Query #2 (average search time: 4.4s)

{
    'query' : {
        'filtered' : {
            'query' : {
                'terms' : { 'text' : [...] }
            },
            'filter' : {
                'geo_bounding_box' : { ... }
            }
        }
    }
}

Any suggestions are appreciated on how I might execute the fast filter
before the expensive terms query. Thank you!

Shawn

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/25520ad2-68c8-4f26-81fc-75cdf2430d77%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

I have never used the geo features, so I could be wrong, but I believe that
geo filters are expensive and should be used as post filters:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-post-filter.html

One of the reasons is that geo filters are not cached by default since they
tend to be more dynamic. If your use case allows it, try using a cached
geo filter with the filtered query.

Cheers,

Ivan
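
As a rough illustration of the two options mentioned in this reply, the sketch below builds both request bodies as Python dicts: one that applies the geo filter as a post filter, and one that wraps it in a filtered query with caching requested through `_cache`. The field name `location`, the box coordinates, and the term list are placeholders that do not appear in the thread, and the `post_filter` and `_cache` keys are assumed to follow the Elasticsearch 1.x request format, so check them against your version.

```python
import json

# Hypothetical placeholders: the thread elides the real field name, box, and terms.
GEO_FIELD = "location"
BOX = {
    "top_left": {"lat": 41.0, "lon": -88.0},
    "bottom_right": {"lat": 40.0, "lon": -87.0},
}
TERMS = ["alpha", "beta", "gamma"]


def geo_filter(cache=False):
    """Build a geo_bounding_box filter; `_cache` is only emitted when requested."""
    body = {GEO_FIELD: BOX}
    if cache:
        body["_cache"] = True  # assumed 1.x syntax for opting in to filter caching
    return {"geo_bounding_box": body}


def post_filter_request():
    """Terms query with the geo filter applied to the hits as a post filter."""
    return {
        "query": {"terms": {"text": TERMS}},
        "post_filter": geo_filter(),
    }


def cached_filtered_request():
    """Filtered query whose geo filter asks Elasticsearch to cache its result."""
    return {
        "query": {
            "filtered": {
                "query": {"terms": {"text": TERMS}},
                "filter": geo_filter(cache=True),
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(post_filter_request(), indent=2))
    print(json.dumps(cached_filtered_request(), indent=2))
```

Caching only pays off if the same bounding box is reused across requests; for a box that changes on every search, the cached entry would rarely be hit.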

(In reply to Shawn O'Banion's message of Tue, May 27, 2014 at 12:55 PM.)


(Shawn O'Banion) #3

Ivan,

Thanks for the response.

Even if the geo filter is expensive, it is certainly cheaper than the
large terms query that I would like to execute after the filter. And I
don't believe that caching the filter result would be helpful because it
is dynamic, as you said.

Shawn

(In reply to Ivan Brusic's message of Tuesday, May 27, 2014, 4:32:08 PM UTC-5.)


(Adrien Grand) #4

Hi Shawn,

You can force the strategy that the filtered query uses:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#_filter_strategy
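
A minimal sketch of what forcing a strategy might look like, written as a Python helper that builds the request body. The strategy names are taken to be those listed in the linked documentation for Elasticsearch 1.x (the threshold-based `random_access_` variant is left out); treat them as an assumption and verify them against your version. The field name `location`, the box coordinates, and the terms are hypothetical placeholders, and whether `leap_frog_filter_first` actually helps for this workload is untested here.

```python
import json

# Strategy names assumed from the Elasticsearch 1.x filtered-query documentation.
KNOWN_STRATEGIES = {
    "leap_frog",
    "leap_frog_query_first",
    "leap_frog_filter_first",
    "query_first",
    "random_access_always",
}


def filtered_request(terms, geo_field, box, strategy):
    """Build a filtered query that asks for a specific filter strategy."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError("unknown strategy: %r" % (strategy,))
    return {
        "query": {
            "filtered": {
                "query": {"terms": {"text": list(terms)}},
                "filter": {"geo_bounding_box": {geo_field: box}},
                "strategy": strategy,
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; the thread does not give the real box or terms.
    box = {
        "top_left": {"lat": 41.0, "lon": -88.0},
        "bottom_right": {"lat": 40.0, "lon": -87.0},
    }
    body = filtered_request(["alpha", "beta"], "location", box, "leap_frog_filter_first")
    print(json.dumps(body, indent=2))
```

Building the body in one place makes it easy to run the same search under each strategy and compare the timings.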

(In reply to Shawn O'Banion's message of Tue, May 27, 2014 at 11:56 PM.)

--
Adrien Grand



(Shawn O'Banion) #5

Adrien,

If you look at my first post, you'll see that I tried a filtered query, and
it actually resulted in worse performance. Regardless, the average search
time did not vary by the size of the geographic boundary, which I take as
evidence that it is not filtering the results before executing the query.

Shawn

(In reply to Adrien Grand's message of Wednesday, May 28, 2014, 2:44:19 PM UTC-5.)


(Ivan Brusic) #6

Why do you think that geo_bounding_box should be fast? Since the filter is
not cached, it needs to run on every document in the index.

--
Ivan

(In reply to Shawn O'Banion's message of Wed, May 28, 2014 at 12:56 PM.)


(Shawn O'Banion) #7

Because when I remove the terms query and leave just the geo_bounding_box
filter, the search time is around 400ms on average. If I add the terms
query, it is around 2200ms on average.

Again, relative to the terms query (which has hundreds of terms), it is very
cheap.

My question is: how do I execute the geo_bounding_box filter before executing
the terms query, so that I reduce the number of documents that I have to
query over?

(In reply to Ivan Brusic's message of Wednesday, May 28, 2014, 4:22:05 PM UTC-5.)


(Ivan Brusic) #8

As I mentioned initially, I have no experience with the geo features, so I
am just guessing regarding performance.

So a filtered query with a match_all query and the geo filter takes only
400ms? What about just the terms query? I am not sure how you are executing
the tests, but you should make sure all the caches are warmed and ready to go.
Initial caching can wreak havoc on queries. Are the field data and filter
caches constant during the queries?

--
Ivan
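
One way to keep cold caches out of the comparison is to discard a few warm-up runs before averaging. The helper below is only a sketch of that idea: `search` is a hypothetical callable that sends a request body to the cluster and returns the parsed JSON response, which is assumed to include the `took` field (milliseconds) that Elasticsearch reports. A fake client stands in for a real one so the snippet runs on its own.

```python
from statistics import mean


def average_took(search, body, runs=10, warmup=3):
    """Return the mean `took` (ms) over `runs` calls, after `warmup` discarded calls.

    `search` is a hypothetical callable that sends `body` to the cluster and
    returns the parsed JSON response, assumed to carry a `took` field.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        search(body)  # warm the filter and field data caches; result discarded
    return mean(search(body)["took"] for _ in range(runs))


def make_fake_search(latencies):
    """Stand-in for a real client: replays canned `took` values in order."""
    replay = iter(latencies)
    return lambda body: {"took": next(replay)}


if __name__ == "__main__":
    # Two slow cold-cache calls followed by three warm ones (made-up numbers).
    search = make_fake_search([2200, 1800, 400, 410, 420])
    print(average_took(search, {"query": {"match_all": {}}}, runs=3, warmup=2))
```

Running the same helper against the geo-only, terms-only, and combined request bodies would give three averages that can be compared directly.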

On Wed, May 28, 2014 at 2:29 PM, Shawn O'Banion shawn.obanion@gmail.comwrote:

Because when I remove the terms query with just the geo_bounding_box
filter remaining, the search time is around 400ms on average. If I add the
terms query, it is around 2200ms on average.

Again, relative to the terms query (which has 100's of terms), it is
very cheap.

My question is: how do I execute the geo_bounding_box filter before executing
the terms query so that I reduce the number of documents that I have to
query over.

On Wednesday, May 28, 2014 4:22:05 PM UTC-5, Ivan Brusic wrote:

Why do you think that geo_bounding_box should be fast? Since the filter
is not cached, it needs to run on every document in the index.

--
Ivan

On Wed, May 28, 2014 at 12:56 PM, Shawn O'Banion shawn....@gmail.comwrote:

Adrian,

If you look at my first post, you'll see that I tried a filtered query,
and it actually resulted in worse performance. Regardless, the
average search time did not vary by the size of the geographic boundary,
which I take as evidence that it is not filtering the results before
executing the query.

Shawn

On Wednesday, May 28, 2014 2:44:19 PM UTC-5, Adrien Grand wrote:

Hi Shawn,

You can force the strategy to use in filtered_query:
http://www.elasticsearch.org/guide/en/elasticsearch/referenc
e/current/query-dsl-filtered-query.html#_filter_strategy

On Tue, May 27, 2014 at 11:56 PM, Shawn O'Banion shawn....@gmail.comwrote:

Ivan,

Thanks for the response.

Even if the geo filter is expensive, it is certainly cheaper than
the large terms query that I would like to execute after the
filter. And I don't believe that cacheing the filter result would be
helpful because it is dynamic, as you said.

Shawn

On Tuesday, May 27, 2014 4:32:08 PM UTC-5, Ivan Brusic wrote:

I have never used the geo features, so I could be wrong, but I
believe that geo filters are expensive are should be used as post filters:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/
current/search-request-post-filter.html

One of the reasons is that geo filters are not cached by default
since they tend to be more dynamic. If your use case allows it, trying
using a cached geo filter with the filtered query.

Cheers,

Ivan

--
Adrien Grand



(Adrien Grand) #9

Hi Shawn,

On Wed, May 28, 2014 at 11:29 PM, Shawn O'Banion shawn.obanion@gmail.com
wrote:

My question is: how do I execute the geo_bounding_box filter before executing
the terms query, so that I reduce the number of documents that I have to
query over?

This is why I pointed out the link about filter strategies: whether the
query is applied before, at the same time or after the filter can be
controlled using the strategy parameter of filtered_query:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#_filter_strategy

You can try them out to see how they influence response times.
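
As a rough sketch of what this could look like, the snippet below builds the filtered query from the first post once per strategy, so the variants can be sent to the cluster and timed. It assumes the Elasticsearch 1.x DSL, where `strategy` sits next to `query` and `filter` inside `filtered`. The field name `location`, the coordinates, the placeholder terms, and the helper name `build_filtered_query` are hypothetical. The strategy names are the ones I recall from the linked documentation (it also describes a `random_access_${threshold}` form, not shown here).

```python
import json

# Strategy names as recalled from the Elasticsearch 1.x filtered query docs.
STRATEGIES = (
    "leap_frog_query_first",
    "leap_frog_filter_first",
    "query_first",
    "random_access_always",
)

# Hypothetical bounding box, for illustration only.
BOX = {
    "top_left": {"lat": 41.0, "lon": -74.5},
    "bottom_right": {"lat": 40.5, "lon": -73.5},
}


def build_filtered_query(terms, bounding_box, strategy, geo_field="location"):
    """Return a filtered query body that requests the given filter strategy."""
    return {
        "query": {
            "filtered": {
                "query": {"terms": {"text": terms}},
                "filter": {"geo_bounding_box": {geo_field: bounding_box}},
                # Controls how the query and the filter are interleaved.
                "strategy": strategy,
            }
        }
    }


if __name__ == "__main__":
    # Print one request body per strategy so each can be run and timed.
    for name in STRATEGIES:
        print(json.dumps(build_filtered_query(["foo", "bar"], BOX, name)))
```

Which strategy wins depends on the data and on how selective the filter is, so measuring each one against the real index is the only reliable guide.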

You might also want to try the indexed type for the geo bounding box
filter, which might be faster than the default memory type if your
filter is very selective:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-bounding-box-filter.html#_type_2
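
A small sketch of how the type could be selected, assuming the Elasticsearch 1.x DSL in which `type` is given inside `geo_bounding_box` next to the field entry. The field name `location`, the coordinates, and the helper name `geo_bounding_box_filter` are hypothetical. As far as I know, the indexed type also requires the geo_point field to be mapped with its lat and lon values indexed (`lat_lon` enabled), so check the mapping before trying it.

```python
import json

# The two execution types mentioned above: in-memory checks (default)
# or lookups against indexed lat/lon values.
VALID_TYPES = ("memory", "indexed")


def geo_bounding_box_filter(field, top_left, bottom_right, execution_type="memory"):
    """Build a geo_bounding_box filter with an explicit execution type."""
    if execution_type not in VALID_TYPES:
        raise ValueError("execution_type must be one of %s" % (VALID_TYPES,))
    return {
        "geo_bounding_box": {
            field: {"top_left": top_left, "bottom_right": bottom_right},
            "type": execution_type,
        }
    }


# Hypothetical corners, for illustration only.
TOP_LEFT = {"lat": 41.0, "lon": -74.5}
BOTTOM_RIGHT = {"lat": 40.5, "lon": -73.5}

INDEXED_FILTER = geo_bounding_box_filter(
    "location", TOP_LEFT, BOTTOM_RIGHT, execution_type="indexed"
)

if __name__ == "__main__":
    print(json.dumps(INDEXED_FILTER, indent=2))
```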

--
Adrien Grand



(Shawn O'Banion) #10

Hi Adrien,

Thanks. This sounds like what I need; however, the page you link to does not
discuss the 'strategy' parameter that you mention.

I see some documentation about a filter strategy in Lucene (
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/FilteredQuery.FilterStrategy.html)
but I'm not sure how to specify the execution order or filter strategy in
the Elasticsearch query DSL.

Thanks,
Shawn



(Ivan Brusic) #11

For some reason, when I viewed that page at work, I was not seeing the
strategy section either, but I was able to at home. Try refreshing the
page; that worked for me.

Perhaps I should play around with those settings the next time I fine-tune
my queries. I use a combination of both pre and post filters.

--
Ivan



(Shawn O'Banion) #12

Interesting, when I hit refresh I can see the strategy section.

Nevertheless, it seems that when I set the 'strategy' equal to
'leap_frog_filter_first' I achieve better performance. This seems to be
what I needed. Thank you!



(system) #13