I've been looking into writing a search application powered by
Elasticsearch using user-defined polygons. I can handle simple cases and
get fairly good performance, but with complex polygons (256
points, for example) the search times get slower. No surprise, since it's
having to do more calculations. Here are my results:
I repeated the test adding more nodes, and because the work is split
across them the times come down. The nodes are 4-core Windows
machines, running a 12-shard index of about 736 documents. Is there
any way I can optimise the search besides throwing more nodes at it?
This is an example of my query:
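A minimal sketch of that kind of query in the 0.x DSL, assuming a hypothetical "location" geo_point field and placeholder coordinates (not the actual query):

    {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "geo_polygon": {
              "location": {
                "points": [
                  { "lat": 51.51, "lon": -0.13 },
                  { "lat": 51.53, "lon": -0.08 },
                  { "lat": 51.49, "lon": -0.05 },
                  { "lat": 51.47, "lon": -0.11 },
                  { "lat": 51.51, "lon": -0.13 }
                ]
              }
            }
          }
        }
      }
    }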
I've been thinking of wrapping the polygon filter in an "and" filter
with the first clause being a bounding-box filter
(http://www.elasticsearch.org/guide/reference/query-dsl/geo-bounding-box-filter.html),
as that should be a faster calculation for discounting hits, but I'm not
sure whether the order of filters means anything. Would this help? Is there
anything else I can do?
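For illustration, that composition would look something like this in the same DSL, with the cheap lat/lon range check first and the expensive point-in-polygon test second (the field name and coordinates are again placeholders):

    {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "and": [
              {
                "geo_bounding_box": {
                  "location": {
                    "top_left":     { "lat": 51.54, "lon": -0.14 },
                    "bottom_right": { "lat": 51.46, "lon": -0.04 }
                  }
                }
              },
              {
                "geo_polygon": {
                  "location": {
                    "points": [
                      { "lat": 51.51, "lon": -0.13 },
                      { "lat": 51.53, "lon": -0.08 },
                      { "lat": 51.49, "lon": -0.05 },
                      { "lat": 51.47, "lon": -0.11 },
                      { "lat": 51.51, "lon": -0.13 }
                    ]
                  }
                }
              }
            ]
          }
        }
      }
    }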
I got my test results to be even better, partly by moving the test runner
closer to the cluster, but adding a bounding box also improved
the results further.
The exception was multiple polygon filters, where the bounding box
actually made the query time longer. That makes the structure of the
filters and(bbox, or(poly1, poly2, poly3)), as opposed to single
polygons, which are and(bbox, poly1), and I guess that extra level takes
its toll in the calculation.
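To make the slower shape concrete, the multi-polygon case nests an "or" filter inside the "and" along these lines (field name and points are placeholders):

    {
      "and": [
        {
          "geo_bounding_box": {
            "location": {
              "top_left":     { "lat": 51.60, "lon": -0.20 },
              "bottom_right": { "lat": 51.40, "lon":  0.10 }
            }
          }
        },
        {
          "or": [
            {
              "geo_polygon": {
                "location": {
                  "points": [
                    { "lat": 51.55, "lon": -0.15 },
                    { "lat": 51.57, "lon": -0.10 },
                    { "lat": 51.52, "lon": -0.08 },
                    { "lat": 51.55, "lon": -0.15 }
                  ]
                }
              }
            },
            {
              "geo_polygon": {
                "location": {
                  "points": [
                    { "lat": 51.48, "lon": 0.02 },
                    { "lat": 51.50, "lon": 0.07 },
                    { "lat": 51.45, "lon": 0.05 },
                    { "lat": 51.48, "lon": 0.02 }
                  ]
                }
              }
            }
          ]
        }
      ]
    }

Note that with this nesting the single box has to cover all of the polygons, so it discards fewer documents than a box fitted tightly to one polygon.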
Hi Ian,
Thanks a lot for sharing. Very interesting and useful to ES users.
Stephane Bastian
Yea, thanks a lot for the effort! The computation is heavy as the number of points grows; might give it another round of checking to see if maybe it can be further optimized...