Geopolygon testing

Hi,

I've been looking into writing a search application powered by
Elasticsearch using user-defined polygons. I can do simple cases and
get fairly good performance, but with complex polygons (256 points,
for example) the search times get slower. No surprise, since it's
having to do more calculations. Here are my results:

https://s3-eu-west-1.amazonaws.com/es-results/ComplexPolygonResults-VaryingNodes.png
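For intuition on why 256-point polygons are slower than simple ones: a common way to test whether a point lies inside a polygon is ray casting. It walks every edge once, so the cost per document grows linearly with the number of vertices. The sketch below is a generic ray-casting test written for illustration. It assumes points are (lat, lon) pairs treated as flat x/y coordinates, and it is not Elasticsearch's own implementation.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (lat, lon), treated as planar x/y in this sketch


def point_in_polygon(lat: float, lon: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: toggle 'inside' each time a ray from the point
    (towards increasing lon) crosses a polygon edge.

    Every edge is visited once, so the cost is O(n) in the number of vertices.
    Ignores antimeridian wrapping and points lying exactly on an edge.
    """
    inside = False
    j = len(polygon) - 1  # previous vertex; the polygon is closed implicitly
    for i in range(len(polygon)):
        lat_i, lon_i = polygon[i]
        lat_j, lon_j = polygon[j]
        # Does edge (j -> i) straddle the horizontal line through `lat`?
        if (lat_i > lat) != (lat_j > lat):
            # Longitude at which the edge crosses that line.
            cross_lon = lon_i + (lat - lat_i) * (lon_j - lon_i) / (lat_j - lat_i)
            if lon < cross_lon:
                inside = not inside
        j = i
    return inside


square = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
print(point_in_polygon(5.0, 5.0, square), point_in_polygon(5.0, 15.0, square))
```

A 256-point polygon does roughly 25 times the edge work of a 10-point one for every document it has to test.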

I repeated the test adding more nodes, and because the work is split
across them, the search times are coming down. The nodes are 4-core
Windows machines running a 12-shard index of about 736 documents. Is
there any way I can optimise the search besides throwing more nodes at
it? This is an example of my query: "3 times 10 point polygon query"
(GitHub gist).
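A rough back-of-the-envelope estimate shows why spreading the 12 shards over more nodes helps. It uses the corrected count of 736k documents from the follow-up. It also assumes, as a simplification, that the polygon filter runs a full edge-by-edge test on every document. Under that assumption, a 256-point polygon means about 188 million edge tests per query, divided among the shards. The node counts in the loop are hypothetical examples, not the configurations in the graph. The model also ignores per-node concurrency and network overhead.

```python
# Rough worst-case cost model (an assumption, not measured): every document
# is tested against every edge of the polygon, and shards are spread evenly.
DOCS = 736_000     # corrected document count from the thread
SHARDS = 12        # shards in the index
VERTICES = 256     # points in the complex polygon

total_edge_checks = DOCS * VERTICES  # 188,416,000 edge tests per query
docs_per_shard = DOCS / SHARDS       # roughly 61,333 documents per shard

for nodes in (1, 2, 3, 4):  # hypothetical cluster sizes that divide 12 evenly
    shards_per_node = SHARDS / nodes
    per_node = docs_per_shard * shards_per_node * VERTICES
    print(f"{nodes} node(s): {shards_per_node:g} shards each, "
          f"~{per_node / 1e6:.0f}M edge tests per node")
```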

I've been thinking of wrapping the polygon filter in an "and" filter
with the first clause being a bounding box filter
(http://www.elasticsearch.org/guide/reference/query-dsl/geo-bounding-box-filter.html),
since that should be a cheaper calculation for discarding hits, but I'm
not sure whether the order of the filters matters. Would this help? Is
there anything else I can do?
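The bounding-box idea can be modelled in plain Python. The code computes the polygon's bounding box once and rejects any point outside it with four comparisons. It runs the full polygon test only on points that survive, and a counter shows how many expensive tests are actually run. This assumes the cheap clause is checked first and short-circuits. Whether a given Elasticsearch version evaluates "and" clauses in that order is not something this sketch can confirm. The `point_in_polygon` helper is a simplified ray-casting test, included so the block runs on its own.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lat, lon)


def bounding_box(polygon: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (min_lat, min_lon, max_lat, max_lon) for the polygon's vertices."""
    lats = [p[0] for p in polygon]
    lons = [p[1] for p in polygon]
    return min(lats), min(lons), max(lats), max(lons)


def point_in_polygon(lat: float, lon: float, polygon: Sequence[Point]) -> bool:
    """Simplified ray-casting test; O(n) in the number of vertices."""
    inside = False
    j = len(polygon) - 1
    for i, (lat_i, lon_i) in enumerate(polygon):
        lat_j, lon_j = polygon[j]
        if (lat_i > lat) != (lat_j > lat):
            if lon < lon_i + (lat - lat_i) * (lon_j - lon_i) / (lat_j - lat_i):
                inside = not inside
        j = i
    return inside


def filter_points(points: Sequence[Point],
                  polygon: Sequence[Point]) -> Tuple[List[Point], int]:
    """Keep points inside the polygon, checking the bounding box first.

    Returns the matches and how many full polygon tests were actually run.
    """
    min_lat, min_lon, max_lat, max_lon = bounding_box(polygon)
    matches: List[Point] = []
    polygon_tests = 0
    for lat, lon in points:
        # Cheap rejection: four comparisons, independent of polygon size.
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue
        polygon_tests += 1  # only points inside the box pay the O(n) cost
        if point_in_polygon(lat, lon, polygon):
            matches.append((lat, lon))
    return matches, polygon_tests


triangle = [(0, 0), (0, 10), (10, 0)]
points = [(1, 1), (8, 8), (50, 50), (-20, 5)]
print(filter_points(points, triangle))
```

Here two of the four points are rejected by the box alone. Only (1, 1) and (8, 8) reach the polygon test, and only (1, 1) is inside the triangle.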

cheers

Ian

Sorry, that is 736k documents, not 736.

(In reply to: Ian, Jun 30, 9:30 am)

I got my test results even better, partly by moving the test runner
closer to the cluster, but adding a bounding box also improved the
results further:

https://s3-eu-west-1.amazonaws.com/es-results/ComplexPolygonResults-CloserWithBounded.png

The exception was multiple polygon filters, where the bounding box
actually made the query time longer. There the filter structure is
and(bbox, or(poly1, poly2, poly3)), as opposed to and(bbox, poly1)
for a single polygon, and I guess that extra level takes its toll in
the calculation.
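In the multi-polygon case, a single bounding box has to enclose all the polygons at once. The sketch below compares the area of that shared box with the summed areas of each polygon's own box. It uses square degrees as a crude planar unit, and the three polygons are hypothetical examples. It illustrates one more factor that may be worth checking alongside the extra nesting level. It is not a diagnosis of the results above. When the polygons are far apart, the shared box covers a lot of empty space, so it discards fewer documents before the polygon tests run.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]              # (lat, lon)
Box = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def bounding_box(polygon: Sequence[Point]) -> Box:
    """Smallest box enclosing one polygon's vertices."""
    lats = [p[0] for p in polygon]
    lons = [p[1] for p in polygon]
    return min(lats), min(lons), max(lats), max(lons)


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box enclosing all the given boxes, i.e. one shared bbox."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def area(box: Box) -> float:
    """Area in square degrees; a crude planar measure, fine for comparison."""
    return (box[2] - box[0]) * (box[3] - box[1])


# Three hypothetical 1x1-degree squares spread across a region.
polys = [
    [(50.0, 0.0), (51.0, 0.0), (51.0, 1.0), (50.0, 1.0)],
    [(52.0, 5.0), (53.0, 5.0), (53.0, 6.0), (52.0, 6.0)],
    [(54.0, -3.0), (55.0, -3.0), (55.0, -2.0), (54.0, -2.0)],
]
boxes = [bounding_box(p) for p in polys]
shared = union_box(boxes)
print(area(shared), sum(area(b) for b in boxes))  # prints 45.0 3.0
```

In this made-up layout, the shared box is 15 times larger than the three individual boxes combined. If that matters in practice, one untested structure worth benchmarking is or(and(bbox1, poly1), and(bbox2, poly2), and(bbox3, poly3)), which gives each polygon its own tight box.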

My queries now look like this: "3 point polygon with bounded box query" (GitHub gist).
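A query of the and(bbox, poly) shape described above can be generated so that the bounding box is always derived from the polygon's own points. The sketch below builds the request body as a Python dict. It uses the "filtered" query with "and", "geo_bounding_box" and "geo_polygon" filters from the query DSL of this era. The field name `location` is a hypothetical placeholder. Check the exact syntax against the reference docs for the version in use, because later versions replaced "filtered" and "and" with the "bool" query.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (lat, lon)

# Hypothetical geo_point field name; substitute the real mapping's field.
FIELD = "location"


def polygon_with_bbox_query(polygon: Sequence[Point]) -> Dict[str, Any]:
    """Build a filtered query whose 'and' filter lists the bounding box
    before the polygon. Syntax follows the pre-2.0 query DSL (an assumption
    to verify against your version). Does not handle polygons that cross
    the antimeridian, where min/max longitude gives the wrong box."""
    lats = [lat for lat, _ in polygon]
    lons = [lon for _, lon in polygon]
    points: List[Dict[str, float]] = [{"lat": lat, "lon": lon} for lat, lon in polygon]
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "and": [
                        # Cheap rectangle test first: top-left is (max lat, min lon).
                        {"geo_bounding_box": {FIELD: {
                            "top_left": {"lat": max(lats), "lon": min(lons)},
                            "bottom_right": {"lat": min(lats), "lon": max(lons)},
                        }}},
                        # Then the per-vertex polygon test.
                        {"geo_polygon": {FIELD: {"points": points}}},
                    ]
                },
            }
        }
    }


if __name__ == "__main__":
    triangle = [(51.5, -0.2), (51.6, -0.1), (51.4, 0.0)]
    print(json.dumps(polygon_with_bbox_query(triangle), indent=2))
```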

IC

(In reply to: Ian, Jun 30, 9:32 am)

Hi Ian,

Thanks a lot for sharing. Very interesting and useful to ES users.

Stephane Bastian

(In reply to: Ian, Fri, 2011-07-01 at 04:44 -0700)

Yeah, thanks a lot for the effort! The computation does get heavy as the number of points grows; I might give it another round of checks to see if it can be optimized further.

(In reply to: stephane, Friday, July 1, 2011 at 4:07 PM)