Geopolygon testing

ian_clark · June 30, 2011, 8:30am

Hi,

I've been looking into writing a search application powered by
elasticsearch using user-defined polygons. I can do simple cases and
get fairly good performance, but with complex polygons (256 points,
for example) the search times get slower. No surprise, since it has
to do more calculations. Here are my results:

https://s3-eu-west-1.amazonaws.com/es-results/ComplexPolygonResults-VaryingNodes.png

I repeated the test with more nodes, and because the work is split
across them the query times come down. The nodes are 4-core Windows
machines, running a 12-shard index of about 736 documents. Is there
any way I can optimise the search besides throwing more nodes at it?
This is an example of my query:

https://gist.github.com/ianAndrewClark/1055826 (3 times 10 point polygon query)

{
  "from": 0,
  "size": 100,
  "facets": {
    "postalsector": {
      "terms": {
        "field": "postalsector",
        "order": "count",
        "size": 5
      }

[truncated; the full query is in the gist]

I've been thinking of wrapping the polygon filter in an "and" filter
with the first clause being a bounding box filter
(http://www.elasticsearch.org/guide/reference/query-dsl/geo-bounding-box-filter.html),
as this should be a faster calculation to discount hits, but I'm not
sure if the order of filters means anything. Would this help? Is there
anything else I can do?
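
A rough sketch of the idea, for illustration only: derive the box from the polygon's own points and put it first in an "and" filter. The field name "location" and the helper names are assumptions (the gist above is truncated, so the real field is not shown), and the filter layout follows the geo_bounding_box, geo_polygon and "and" filters of the query DSL linked above. It does not settle whether filter order affects evaluation.

```python
# Sketch only: "location" is an assumed field name, and both helpers are
# hypothetical. Polygons crossing the antimeridian are not handled.

def bounding_box(points):
    """Return (top_left, bottom_right) corners enclosing all points."""
    lats = [p["lat"] for p in points]
    lons = [p["lon"] for p in points]
    top_left = {"lat": max(lats), "lon": min(lons)}
    bottom_right = {"lat": min(lats), "lon": max(lons)}
    return top_left, bottom_right


def bbox_then_polygon_filter(field, points):
    """Build and(bbox, polygon): box clause first, polygon clause second."""
    top_left, bottom_right = bounding_box(points)
    return {
        "and": [
            {"geo_bounding_box": {field: {"top_left": top_left,
                                          "bottom_right": bottom_right}}},
            {"geo_polygon": {field: {"points": points}}},
        ]
    }


# Made-up triangle, just to show the resulting request body.
triangle = [
    {"lat": 51.50, "lon": -0.15},
    {"lat": 51.55, "lon": -0.05},
    {"lat": 51.45, "lon": -0.10},
]
query = {
    "from": 0,
    "size": 100,
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": bbox_then_polygon_filter("location", triangle),
        }
    },
}
```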

cheers

Ian

ian_clark · June 30, 2011, 8:32am

Sorry, that is 736k documents, not 736.

ian_clark · July 1, 2011, 11:44am

I got my test results to be even better by moving the test runner
closer to the cluster, and adding a bounding box improved the results
further:

https://s3-eu-west-1.amazonaws.com/es-results/ComplexPolygonResults-CloserWithBounded.png

The exception is queries with multiple polygon filters, where the
bounding box actually made the query time longer. There the filter
structure is and(bbox, or(poly1, poly2, poly3)), as opposed to
and(bbox, poly1) for a single polygon, and I guess that extra level
takes its toll in the calculation.

My queries now look like this:
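
For reference, the and(bbox, or(poly1, poly2, poly3)) shape can be sketched as below. This assumes a single box drawn around all polygons together, which the truncated gist does not confirm; the field name "location" and the helper names are likewise hypothetical.

```python
# Sketch only: one box around every polygon, then an "or" of the polygons.
# "location" is an assumed field name; the helpers are hypothetical.

def union_bounding_box(polygons):
    """Return a single box enclosing the points of every polygon."""
    lats = [p["lat"] for poly in polygons for p in poly]
    lons = [p["lon"] for poly in polygons for p in poly]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


def multi_polygon_filter(field, polygons):
    """Build and(bbox, or(poly1, poly2, ...))."""
    return {
        "and": [
            {"geo_bounding_box": {field: union_bounding_box(polygons)}},
            {"or": [{"geo_polygon": {field: {"points": poly}}}
                    for poly in polygons]},
        ]
    }


# Two made-up triangles that sit far apart.
polygons = [
    [{"lat": 51.50, "lon": -0.15}, {"lat": 51.55, "lon": -0.05},
     {"lat": 51.45, "lon": -0.10}],
    [{"lat": 53.40, "lon": -2.30}, {"lat": 53.50, "lon": -2.20},
     {"lat": 53.45, "lon": -2.10}],
]
combined = multi_polygon_filter("location", polygons)
```

With polygons that are far apart, a box like this also covers the area between them, so it may reject fewer documents than a per-polygon box would. Whether that contributes to the slower timings was not measured here.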

https://gist.github.com/ianAndrewClark/1058361 (3 point polygon with bounded box query)

{
    "from": 0,
    "size": 100,
    "facets": {
        "outcodes": {
            "terms": {
                "field": "postalsector",
                "order": "count",
                "size": 5
            }

[truncated; the full query is in the gist]

IC

Stephane_Bastian · July 1, 2011, 1:07pm

Hi Ian,

Thanks a lot for sharing. Very interesting and useful to ES users.

Stephane Bastian

kimchy · July 2, 2011, 8:46pm

Yeah, thanks a lot for the effort! The computation does get heavy as the number of points grows. I might give it another round of checks to see if it can be optimized further...
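
To illustrate why the cost scales with the number of points: a typical point-in-polygon test looks at every edge of the polygon once for each document location it checks. Below is a minimal sketch of the standard ray-casting (even-odd) test. It is not necessarily what Elasticsearch does internally, it treats lat/lon as flat coordinates, and the function name is made up for this example.

```python
# Sketch only: textbook even-odd ray casting, O(number of points) per test.

def point_in_polygon(lat, lon, points):
    """Return True if (lat, lon) falls inside the polygon given by points."""
    inside = False
    j = len(points) - 1  # index of the previous vertex, starting at the last
    for i in range(len(points)):
        lat_i, lon_i = points[i]["lat"], points[i]["lon"]
        lat_j, lon_j = points[j]["lat"], points[j]["lon"]
        # Does the edge (j -> i) straddle the horizontal line at `lat`?
        if (lat_i > lat) != (lat_j > lat):
            # Longitude at which the edge crosses that line.
            cross_lon = lon_i + (lat - lat_i) * (lon_j - lon_i) / (lat_j - lat_i)
            if lon < cross_lon:
                inside = not inside  # each crossing flips the state
        j = i
    return inside


square = [
    {"lat": 0.0, "lon": 0.0},
    {"lat": 0.0, "lon": 10.0},
    {"lat": 10.0, "lon": 10.0},
    {"lat": 10.0, "lon": 0.0},
]
centre_hit = point_in_polygon(5.0, 5.0, square)
outside_hit = point_in_polygon(5.0, 15.0, square)
```

The loop runs once per polygon point, so a 256-point polygon costs roughly 25 times more per candidate document than a 10-point one. That is why a cheap pre-filter that discards documents before this test can pay off.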
