Complex polygons errors (geo_shape)

I've been indexing some complex polygons using the geo_shape type, taken from
e.g. http://code.flickr.net/2011/01/08/flickr-shapefiles-public-dataset-2-0/
and https://github.com/johan/world.geo.json. Both data sets are in GeoJSON
format and contain polygons and multipolygons.

These free data sets contain some shapes that Elasticsearch doesn't handle.

Particularly, I'm getting these errors on some shapes:

MapperParsingException[failed to parse [geometry]]; nested:
TopologyException[found non-noded intersection between LINESTRING (
179.9435577392578 -78.29638671875, 179.9999847412109 -78.30168151855469 )
and LINESTRING ( 180.0 -78.30162811279297, -178.9799499511719
-78.39634704589844 ) [ (179.99941723147876, -78.3016282665598, NaN) ]];

and

MapperParsingException[failed to parse [geometry]]; nested:
InvalidShapeException[Self-intersection at or near point
(119.51048182278488, 23.36987643037978, NaN)];

I'd love to have a fix for these issues, because currently I can't reliably
index my datasets. It's not acceptable to have countries missing from the
country dataset, for example. I have a workaround, which involves calculating
a simple convex polygon (the convex hull) from the coordinates
(http://en.wikipedia.org/wiki/Convex_and_concave_polygons), but that
inevitably includes large areas outside the original polygon, and it is kind
of tedious to do this using the bulk API. I was wondering if there's a
better way to do this and whether it would be possible to integrate a
solution into Lucene directly.
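
For anyone who wants to try the convex polygon workaround, here is a minimal sketch of it in Python. It is not the code I actually run; it assumes a ring is a list of (lon, lat) pairs, treats them as flat x/y coordinates (so it is only reasonable away from the poles and the dateline), and uses Andrew's monotone chain algorithm. The names cross and convex_hull are made up for this example.

```python
def cross(o, a, b):
    """Z-component of the cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the convex hull of (lon, lat) points as a closed ring.

    Coordinates are treated as planar x/y values (an assumption of this
    sketch). The ring is counter-clockwise and repeats its first point at
    the end, as GeoJSON linear rings do.
    """
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return pts  # degenerate input: nothing to build a polygon from

    # Build the lower hull, dropping points that would make a right turn.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull the same way, walking the points backwards.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    hull = lower[:-1] + upper[:-1]
    return hull + [hull[0]]


# An L-shaped (concave) ring: the hull drops the inner corner (1, 1),
# which is exactly how extra area ends up inside the indexed shape.
ring = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2), (0, 0)]
hull = convex_hull(ring)
```

The resulting ring can be used as the single outer ring of a GeoJSON polygon, at the cost of covering area the original shape did not.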

Jilles

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Jilles,

I can reproduce the error and am working on a branch to fix the dateline bug.

--Florian

Awesome, let me know if you need me to test a build.

Jilles

Hey Florian,

What is the status regarding this? I'm still working around this issue on
0.91 and wasn't able to find an issue for this in the issue tracker. Should
I create one?

Jilles

Hey Jilles,

Nice to hear from you. The branch fixing the dateline bug is part of a big
geo refactoring which is pulled into 1.0 beta. Sadly, I haven't tested it
against the datasets you posted, but I'll catch up on this today and keep you
informed.

cheers,
florian

If you want some nice polygons, take a look at http://quattroshapes.com/

That's a dataset curated by Foursquare with 300,000 polygons for the whole
world (countries, cities, states, neighborhoods, etc.).

It comes as shapefiles, so you'll need GDAL and its ogr2ogr tool to convert
it to GeoJSON. I've been having some issues with that data since the polygons
are quite large (e.g. the New Zealand one, a single multipolygon, is a
whopping 131 MB in GeoJSON format), and there are all sorts of other issues
that make it quite hard to index without errors.

Jilles

Hi Jilles,

This is great! I'm currently trying to improve the polygon filter, and those
kinds of shapes will help a lot.

Thanks,
florian

There is a concept of Geometry "validity", and Elasticsearch is configured
to disallow invalid polygons. An invalid polygon may be technically
ambiguous as to what you're trying to represent. Examples of what makes an
invalid geometry can be found in
http://www.gdmc.nl/ledoux/pdfs/_12agile.pdf , page 4.

I do find it necessary to manually repair an invalid geometry before
indexing into Elasticsearch; while annoying, I also think it's acceptable
because it's not the datastore's job to fix your documents when they don't
comply with the spec. There are slightly hackish methods to repair
invalid geometries described in that paper; there is also
http://www.gdmc.nl/ledoux/pdfs/_12agile.pdf. None of these methods is
guaranteed to be perfect, since invalid geometries are ambiguous in the
first place.
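
To make the validity idea concrete, here is a rough Python sketch of a pre-indexing check for one kind of invalidity, a ring whose edges cross each other. It is an illustration only and not what Elasticsearch or JTS do internally: it assumes planar (lon, lat) pairs, uses a brute-force O(n^2) comparison, and only detects proper crossings (edges that merely touch or overlap are not reported). The function names are hypothetical.

```python
def _orient(p, q, r):
    """Sign of the turn p -> q -> r: 1 for left, -1 for right, 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _properly_cross(a, b, c, d):
    """True if segments a-b and c-d cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def has_self_intersection(ring):
    """Brute-force check for crossing edges in a closed ring.

    `ring` is a list of (lon, lat) pairs whose first and last points are
    equal. Only proper crossings are detected; this is an assumption of
    the sketch, not a full validity test.
    """
    segments = list(zip(ring, ring[1:]))
    for i in range(len(segments)):
        # Neighbouring segments share an end point and cannot properly cross.
        for j in range(i + 2, len(segments)):
            if _properly_cross(*segments[i], *segments[j]):
                return True
    return False


# A "bow tie" crosses itself in the middle; a plain square does not.
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
```

A check like this only tells you which documents need repair before indexing; how to repair them is still a judgment call, for the ambiguity reasons given above.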

Florian, I have a question about the "big geo-refactoring in 1.0 beta". Is
this separate from the big geo-refactoring (I think it was a backport from
Lucene) that was introduced in 0.90 beta? If so, what is included in the
1.0 beta refactor?

Thanks!
Jeff

By the way, I implemented the Douglas-Peucker algorithm a few days ago to
simplify polygons and lines. That might be of use to cut down complexity at
indexing time as well. Given a specific precision in meters for the index,
it is pointless to work with shapes that have a higher precision than
that: https://github.com/jillesvangurp/geogeometry/blob/master/src/main/java/com/jillesvangurp/geo/GeoGeometry.java

Look for the simplify* methods at the bottom. It might be that spatial4j
implements this as well, I haven't checked.
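
For readers who don't want to dig through the Java, this is roughly what Douglas-Peucker does, sketched in Python. It is not a port of the simplify* methods: the tolerance here is in coordinate units (degrees) rather than meters, distances are planar, and the names simplify and _point_segment_distance are invented for this example.

```python
import math


def _point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    if dx == 0 and dy == 0:
        # a and b coincide (e.g. a closed ring): distance to that point.
        return math.hypot(p[0] - ax, p[1] - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification of a line given as (x, y) pairs.

    Points closer than `tolerance` (in coordinate units, an assumption of
    this sketch) to the simplified line are dropped.
    """
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the two end points.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]
simplified = simplify(line, 0.1)
```

The recursion is fine for modest shapes; for rings with a very large number of points an iterative version with an explicit stack would be the safer choice.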

Another useful bit to know is that ogr2ogr has a simplify option that does
something similar and also gets rid of self-intersections. So for anyone
having the same issues I had with JSON generated from shapefiles: this is
part of the solution.

Jilles

Thanks for the pointers. I've been looking into ways to fix these issues,
since the real world is unfortunately full of data that has them. The
examples in the paper look exactly like the stuff I'm seeing in the data.

Self-intersecting or complex polygons are quite common. Tools like Google
Maps based polygon editors allow you to draw them, which means they are
going to show up a lot in GeoJSON feeds and other user-generated data
with polygons. It's not so much that they are invalid but merely that they
complicate the use of certain algorithms. In the case of Elasticsearch,
the main goal is calculating the geohash or quadtree prefixes that cover
the shape as well as possible. A best effort for a polygon with issues is
better than an error. So if there are ways to deal with some of these
issues, it is very much worth it to make Elasticsearch more robust. I
don't think perfection is necessarily the goal here.
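
To illustrate what a geohash prefix is, here is a small Python sketch of the standard geohash encoding for a single point. It is not how Elasticsearch or Lucene compute the cells for a shape; covering a polygon means collecting every cell that intersects it, and this only shows how one cell name is derived. The function name geohash_encode is made up for the example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a point as a geohash string of `precision` characters.

    Each character adds 5 bits that alternately halve the longitude and
    latitude ranges, so a longer prefix names a smaller cell.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, value = 0, 0
    use_lon = True  # geohash interleaves bits starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


# A point near the tip of Jutland, Denmark.
cell = geohash_encode(57.64911, 10.40744, precision=5)  # "u4pru"
```

Two points in the same cell share the same prefix, which is why a best-effort set of prefixes for a slightly broken polygon is still useful for searching.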

Jilles

Hi Jeffrey,

Most of the work on the geo stuff in 1.0 is behind the scenes and doesn't
offer new functionality. In other words, we tried to avoid spatial4j shape
serialization. But with this refactoring we also set up a dateline wrapper,
which allows most shapes to be drawn across the +180/-180 longitude and then
reassembled, and we support a new set of shapes. Regarding your post, I
agree with you: ES should not "correct" any shapes, but it should offer a
user-friendly way to index them and also provide reasonable error messages
in cases where the geometry is not valid.
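
To give an idea of what crossing the +180/-180 longitude means for a ring, here is a small Python sketch. It is not the dateline wrapper from the refactoring, just an illustration: it assumes a ring is a list of (lon, lat) pairs, treats any jump of more than 180 degrees between neighbouring points as a dateline crossing, and shifts longitudes so the ring becomes continuous. The function names are hypothetical.

```python
def crosses_dateline(ring):
    """True if two neighbouring points are more than 180 degrees apart in longitude."""
    return any(abs(b[0] - a[0]) > 180.0 for a, b in zip(ring, ring[1:]))


def unwrap_longitudes(ring):
    """Shift longitudes so neighbouring points never jump by more than 180 degrees.

    The result may contain longitudes outside [-180, 180]; a real
    implementation would still have to split the shape at the dateline
    and reassemble the parts (not shown in this sketch).
    """
    if not ring:
        return []
    out = [tuple(ring[0])]
    offset = 0.0
    for (prev_lon, _), (lon, lat) in zip(ring, ring[1:]):
        delta = lon - prev_lon
        if delta > 180.0:
            offset -= 360.0  # jumped from the west side back to the east side
        elif delta < -180.0:
            offset += 360.0  # jumped from the east side over to the west side
        out.append((lon + offset, lat))
    return out


# A small ring straddling the dateline near Antarctica.
ring = [(179.0, -78.0), (-179.0, -78.5), (-179.0, -79.0),
        (179.0, -79.0), (179.0, -78.0)]
unwrapped = unwrap_longitudes(ring)
```

After unwrapping, the ring is an ordinary small polygon between longitudes 179 and 181 instead of one that appears to span almost the whole globe.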

--Florian

On Friday, July 12, 2013 8:30:47 PM UTC+2, Jeffrey Gerard wrote:

There is a concept of Geometry "validity", and Elasticsearch is configured
to disallow invalid polygons. An invalid polygon may be technically
ambiguous as to what you're trying to represent. Examples of what makes an
invalid geometry can be found in
http://www.gdmc.nl/ledoux/pdfs/_12agile.pdf , page 4.

I do find it necessary to manually repair an invalid geometry before
indexing into Elasticsearch; while annoying, I also think it's acceptable
because it's not the datastore's job to fix your documents when they don't
comply with the spec. There are slightly hackish methods to repair
invalid geometries described in that paper; there is also
http://www.gdmc.nl/ledoux/pdfs/_12agile.pdf. None of these methods are
guaranteed perfect since the invalid ones are ambiguous in the first place.

Florian, I have a question about the "big geo-refactoring in 1.0 beta".
Is this separate from the big geo-refactoring (I think it was a backport
from Lucene) that was introduced in 0.90 beta? If so, what is included in
the 1.0 beta refactor?

Thanks!
Jeff

On Friday, July 12, 2013 3:27:59 AM UTC-7, Florian Schilling wrote:

Hey Jilles,

nice to hear from you. The branch fixing the dateline bug is part of a
big geo-refactoring which is pulled into 1.0 beta. Sadly I haven't tested
it against the datasets you posted, but I'll catch this up to day and keep
you informed.

cheers,
florian

On Thursday, July 11, 2013 5:18:40 PM UTC+2, Jilles van Gurp wrote:

Hey Florian,

What is the status regarding this? I'm still working around this issue
on 0.91 and wasn't able to find an issue for this in the issue tracker.
Should I create one?

Jilles

It seems one of my replies got deleted in Google Groups? Here's the
original.

Thanks for the pointers. I've been looking into ways to fix these
problems, since real-world data is unfortunately full of them. The
examples in the paper look exactly like the stuff I'm seeing in the data.

Self-intersecting or complex polygons are quite common. Polygon editors
based on Google Maps allow you to draw them, which means they will show
up regularly in GeoJSON feeds and other user-generated polygon data. It's
not so much that they are invalid as that they complicate the use of
certain algorithms. In the case of Elasticsearch, the main goal is
calculating the geohash or quad tree prefixes that cover the shape as
well as possible. A best effort for a polygon with issues is better than
an error. So if there are ways to deal with some of these issues, it is
very much worth it to make Elasticsearch more robust. I don't think
perfection is necessarily the goal here.
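
To make the geohash point concrete, here is a small Python sketch of the
standard geohash encoding plus a very crude "coverage" that only collects
the cells containing a ring's vertices. This is not how Elasticsearch or
Lucene compute the covering prefixes; the helper name vertex_cells is made
up for this example, and a real covering also needs the cells along the
edges and inside the polygon.

```python
from typing import List, Tuple

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a point as a geohash by bisecting longitude and latitude in turn."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars: List[str] = []
    bits, value = 0, 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = value * 2 + 1, mid
            else:
                value, lon_hi = value * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = value * 2 + 1, mid
            else:
                value, lat_hi = value * 2, mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def vertex_cells(ring: List[Tuple[float, float]], precision: int) -> List[str]:
    """Distinct geohash cells containing the ring's (lon, lat) vertices only."""
    return sorted({geohash_encode(lat, lon, precision) for lon, lat in ring})


print(geohash_encode(57.64911, 10.40744, 6))
print(vertex_cells([(10.40, 57.64), (10.42, 57.64), (10.42, 57.66),
                    (10.40, 57.64)], 4))
```

Because only prefixes are computed, a slightly "wrong" ring still yields a
usable set of cells, which is why a best effort seems feasible here.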
