It seems one of my replies got deleted in Google Groups, so here's the
original.
Thanks for the pointers. I've been looking into ways to fix these issues,
since the real world is unfortunately full of data like this. The examples
in the paper look exactly like what I'm seeing in my data.
Self-intersecting or otherwise complex polygons are quite common. Tools
like Google Maps-based polygon editors let you draw them, which means they
are going to be quite common in GeoJSON feeds and other user-generated
polygon data. It's not so much that they are invalid, but merely that they
complicate the use of certain algorithms. In the case of Elasticsearch,
the main goal is calculating the geohash or quadtree prefixes that cover
the shape as closely as possible. A best effort for a polygon with issues
is better than an error. So if there are ways to deal with some of these
issues, it is very much worth it to make Elasticsearch more robust. I
don't necessarily think perfection is the goal here.
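For what it's worth, here's roughly the kind of best-effort cleanup I have
in mind, sketched against JTS (package names assumed to be the
com.vividsolutions.jts ones that Elasticsearch's geo support builds on).
The buffer(0) call is a widely used trick for dissolving
self-intersections; it isn't guaranteed to preserve the drawer's intent,
which is exactly why I'd treat it as best effort:

    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.io.WKTReader;

    public class PolygonRepair {
        public static void main(String[] args) throws Exception {
            // A bowtie polygon: its edges cross, so JTS considers it invalid.
            Geometry polygon = new WKTReader().read(
                "POLYGON ((0 0, 10 10, 10 0, 0 10, 0 0))");
            System.out.println("valid before: " + polygon.isValid()); // false

            // buffer(0) re-nodes the edges and rebuilds the rings, which
            // resolves simple self-intersections. Best effort only: parts
            // of a bowtie can be dropped depending on ring orientation.
            Geometry repaired = polygon.buffer(0);
            System.out.println("valid after: " + repaired.isValid()); // true
        }
    }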
On Friday, July 12, 2013 8:30:47 PM UTC+2, Jeffrey Gerard wrote:
There is a concept of Geometry "validity", and Elasticsearch is configured
to disallow invalid polygons. An invalid polygon may be technically
ambiguous as to what you're trying to represent. Examples of what makes an
invalid geometry can be found in
http://www.gdmc.nl/ledoux/pdfs/_12agile.pdf (page 4).
I do find it necessary to manually repair an invalid geometry before
indexing into Elasticsearch; while annoying, I also think that's
acceptable, because it's not the datastore's job to fix your documents
when they don't comply with the spec. That paper describes some slightly
hackish methods for repairing invalid geometries. None of these methods is
guaranteed to be perfect, since the invalid geometries are ambiguous in
the first place.
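If you want to see what exactly is wrong before deciding how to repair,
JTS can report the specific violation. A minimal sketch using IsValidOp
(again assuming the com.vividsolutions.jts packages, since that's what
sits underneath Elasticsearch's geo support):

    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.io.WKTReader;
    import com.vividsolutions.jts.operation.valid.IsValidOp;
    import com.vividsolutions.jts.operation.valid.TopologyValidationError;

    public class ValidityCheck {
        public static void main(String[] args) throws Exception {
            Geometry g = new WKTReader().read(
                "POLYGON ((0 0, 10 10, 10 0, 0 10, 0 0))"); // bowtie
            IsValidOp op = new IsValidOp(g);
            if (!op.isValid()) {
                // Reports the kind of violation and the offending
                // coordinate, e.g. "Self-intersection" at (5.0, 5.0).
                TopologyValidationError err = op.getValidationError();
                System.out.println(err.getMessage()
                    + " at " + err.getCoordinate());
            }
        }
    }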
Florian, I have a question about the "big geo-refactoring in 1.0 beta".
Is this separate from the big geo-refactoring (I think it was a backport
from Lucene) that was introduced in 0.90 beta? If so, what is included in
the 1.0 beta refactor?
Thanks!
Jeff
On Friday, July 12, 2013 3:27:59 AM UTC-7, Florian Schilling wrote:
Hey Jilles,
nice to hear from you. The branch fixing the dateline bug is part of a
big geo-refactoring which is being pulled into 1.0 beta. Sadly I haven't
tested it against the datasets you posted, but I'll catch up on this today
and keep you informed.
cheers,
florian
On Thursday, July 11, 2013 5:18:40 PM UTC+2, Jilles van Gurp wrote:
Hey Florian,
What is the status regarding this? I'm still working around this issue
on 0.91 and wasn't able to find a ticket for it in the issue tracker.
Should I create one?
Jilles
On Monday, May 13, 2013 5:32:13 PM UTC+2, Florian Schilling wrote:
Hi Jilles,
I can reproduce the error and am working on a branch to fix the dateline
bug.
--Florian
On Monday, May 13, 2013 10:57:03 AM UTC+2, Jilles van Gurp wrote:
I've been indexing some complex polygons using the geo_shape type, from
e.g.
http://code.flickr.net/2011/01/08/flickr-shapefiles-public-dataset-2-0/
and https://github.com/johan/world.geo.json (annotated GeoJSON geometry
files for the world). Both are in GeoJSON format and contain polygons and
multipolygons.
These free datasets contain some shapes that Elasticsearch doesn't
handle.
In particular, I'm getting these errors on some shapes:
MapperParsingException[failed to parse [geometry]]; nested:
TopologyException[found non-noded intersection between LINESTRING (
179.9435577392578 -78.29638671875, 179.9999847412109 -78.30168151855469 )
and LINESTRING ( 180.0 -78.30162811279297, -178.9799499511719
-78.39634704589844 ) [ (179.99941723147876, -78.3016282665598, NaN) ]];
and
MapperParsingException[failed to parse [geometry]]; nested:
InvalidShapeException[Self-intersection at or near point
(119.51048182278488, 23.36987643037978, NaN)];
I'd love to have a fix for these issues, because right now I can't
reliably index my datasets. It's not acceptable to have countries missing
from the country dataset, for example. I have a workaround, which
involves calculating a simple convex polygon from the coordinates
(https://en.wikipedia.org/wiki/Polygon), but that inevitably includes
large areas outside the original polygon, and it is kind of tedious to do
this using the bulk API. I was wondering if there's a better way to do
this, and whether it would be possible to integrate a solution into Lucene
directly.
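In case it helps to make the workaround concrete, it boils down to
something like this JTS sketch (com.vividsolutions.jts package names
assumed; in my real pipeline this runs as a preprocessing step before the
bulk request is built):

    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.io.WKTReader;

    public class ConvexHullWorkaround {
        public static void main(String[] args) throws Exception {
            // A self-intersecting polygon as it might arrive in a feed.
            Geometry broken = new WKTReader().read(
                "POLYGON ((0 0, 10 10, 10 0, 0 10, 0 0))");
            // The convex hull is always a simple polygon, so Elasticsearch
            // accepts it, but it can cover large areas outside the shape.
            Geometry hull = broken.convexHull();
            System.out.println(hull); // a square covering both bowtie lobes
        }
    }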
Jilles