Many thanks for the implementation of the geo_shape type and filters! I
expect them to be incredibly useful for us.
I do have some questions and some suggestions that I hope somebody who
understands the implementation can answer:

- (Suggestion) From reading the documentation I expected that the
  filters/queries would use the prefix tree indexes to speed things up, but
  would then filter further with geometry calculations on the original
  geometries to decide whether there is an actual "intersection"/"within"
  relation. The documentation should probably make it clearer that the
  only filtering done is on the hashes, never on the original geometries,
  and that the tree depth therefore influences not only the performance
  but also the outcome of queries/filters.

- (Question) A bit more documentation about how the combination of
  distance_error_pct and tree_levels affects the precision/results of
  filters would really be appreciated. From the docs and code I'm having a
  hard time understanding the consequences of altering both values on
  filters and indexes.

- (Suggestion) The algorithm for "within" (in TermQueryPrefixStrategy)
  could be improved (imho) by removing hashes from the must-not clause
  (i.e. hashes for the buffer around the original geometry) when these
  hashes are already in the "should" clause (i.e. the intersects part of
  that query). That would give more consistent behaviour when the filter
  geometry and the target geometry are equal. With the current algorithm,
  equal geometries will in most cases (but not always) not be found,
  because the must-not clause will likely contain hashes for points inside
  the original geometry (i.e. overlapping with hashes in the intersects
  query). With my suggestion, the "within" query/filter would match any
  geometry that is equal to or contained within the filter geometry.

- (Question) Quote: "Because of current limitations of the algorithm, very
  large indexed shapes are not deemed to intersect with very small filter
  shapes". Are there any plans to fix this? I would expect that when not
  only the hashes of a geometry but also its maximum depth is indexed, it
  would be possible to fix the filters for that. E.g. the intersects
  filter could then be extended by also filtering on each parent hash,
  boolean-ANDed with the level of that parent hash. That would find the
  larger geometries that intersect, without also returning smaller ones
  that lie outside the filter geometry but intersect with the parent.

- (Question/Suggestion) Are there plans to support "Pre-Indexed-Shapes"
  in documents as well, i.e. to specify a pre-indexed shape to be indexed
  with a new document, instead of adding the geometry itself to that
  document? I would expect that in many use cases the same geometries will
  be indexed with multiple docs. Just as with filters/queries, performance
  could benefit quite a lot if the indexer could simply copy the hashes
  over from an already indexed geometry.

- (Question/Suggestion) Imho, allowing a geometry to be serialised as e.g.
  WKT would reduce not only the size of documents but also the work that
  elasticsearch needs to do for serializing/deserializing geometries.
  Polygons quickly become really big when expressed in JSON... Is this
  being considered, and would it be accepted if provided as a decent pull
  request?
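
To make the "within" suggestion more concrete, here is a minimal sketch in Python. It is not the actual TermQueryPrefixStrategy code. The cell hashes, the matches() helper and the way I model the boolean query with plain sets are all my own assumptions, made up for illustration.

```python
def matches(doc_cells, should, must_not):
    """Toy boolean query: at least one 'should' term, no 'must_not' term."""
    return bool(doc_cells & should) and not (doc_cells & must_not)

# Hypothetical cell hashes, chosen only for illustration.
filter_cells = {"u09t", "u09w"}          # cells covering the filter geometry
buffer_cells = {"u09s", "u09t", "u09x"}  # cells covering the buffer; "u09t" overlaps

equal_doc = {"u09t", "u09w"}       # indexed geometry equal to the filter geometry
straddling_doc = {"u09w", "u09x"}  # indexed geometry reaching outside the filter

# Current behaviour as I understand it: the whole buffer goes into must-not.
current_equal = matches(equal_doc, filter_cells, buffer_cells)

# Suggested behaviour: drop must-not hashes that are also "should" hashes.
pruned_must_not = buffer_cells - filter_cells
suggested_equal = matches(equal_doc, filter_cells, pruned_must_not)
suggested_straddling = matches(straddling_doc, filter_cells, pruned_must_not)

print(current_equal, suggested_equal, suggested_straddling)  # False True False
```

In this toy model the equal geometry is only found once the overlapping hash is removed from the must-not clause, while a geometry that reaches into the buffer is still rejected.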
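
The parent-hash idea for large indexed shapes could be sketched roughly as follows, again in Python. This assumes geohash-like hashes with one character per tree level, assumes that a document's indexed terms include the ancestor prefixes of its cells, and assumes a hypothetical max_level field stored per document. None of these names come from the elasticsearch code.

```python
def parent_hashes(cell):
    """All proper prefixes of a cell hash, i.e. its ancestors in the tree."""
    return [cell[:i] for i in range(1, len(cell))]

def intersects(doc, filter_cells):
    """Toy intersects filter extended with (parent hash AND level) clauses."""
    terms, level = doc["terms"], doc["max_level"]
    if terms & filter_cells:
        return True  # the existing behaviour: a shared cell hash
    for cell in filter_cells:
        for parent in parent_hashes(cell):
            # Only accept a parent hash if the document stops at that level.
            if parent in terms and level == len(parent):
                return True
    return False

# Hypothetical documents: "terms" holds cell hashes including ancestors.
large_doc = {"terms": {"u", "u0"}, "max_level": 2}
small_far_doc = {"terms": {"u", "u0", "u0z", "u0zz", "u0zzq"}, "max_level": 5}

small_filter = {"u09tv"}  # a very small filter shape, one deep cell

large_hit = intersects(large_doc, small_filter)
small_far_hit = intersects(small_far_doc, small_filter)
```

Here the large shape is found through its coarse cell "u0", while the small shape elsewhere under "u0" is not, because its maximum level does not equal the level of the shared parent hash.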
Thanks...