I posted a message yesterday, but somehow it didn't make it to the list.
Trying again...
I have some questions/remarks about the (incredibly useful) geo_shape type
and filters.
- Are there plans to support "Pre-Indexed-Shapes" in documents as well,
  i.e. to reference a pre-indexed shape when indexing a new document,
  instead of adding the geometry itself to that document? I would expect
  that in many use cases the same geometry will be indexed with multiple
  docs. Just like with filters/queries, performance could benefit quite a
  lot if the indexer could just copy the hashes over from an already
  indexed geometry.
- Imho, allowing a geometry to be serialised as e.g. WKT would not only
  trim down the size of documents, but also the work that elasticsearch
  needs to do for serializing/deserializing geometries. Polygons quickly
  become really big when expressed in JSON... Is this being considered,
  and/or would it be accepted if provided as a decent pull request?
- A bit more documentation about how the combination of
  distance_error_pct and tree_levels affects the precision/results of
  filters would really be appreciated. From the docs and code I'm having a
  hard time understanding the consequences of altering both values on
  filters and indexes. What exactly does distance_error_pct do, and how
  does it affect e.g. an intersection filter?
- Quote from the docs: "Because of current limitations of the algorithm,
  very large indexed shapes are not deemed to intersect with very small
  filter shapes". Are there any plans to fix this? Assuming the
  algorithmic problem is that large shapes are only hashed up to a maximum
  depth, there are a couple of ways to fix this. E.g. the indexer could
  add an extra field containing only the hashes from the deepest hash
  level it uses for that geometry. The intersection filter could use this
  by extending the current filter (with a boolean OR) with a term filter
  on that field for all parents of the hashes it currently uses for
  searching. That way larger shapes that intersect will be included, and
  smaller shapes that only happen to share a parent won't be included.
- The algorithm for "within" (in TermQueryPrefixStrategy) could be
  improved (imho). It's currently inconsistent for geometries that are
  equal to, or just a tiny bit smaller than, the filter geometry, and I
  think that could easily be fixed. I filed an issue about this yesterday,
  so I won't go further into it here, see
  https://github.com/elasticsearch/elasticsearch/issues/2552
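To make the WKT remark a bit more concrete, here is a small Python sketch
that serialises the same ring once as a GeoJSON-style polygon and once as
WKT and compares the lengths. The ring coordinates and the helper names are
made up for illustration; how much is actually saved depends on the data,
so it is worth running on real geometries.

```python
import json


def geojson_polygon(ring):
    """Serialise a ring of (lon, lat) tuples as a compact GeoJSON-style polygon."""
    geometry = {"type": "Polygon", "coordinates": [[list(p) for p in ring]]}
    # Compact separators, so the comparison is not skewed by whitespace.
    return json.dumps(geometry, separators=(",", ":"))


def wkt_polygon(ring):
    """Serialise the same ring as a WKT POLYGON string."""
    points = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON (({points}))"


# Made-up ring: 50 points plus the closing point (first point repeated).
ring = [(round(4.0 + 0.01 * i, 2), round(52.0 + 0.01 * (i % 3), 2)) for i in range(50)]
ring.append(ring[0])

as_json = geojson_polygon(ring)
as_wkt = wkt_polygon(ring)
print(len(as_json), len(as_wkt))
```

The per-point overhead of the JSON form (brackets and commas around every
coordinate pair) is what makes it grow faster than the WKT form.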
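The extra-field idea for the large-shape/small-filter limitation can be
sketched in Python as below. This is a toy model, not the elasticsearch or
Lucene implementation: cells are plain strings where every prefix is a
parent cell, the field names "shape" and "shape_deepest" are hypothetical,
and it assumes the existing field holds each geometry's cells plus all
their ancestors.

```python
def parents(cell):
    """All proper prefixes of a cell, i.e. its ancestors in the prefix tree."""
    return {cell[:i] for i in range(1, len(cell))}


def index_doc(deepest_cells):
    """Build both (hypothetical) fields for one geometry."""
    all_levels = set()
    for cell in deepest_cells:
        all_levels.add(cell)
        all_levels |= parents(cell)
    return {
        "shape": all_levels,                  # assumed existing field: all levels
        "shape_deepest": set(deepest_cells),  # proposed extra field
    }


def current_filter(doc, query_cells):
    """Toy version of today's behaviour: match on the query cells only."""
    return bool(doc["shape"] & set(query_cells))


def extended_filter(doc, query_cells):
    """Current filter OR a term match of the query's parents on the extra field."""
    query_parents = set().union(*(parents(c) for c in query_cells))
    return current_filter(doc, query_cells) or bool(doc["shape_deepest"] & query_parents)


large = index_doc(["AB"])          # large shape, only hashed down to level 2
small_hit = index_doc(["ABCD"])    # small shape in the same cell as the query
small_miss = index_doc(["ABDD"])   # small shape that only shares the parent "AB"
query = ["ABCD"]                   # very small filter shape

for name, doc in [("large", large), ("small_hit", small_hit), ("small_miss", small_miss)]:
    print(name, current_filter(doc, query), extended_filter(doc, query))
```

In this model the large shape is missed by the current filter and picked up
by the extended one, while the small shape that merely shares a parent cell
stays excluded, because its deepest-level cell is not an ancestor of the
query cell.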
Thanks!