Geo_shape questions & comments

Many thanks for the implementation of the geo_shape type and filters! I
expect them to be incredibly useful for us.

I do have some questions and suggestions that I hope someone who
understands the implementation can answer:

  • (Suggestion) From reading the documentation I expected that the
    filters/queries would use the prefix tree indexes to speed things up, and
    then further filter the candidates with geometry calculations on the
    original geometries to decide whether there is an actual
    "intersection"/"within" relation. The documentation should probably make
    it clearer that the only filtering done is on the hashes, never on the
    original geometries, and that the tree depth therefore influences not
    only the performance but also the outcome of queries/filters.

  • (Question) A bit more documentation about how the combination of
    distance_error_pct and tree_levels affects the precision/results of
    filters would really be appreciated. From the docs and code I'm having a
    hard time understanding the consequences of changing both values on
    filters and queries.

  • (Suggestion) The algorithm for "within" (in TermQueryPrefixStrategy)
    could be improved (imho) by removing hashes from the must-not clause
    (i.e. hashes for the buffer around the original geometry) when these
    hashes are already in the "should" clause (i.e. the intersects part of
    that query). That would give more consistent behaviour when the filter
    geometry and the target geometry are equal. With the current algorithm,
    equal geometries will usually (but not always) not be found, because the
    must-not clause will likely contain hashes for points inside the original
    geometry (i.e. hashes that overlap with those in the intersects query).
    With my suggestion, the "within" query/filter would match any geometry
    that is equal to or contained within the filter geometry.

  • (Question) Quote: "Because of current limitations of the algorithm, very
    large indexed shapes are not deemed to intersect with very small filter
    shapes". Are there any plans to fix this? (I would expect that if not
    only the hashes of a geometry but also its maximum depth were indexed, it
    would be possible to fix the filters for this. E.g. the intersects filter
    could then be extended by also matching each parent hash of the filter
    geometry, ANDed with the condition that the indexed maximum depth equals
    the level of that parent hash. That would find the larger geometries that
    intersect, without also returning smaller ones that lie outside the filter
    geometry but intersect with the parent cell.)

  • (Question/Suggestion) Are there plans to support "Pre-Indexed-Shapes"
    in documents as well, i.e. to specify a pre-indexed shape to be indexed
    with a new document instead of adding the geometry itself to that
    document? I expect that in many use cases the same geometries will be
    indexed with multiple docs. Just as with filters/queries, performance
    could benefit quite a lot if the indexer could simply copy the hashes over
    from an already indexed geometry.

  • (Question/Suggestion) Imho, allowing geometries to be serialized as e.g.
    WKT would reduce not only the size of documents but also the work
    elasticsearch needs to do for serializing/deserializing geometries.
    Polygons quickly become really big when expressed in JSON... Is this
    something that is being considered, and would it be accepted if provided
    as a decent pull request?