How is the geo-indexing implemented: is it a btree- or quadtree-based
index, a geohash one, or something else? Curious how it compares to
Mongo's geo support.
I am not sure how MongoDB implemented geo support. In elasticsearch, because
of its search engine / inverted index nature, it is implemented quite
differently. When searching, you basically traverse a set of document ids
(internal ones, not the _id) that correspond to the relevant query. What
you want is to check (filter) or collect (facets) the data for each doc id
as fast as possible. The usual way to do that is to load the relevant
data into memory and build a docId -> value mapping (which is the inversion
of the inverted index).
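A minimal Python sketch of that idea, assuming the coordinates have already been loaded into two parallel in-memory arrays indexed by internal doc id. The names `lats`, `lons` and `bbox_filter` are hypothetical and this is not Elasticsearch's actual code. The query produces the matching doc ids; the filter then checks each one with a constant-time array lookup instead of going back to the inverted index.

```python
from typing import Iterable, List, Sequence


def bbox_filter(
    doc_ids: Iterable[int],
    lats: Sequence[float],
    lons: Sequence[float],
    top: float,
    left: float,
    bottom: float,
    right: float,
) -> List[int]:
    """Keep the doc ids whose point falls inside the bounding box.

    lats[doc_id] / lons[doc_id] play the role of the in-memory
    docId -> value mapping. Boxes crossing the antimeridian are
    not handled in this sketch.
    """
    hits = []
    for doc_id in doc_ids:
        lat, lon = lats[doc_id], lons[doc_id]  # O(1) lookup per doc
        if bottom <= lat <= top and left <= lon <= right:
            hits.append(doc_id)
    return hits


# Per-document coordinates, indexed by internal doc id (0..3).
lats = [40.7, 51.5, 48.9, -33.9]
lons = [-74.0, -0.1, 2.35, 151.2]

# Doc ids produced by some other query (e.g. a term query).
matching = [0, 1, 2, 3]
print(bbox_filter(matching, lats, lons, top=60.0, left=-10.0, bottom=35.0, right=20.0))
# prints [1, 2]
```

The point of the layout is that the query only has to hand over doc ids; all geo work happens against flat arrays in memory.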
Also, would it be possible to have a facet for returning the centroid
and the minimum bounding box for the set of results found via the geo
filters?
Should be possible, I need to look into it / learn more about how to do it.
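One way such a facet could work, sketched under the assumption that it collects over the same kind of in-memory docId -> value arrays: a single pass keeps running sums for the centroid and running min/max values for the bounding box. `geo_stats` is a hypothetical name. The centroid here is a plain arithmetic mean of latitudes and longitudes, which is only a reasonable approximation for small areas away from the antimeridian.

```python
from typing import Iterable, Optional, Sequence, Tuple

# ((centroid_lat, centroid_lon), (top, left, bottom, right))
Stats = Tuple[Tuple[float, float], Tuple[float, float, float, float]]


def geo_stats(
    doc_ids: Iterable[int],
    lats: Sequence[float],
    lons: Sequence[float],
) -> Optional[Stats]:
    """Single pass over the matching docs: centroid + bounding box."""
    count = 0
    sum_lat = sum_lon = 0.0
    top = right = float("-inf")
    bottom = left = float("inf")
    for doc_id in doc_ids:
        lat, lon = lats[doc_id], lons[doc_id]
        count += 1
        sum_lat += lat
        sum_lon += lon
        top, bottom = max(top, lat), min(bottom, lat)
        right, left = max(right, lon), min(left, lon)
    if count == 0:
        return None  # no hits: nothing to report
    centroid = (sum_lat / count, sum_lon / count)
    return centroid, (top, left, bottom, right)


lats = [51.5, 48.9, 52.5]
lons = [-0.1, 2.35, 13.4]
print(geo_stats([0, 1, 2], lats, lons))
```

Because both statistics are simple running aggregates, per-shard results could in principle be combined by summing counts and sums and taking the min/max of the boxes.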
-Andrei
As a side note, I have discovered that having multiple locations per doc
does not work correctly. I'm checking how to fix that; it might require
storing a different structure in the index.
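To illustrate why multiple locations per doc need a different structure: a single `lats[doc_id]` slot can hold only one point. One common layout, assumed here and not necessarily what elasticsearch ends up using, is a flat points array plus an offsets array (compressed-sparse-row style): the points of doc `d` live at `points[offsets[d]:offsets[d + 1]]`, and a doc matches if any of its points is inside the box. `MultiGeoField` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lat, lon)


class MultiGeoField:
    """docId -> list of points, stored as one flat array plus offsets."""

    def __init__(self, per_doc: Sequence[Sequence[Point]]) -> None:
        self.offsets: List[int] = [0]
        self.points: List[Point] = []
        for doc_points in per_doc:
            self.points.extend(doc_points)
            self.offsets.append(len(self.points))

    def points_of(self, doc_id: int) -> List[Point]:
        start, end = self.offsets[doc_id], self.offsets[doc_id + 1]
        return self.points[start:end]

    def any_in_box(
        self, doc_id: int, top: float, left: float, bottom: float, right: float
    ) -> bool:
        # A doc matches if at least one of its locations is inside the box.
        return any(
            bottom <= lat <= top and left <= lon <= right
            for lat, lon in self.points_of(doc_id)
        )


field = MultiGeoField([
    [(40.7, -74.0)],                # doc 0: one location
    [(40.7, -74.0), (51.5, -0.1)],  # doc 1: two locations
    [],                             # doc 2: no location
])
print([d for d in range(3) if field.any_in_box(d, 60.0, -10.0, 35.0, 20.0)])
# prints [1]
```

The extra offsets array costs one int per doc, but lookups stay constant-time per doc and docs with zero locations need no special marker.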
So how does this work with a match_all query? Does it mean that it
needs to scan all the documents and check whether each document's
geo-coordinate is in the bounding box?
Yes, though match_all is not really the common case. It makes little sense
to build another indexing data structure just for this case, when in all
other cases the best solution is the one I outlined. And even with
match_all, the execution is all in memory, so the difference would be
negligible, and the scan might even be faster. Sharding helps with scaling
out as well.
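Roughly what the match_all case amounts to, as a sketch: with no narrowing query every doc id on a shard is a candidate, so the filter becomes a linear scan over the in-memory coordinate arrays, and sharding means each shard scans only its own slice before the hits are merged. The `Shard` class and `search_all_shards` function are hypothetical; a real deployment would run the per-shard scans in parallel (possibly on different nodes), whereas this sketch runs them one after another.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shard:
    name: str
    lats: List[float]  # indexed by shard-local doc id
    lons: List[float]

    def match_all_bbox(
        self, top: float, left: float, bottom: float, right: float
    ) -> List[int]:
        # match_all: every local doc id is a candidate, so scan them all.
        return [
            doc_id
            for doc_id in range(len(self.lats))
            if bottom <= self.lats[doc_id] <= top
            and left <= self.lons[doc_id] <= right
        ]


def search_all_shards(
    shards: List[Shard], top: float, left: float, bottom: float, right: float
) -> List[Tuple[str, int]]:
    hits: List[Tuple[str, int]] = []
    for shard in shards:  # in practice fanned out in parallel
        local = shard.match_all_bbox(top, left, bottom, right)
        hits.extend((shard.name, doc_id) for doc_id in local)
    return hits


shards = [
    Shard("s0", lats=[40.7, 51.5], lons=[-74.0, -0.1]),
    Shard("s1", lats=[48.9, -33.9], lons=[2.35, 151.2]),
]
print(search_all_shards(shards, 60.0, -10.0, 35.0, 20.0))
# prints [('s0', 1), ('s1', 0)]
```

Each shard's scan is proportional to the number of docs it holds, so adding shards (and machines) shrinks the per-shard work for the full scan.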