A couple of geo-related questions

Andrei · October 7, 2010, 10:24pm

How is the geo-indexing implemented: is it a btree- or quadtree-based
index, a geohash one, or something else? Curious how it compares to
Mongo's geo support.

Also, would it be possible to have a facet for returning the centroid
and the minimum bounding box for the set of results found via the geo
filters?

-Andrei

kimchy · October 8, 2010, 11:17am

On Fri, Oct 8, 2010 at 12:24 AM, Andrei andrei@zmievski.org wrote:

How is the geo-indexing implemented: is it a btree- or quadtree-based
index, a geohash one, or something else? Curious how it compares to
Mongo's geo support.

I am not sure how MongoDB implemented geo support. In elasticsearch, because
of its search engine / inverted index nature, it is implemented quite
differently. When searching, you basically traverse a set of document ids
(internal ones, not the _id) that match the query. What you want is to
check (filter) or collect (facets) data as fast as possible based on that
doc id. The way you usually do it is by loading the relevant data into
memory and creating a docId -> value mapping (which is the inversion of the
inverted index).
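
To illustrate the docId -> value idea, here is a minimal Python sketch. It is not the actual elasticsearch code: the `lats`/`lons` arrays, the sample coordinates, and the function names are all made up for illustration, and the box is assumed not to cross the antimeridian.

```python
from array import array

# Hypothetical in-memory cache: the index is the internal doc id,
# the value is that document's coordinate (one location per doc).
lats = array("d", [40.71, 51.51, 48.86, -33.87])
lons = array("d", [-74.01, -0.13, 2.35, 151.21])


def in_bounding_box(doc_id, top, left, bottom, right):
    """Check a single doc id against the box with two array lookups."""
    lat, lon = lats[doc_id], lons[doc_id]
    return bottom <= lat <= top and left <= lon <= right


def filter_docs(matching_doc_ids, top, left, bottom, right):
    """Keep only the doc ids (already matched by the query) inside the box."""
    return [
        doc_id
        for doc_id in matching_doc_ids
        if in_bounding_box(doc_id, top, left, bottom, right)
    ]


# The query matched docs 0..3; the filter narrows them to a rough "Europe" box.
europe_hits = filter_docs([0, 1, 2, 3], top=60.0, left=-10.0, bottom=35.0, right=30.0)
```

The point of the layout is that the per-document check is a constant-time array lookup, so the cost scales with the number of doc ids the query produces rather than with a separate spatial index.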

Also, would it be possible to have a facet for returning the centroid
and the minimum bounding box for the set of results found via the geo
filters?

Should be possible; I need to look further into how to do it.
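
As a rough idea of what such a facet would have to compute, here is a Python sketch under simplifying assumptions: `geo_summary` is a hypothetical name, the centroid is a plain arithmetic mean of latitudes and longitudes (which ignores spherical geometry), and the result set is assumed not to span the antimeridian. It is not an existing elasticsearch facet.

```python
def geo_summary(doc_ids, lats, lons):
    """Return the centroid and minimum bounding box for the given doc ids.

    lats and lons are docId -> value sequences (one location per doc).
    Returns None when there are no matching documents.
    """
    doc_ids = list(doc_ids)
    if not doc_ids:
        return None

    doc_lats = [lats[d] for d in doc_ids]
    doc_lons = [lons[d] for d in doc_ids]

    return {
        # Naive centroid: arithmetic mean of the coordinates.
        "centroid": (sum(doc_lats) / len(doc_lats), sum(doc_lons) / len(doc_lons)),
        # Minimum bounding box as top / left / bottom / right edges.
        "top": max(doc_lats),
        "left": min(doc_lons),
        "bottom": min(doc_lats),
        "right": max(doc_lons),
    }


sample_lats = [10.0, 20.0, 30.0]
sample_lons = [-10.0, 0.0, 10.0]
summary = geo_summary([0, 2], sample_lats, sample_lons)
```

Both values can be collected in a single pass over the matching doc ids, which is the same access pattern facets already use.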

As a side note, I have discovered that having multiple locations per doc
does not work correctly. I am checking how to fix that; it might require a
different structure to be stored in the index.

Andrei · October 8, 2010, 6:07pm

So how does this work with a match_all query? Does it mean that it
needs to scan all the documents and check that the value of the
geo-coordinate is in the bounding box?

The bounding box is actually the minimum bounding rectangle (see Wikipedia).
Worth checking out R-trees, linked from there, too.

-Andrei

kimchy · October 8, 2010, 6:56pm

On Fri, Oct 8, 2010 at 8:07 PM, Andrei andrei@zmievski.org wrote:

So how does this work with a match_all query? Does it mean that it
needs to scan all the documents and check that the value of the
geo-coordinate is in the bounding box?

Yes, though match_all is not really the common case. It makes little sense
to build another indexing data structure just for this case, when in all
other cases the best solution is the one I outlined. But even with
match_all, the execution is all in memory, so the difference would be
negligible, if it is not actually faster. Sharding helps with scaling out
as well.

The bounding box is actually the minimum bounding rectangle (see Wikipedia).

ok.

Worth checking on R-trees linked from there too.

Irrelevant. R-trees are nice, but make little sense when coupled with how a
search engine works and providing geo on top of it.
