ElasticSearch geo search vs Solr geo search performance?


(Maxim Veksler) #1

Hello to everyone and Shay,

As a newcomer to the full-text search world, I see that there are two main
implementations of geo proximity search on top of Lucene: Solr and ES.

Could you please shed some light on the 2 implementations?

Are they both using the same indexing?
Has anyone run any benchmarks comparing the two? (Or perhaps this question
is irrelevant if the answer to the first question is yes?)
Are there any known constraints or query types that one supports and the
other does not?

I'm just learning this field, so I understand that my questions are a bit
broad.

P.S. Lucene itself is getting hot on geohash-based search, see
http://code.google.com/p/lucene-spatial-playground/

Thank you,
Maxim.


(Shay Banon) #2

I don't know how Solr implements geo, but I can explain a bit about how
elasticsearch does it. By default, it uses an in-memory check on the lat/lon
values to see whether they match the relevant geo filter. For example, for a
distance filter, it first builds a bounding box and checks whether the point
falls within it; only if so does it run the more expensive distance
calculation (you have two options there) and check whether the point falls
within the distance.
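The two-stage check described above can be sketched roughly as follows. This is only an illustration in Python, not elasticsearch's actual code: the function names, the haversine formula, and the spherical Earth radius are all assumptions made for the example, and the bounding box deliberately ignores wrap-around at the ±180 degree meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def bounding_box(lat, lon, distance_km):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle.

    Simplification: the box is not wrapped across the +/-180 meridian.
    """
    r = distance_km / EARTH_RADIUS_KM  # angular radius in radians
    min_lat = max(lat - math.degrees(r), -90.0)
    max_lat = min(lat + math.degrees(r), 90.0)
    cos_lat = math.cos(math.radians(lat))
    ratio = math.sin(r) / cos_lat if cos_lat > 0 else 2.0
    if ratio >= 1.0:
        # The circle reaches a pole, so every longitude can match.
        return min_lat, max_lat, -180.0, 180.0
    d_lon = math.degrees(math.asin(ratio))
    return min_lat, max_lat, lon - d_lon, lon + d_lon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (the expensive step)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    d_phi = p2 - p1
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(d_lam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(c_lat, c_lon, p_lat, p_lon, distance_km):
    """Cheap bounding-box rejection first, exact distance only for survivors."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(c_lat, c_lon, distance_km)
    if not (min_lat <= p_lat <= max_lat and min_lon <= p_lon <= max_lon):
        return False
    return haversine_km(c_lat, c_lon, p_lat, p_lon) <= distance_km
```

For example, `within_distance(40.0, -74.0, 41.0, -74.0, 100.0)` is rejected by the bounding box alone, so the trigonometric distance is never computed for that point.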

If you also index the actual lat and lon values, you can set the optimize_bbox
option to "indexed" mode, in which case it uses the indexed lat/lon values
to do index-based checks on the bounding box (rather than in-memory ones).
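As a rough illustration of that setup, here is what the mapping and filter might look like, written as Python dictionaries for readability. It assumes the geo_point mapping with lat_lon enabled and the geo_distance filter syntax of elasticsearch releases from around the time of this thread; the type name "place", the field name "location", and the coordinates are hypothetical, so check the documentation for your version before relying on it.

```python
import json

# Mapping sketch: "lat_lon": True asks for lat and lon to be indexed as well,
# which the "indexed" bounding-box mode relies on.
geo_mapping = {
    "place": {  # hypothetical type name
        "properties": {
            "location": {"type": "geo_point", "lat_lon": True}  # hypothetical field
        }
    }
}

# Distance filter sketch with the bounding-box optimization switched from the
# default in-memory check to index-based checks.
distance_query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "geo_distance": {
                    "distance": "10km",
                    "optimize_bbox": "indexed",
                    "location": {"lat": 40.0, "lon": -74.0},
                }
            },
        }
    }
}

print(json.dumps(distance_query, indent=2))
```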

The same logic applies to other geo constructs.

On Mon, Jan 9, 2012 at 2:59 PM, Maxim Veksler maxim@vekslers.org wrote:


(Alex Piggott) #3

Quick semi-hijack since you just partially answered a related
question, apologies if that's poor thread etiquette!

We are about to implement polygon searches using elasticsearch, and
were going to calculate a bounding box including the polygon to speed
up performance - is this something you already do (and if not is it
because it's not necessary?)

I had a quick look at the source code, and it looked like (a) you
didn't do it, and (b) the improvement would probably be marginal at best
with InMemoryGeoBoundingBoxFilter, but significant with
IndexedGeoBoundingBoxFilter? Our queries will sometimes be most
heavily (or solely) filtered by geo (i.e. that will be the filter
providing the main data reduction); does this mean it is likely to be
worth it for us?

To the OP: I can't give a quantified answer because it was too long
ago, but we moved from Solr to elasticsearch in spring 2011, and the
major reason was the difference in geo functionality. Our very quick
and unscientific benchmarks back then also suggested Solr wasn't
"usefully" faster, if at all. I have no idea whether Solr has closed
the gap since then, though.

On Jan 9, 2:43 pm, Shay Banon kim...@gmail.com wrote:


(Shay Banon) #4

Calculating a bounding box will help for polygon search; it's not done
automatically. It helps even in the in-memory case, because the bounding box
check is faster than the polygon check.
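A minimal sketch of that idea, assuming a simple ray-casting test and treating lon/lat as flat x/y coordinates (both are assumptions for illustration, and none of these names come from elasticsearch): precompute the polygon's bounding box once, reject points outside it with four comparisons, and run the per-edge polygon test only for the rest.

```python
def polygon_bounding_box(polygon):
    """Compute (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y):  # the edge straddles the horizontal line at y
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


class PolygonFilter:
    """Hypothetical helper: bounding-box pre-check, then the full polygon test."""

    def __init__(self, polygon):
        self.polygon = polygon
        self.bbox = polygon_bounding_box(polygon)  # computed once per query

    def matches(self, x, y):
        min_x, min_y, max_x, max_y = self.bbox
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            return False  # cheap rejection, no per-edge work
        return point_in_polygon(x, y, self.polygon)


triangle = PolygonFilter([(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)])
print(triangle.matches(1.0, 1.0))  # inside the polygon
print(triangle.matches(3.0, 3.0))  # inside the box, outside the polygon
print(triangle.matches(5.0, 5.0))  # rejected by the box alone
```

The bounding box costs a constant four comparisons per point, while the polygon test grows with the number of vertices, so the saving is largest for detailed polygons where most documents fall outside the box.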

On Mon, Jan 9, 2012 at 11:59 PM, Alex at Ikanow apiggott@ikanow.com wrote:

