Latitude / Longitude based search is very slow in Elastic Search


(prashant5375) #1

Hi there,
I have been writing about this issue in the Elasticsearch Google
group, but unfortunately no one has addressed it.
So I am posting it again; I guess you were not able to see my
older post.
Problem:
Setup: 4-core CPU with 24 GB of RAM.
Index size is 10 GB, using the latest ES version.
When I have no where clause in the query, it gives results in no
time, say 20 ms. But when I add a bounding box filter, it takes
> 200 ms to return results.
Please correct me if I am wrong. I have used the same example that
appears in Elasticsearch (GeoBoundingBoxTests.java).
Do I have to do something else, or is this a limitation of ES?
Regards,
Prashant


(Shay Banon) #2

If that's the time it takes, then that's the time it takes. I explained in the other thread how to check whether it executes faster using the "indexed" option, but that mainly applies when doing only a geo search without any other constraint.

How many machines do you have? Is it a single machine? How many docs match the query without the geo bounding box filter?



(prashant5375) #3

Hi Shay,
I am using 1 machine, with 14 million US business names.
100-200 docs match with the geo bounding box.
Without it, the count varies with the input: in some cases 30k+ docs match
and are retrieved in 1.2 sec, while retrieving 500 docs takes 20 ms.
I think you can do this. I have been using Lucene for the last 4 years,
and by applying a NumericRangeQuery I was able to do the same in at most
about 80 ms. The query is given below:

+(MAPPED_FSN:starbucks) +GEO_LAT:[212686387 TO 212744216]
+GEO_LONG:[-27185317 TO -27127488]

Without any condition Elasticsearch is super fast, and I am very impressed.
Since in Lucene itself we can get the bounding box query results in
less than 80 ms, Elasticsearch can probably reach that number too.
Nowadays everyone wants to build applications based on lat/long,
so this feature is very important.
Waiting for your reply. Let me know if I can help make ES better.

Regards



(Shay Banon) #4

When you issue a geo bounding box filter, it can work in two ways. The default ("memory") checks each doc that matches the query to see whether it falls within the bounding box, loading the lat/lon values into memory and doing simple range checks. If you use the "indexed" option, it effectively does what your Lucene query does: it runs a Lucene range on the lat and on the lon, intersects them, and uses that for the results.
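To make the difference between the two modes concrete, here is a minimal Python sketch. It is a toy model, not Elasticsearch or Lucene code: the document set, the `(top, left, bottom, right)` box layout, and all function names are assumptions made up for illustration, and a sorted list stands in for an indexed numeric field.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy data: doc id -> (lat, lon). Not taken from the thread.
DOCS = {
    0: (40.72, -74.00),
    1: (40.80, -73.95),
    2: (34.05, -118.24),
    3: (40.71, -74.01),
}

def in_box(lat, lon, top, left, bottom, right):
    """Simple range check on one point (box assumed not to cross the date line)."""
    return bottom <= lat <= top and left <= lon <= right

def memory_filter(query_hits, box):
    """'memory' style: test only the docs that already matched the query."""
    return {d for d in query_hits if in_box(*DOCS[d], *box)}

def build_sorted_index(axis):
    """Sorted (value, doc_id) pairs, standing in for an indexed numeric field."""
    return sorted((coords[axis], d) for d, coords in DOCS.items())

LAT_INDEX = build_sorted_index(0)
LON_INDEX = build_sorted_index(1)

def range_lookup(index, low, high):
    """Doc ids whose value lies in [low, high], located by binary search."""
    start = bisect_left(index, (low, -1))
    end = bisect_right(index, (high, float("inf")))
    return {d for _, d in index[start:end]}

def indexed_filter(query_hits, box):
    """'indexed' style: one range per axis, intersected with the query hits."""
    top, left, bottom, right = box
    in_lat = range_lookup(LAT_INDEX, bottom, top)
    in_lon = range_lookup(LON_INDEX, left, right)
    return query_hits & in_lat & in_lon
```

Both functions return the same set. The cost profile differs: the memory variant scales with the number of docs the query matched, while the indexed variant scales with how many docs fall in each lat and lon range, regardless of the query.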

I don't know what data set you used for your pure Lucene tests. Note that, by default, an Elasticsearch index is partitioned into 5 shards, so the query you execute will run in parallel across the relevant shards. In a single-box scenario, that means 5 parallel executions.
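The scatter/gather idea behind sharding can be sketched roughly as follows. This is only an illustration under assumptions: the modulo routing, the in-memory lists, and the function names are invented here, and real Elasticsearch routing and shard-level search work differently. Python threads are used just to show the shape of the fan-out.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 5  # the default shard count mentioned in the thread

def shard_for(doc_id):
    """Toy routing (an assumption): pick a shard by id modulo shard count."""
    return doc_id % NUM_SHARDS

def build_shards(docs):
    """Split (doc_id, name) pairs into NUM_SHARDS independent lists."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for doc_id, name in docs:
        shards[shard_for(doc_id)].append((doc_id, name))
    return shards

def search_shard(shard, term):
    """Search a single shard; each shard only sees its own documents."""
    return [doc_id for doc_id, name in shard if term in name.lower()]

def search(shards, term):
    """Run the per-shard searches concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = list(pool.map(search_shard, shards, [term] * len(shards)))
    return sorted(doc_id for part in partials for doc_id in part)
```

On one machine all five shard searches compete for the same cores and disk, which is why a single-box comparison against one plain Lucene index is not like for like.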

Back on the other thread, you set the cache on the geo filter. Make sure you don't use it; I don't know what state your tests are in.
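For reference, a request body combining the "indexed" option with caching left off might look like the sketch below. It follows the query DSL of the Elasticsearch versions of that period (a `filtered` query wrapping a `geo_bounding_box` filter) as best I understand it; later versions changed this syntax, so check the documentation for your version. The field name `location` and the coordinates are hypothetical. The "indexed" type is also understood to need the geo_point field mapped with its lat and lon values indexed, which is worth verifying against your mapping.

```python
import json

def geo_box_search(text_query, field, top_left, bottom_right, indexed=True):
    """Build a search body: text query plus a geo bounding box filter.

    top_left and bottom_right are (lat, lon) tuples.
    """
    geo_filter = {
        field: {
            "top_left": {"lat": top_left[0], "lon": top_left[1]},
            "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
        },
        # "memory" is the default described in the thread; "indexed" uses ranges.
        "type": "indexed" if indexed else "memory",
        # Spelled out explicitly so the filter result is not cached.
        "_cache": False,
    }
    return {
        "query": {
            "filtered": {
                "query": {"query_string": {"query": text_query}},
                "filter": {"geo_bounding_box": geo_filter},
            }
        }
    }

# Hypothetical field name and coordinates, for illustration only.
body = geo_box_search(
    "MAPPED_FSN:starbucks", "location", (40.73, -74.1), (40.70, -73.99)
)
print(json.dumps(body, indent=2))
```

Passing `indexed=False` produces the same body with the "memory" type, which makes it easy to time both variants against the same data.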

Obviously, there is a way to improve the performance, which is simply to add more nodes to the cluster.


