Why are bounding box geo queries so slow?


(Jason-5) #1

I have an index with around 3 million documents in which there is a single
geo_point field. When performing a geo_distance query I see response times
of around 25,000 ms. After looking more at how geo_distance needs to work
this wasn't surprising but I would have expected geo_bounding_box to be a
lot faster as it has absolute ranges to deal with (as opposed to having to
calculate distance for every document).
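
To make the "calculate distance for every document" cost concrete, here is a rough sketch of a brute-force distance filter. It is only an illustration of the idea, not how Elasticsearch implements geo_distance; the function names and the mean Earth radius constant are my own assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_distance(points, center, radius_km):
    """Hypothetical brute-force filter: one trig-heavy check per document."""
    clat, clon = center
    return [p for p in points
            if haversine_km(clat, clon, p[0], p[1]) <= radius_km]
```

Every point has to be visited and run through several trig calls, which is why I expected this to be the slow case.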

But actually it's no faster at all. I see roughly the same response time,
somewhere between 20 and 30 seconds. A "normal" query (i.e. one that does
not have any geo components) takes around 200-500ms on the hardware I am
using.

The reason this is confusing is that a simple range query on a numeric
field has negligible impact on query time. Isn't a bounding box query just
a set of range filters? Really just 2 ranges? (lat range and lon range).
Am I missing something in how bounding box queries work?
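
To show what I mean by "just 2 ranges", here is a minimal sketch that builds the equivalent filter body. It assumes the geo_point field is mapped so that latitude and longitude are also indexed as separate numeric sub-fields; the ".lat" / ".lon" sub-field names and the helper name are assumptions for illustration, not something confirmed in this thread. It also ignores boxes that cross the 180 degree meridian.

```python
import json


def bounding_box_as_ranges(field, top_left, bottom_right):
    """Build a bool filter of two range filters for a box that does not
    cross the 180 degree meridian.

    Assumes numeric sub-fields "<field>.lat" and "<field>.lon" exist in
    the index (an assumption about the mapping).
    """
    return {
        "bool": {
            "must": [
                {"range": {field + ".lat": {"gte": bottom_right["lat"],
                                            "lte": top_left["lat"]}}},
                {"range": {field + ".lon": {"gte": top_left["lon"],
                                            "lte": bottom_right["lon"]}}},
            ]
        }
    }


if __name__ == "__main__":
    body = bounding_box_as_ranges(
        "meta.bid_request.device.geo.loc",
        {"lat": 47.55, "lon": -122.06},
        {"lat": 47.52, "lon": -122.02},
    )
    print(json.dumps(body, indent=2))
```

If the bounding box filter really reduced to something like this, I would expect it to cost about the same as any other pair of numeric range filters.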

I've tried setting the "indexed" type, no change.

The standard response seems to be "get more servers", but I just want to
make sure I'm understanding what's happening here.

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jason-5) #2

P.S. This is a sample query that takes > 20,000 ms:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  },
  "filter": {
    "geo_bounding_box": {
      "meta.bid_request.device.geo.loc": {
        "top_left": {
          "lat": 47.55,
          "lon": -122.06
        },
        "bottom_right": {
          "lat": 47.52,
          "lon": -122.02
        }
      }
    }
  }
}



(David Pilato) #3

What happens if you change it to:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_bounding_box": {
          "meta.bid_request.device.geo.loc": {
            "top_left": {
              "lat": 47.55,
              "lon": -122.06
            },
            "bottom_right": {
              "lat": 47.52,
              "lon": -122.02
            }
          }
        }
      }
    }
  }
}

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs



(Jason-5) #4

Same result. Actually my initial post was wrong. Your query is more like
what I am using. I have tried both variants.



(Jason-5) #5

The full query actually looks like this:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "geo_bounding_box": {
                "meta.bid_request.device.geo.loc": {
                  "top_left": {
                    "lat": 36.733602161826056,
                    "lon": -120.38100948440444
                  },
                  "bottom_right": {
                    "lat": 33.03261333655745,
                    "lon": -115.86899051559556
                  }
                }
              }
            }
          ]
        }
      }
    }
  },
  "fields": [
    "udid",
    "meta.bid_request.device.carrier",
    "meta.bid_request.device.os",
    "meta.bid_request.app.name",
    "meta.bid_request.app.global_aid",
    "meta.bid_request.device.geo.loc",
    "meta.bid_request.device.osv"
  ]
}

I removed the "fields" entry from the original post because it did not have
any impact on performance whether it was there or not.



(Jason-5) #6

P.S. I have attached the document mapping for reference. It's kinda large
(in terms of fields) but I can't really control that unfortunately.



(Jason-5) #7

Don't know if anyone is actually monitoring this.. but with more records
(currently ~5 million) the bounding box search time is over 5 minutes.
Actually I gave up waiting for it. Which basically means it's unusable.

This is on an EC2 instance with 16GB RAM and 10GB allocated to ES.

What are people using for geo queries? Is ES a plausible option for this?



(Jason-5) #8

So at 7 million documents the index is almost completely non-responsive,
even with simple queries.

Not sure what it's doing, because I've had plain Lucene indexes with way
more than this with no problem.

Given that this forum doesn't seem to be monitored at all, I'm guessing
that we'd have to pay for official support.. which I guess is fair enough
but somehow I would have expected it would be able to handle 7 million docs
on a server with 16GB of RAM.



(Brian 'Phunk' Gadoury) #9


I hesitate to reply only because our schemas and datasets are most likely
not even close to an apples-to-apples comparison. With that said, we have a
non-trivial schema, and roughly 2.5 million documents and we were not even
close to being able to do bigger sorting and faceting on a single 16GB VM
at Rackspace. (For what it's worth.) I would be shocked if you got anywhere
with a single 16GB cloud instance with 7M docs of any real size.

What does your server monitoring show? What's your bottleneck? Are you
looking at it with Bigdesk, Ganglia?

Also, if your response times were already in the 200-500ms range with
simple non-range queries back when you only had 3M docs, then you were
doomed. :wink:

Also, looking at your mapping, I see you're not making much use of
not_analyzed fields. It doesn't affect your geo_bounding_box issue here, of
course, but I see a lot of fields that could presumably be set to
not_analyzed which would probably save you some server resources.



(Jason-5) #10

So, the standard answer is "get more servers", which seems to be what
you're saying and that makes complete sense. I guess I was curious about
why ES would need so much more than a simple Lucene index which could
handle at least this amount without any problems, and why bounding box
queries specifically would be slow considering I assumed they were just a
set of simple numerical ranges. But I guess what I should really do is
just say, "listen.. you can't build elastic search so stop bitching and
just do what they tell you to".

:confused:



(Brian 'Phunk' Gadoury) #11


Ultimately, yes. I think you need more horsepower. I can't say why, which
I understand is the crux of your frustration here.

You're still flying blind (and somewhat limiting the amount of help this
group can give you) if you aren't monitoring your server stats and figuring
out where your bottleneck is. You can have BigDesk running as a plugin with
1 command and a service restart. It'll take less than 60 seconds to do.

-Phunk



(Jason-5) #12

The server is almost certainly CPU bound and disk IO bound, although this
is just tacit information gleaned from running "top". It's not a huge
server and is really just for prototyping the solution, so I get that the
resource limitations of the server would be hindering search performance.

What I still don't understand is why bounding box queries are
significantly slower than range queries. I understand from previous
investigations that geo_distance queries are more resource intensive due to
the need to load all locations into memory and then perform operations on
this set (which is going to hammer the CPU), but I assumed that bounding
box queries would not do this and would instead simply manifest as a set of
range queries that treat lat/lon values as plain floating point numbers.

If I do a simple range query on other numerical values in the documents, I
get query times that are not significantly different from queries with
other similar (simple) filters. Whereas if I do a geo_bounding_box filter I
see similar performance to a geo_distance query. Actually *extremely*
similar. On a data set where a range query takes 500ms and a geo_distance
takes >= 20,000ms, an equivalent bounding box query takes >= 20,000ms,
which makes me think it's doing a similar set of processes to the
geo_distance query. This is the confusing point.

Geo Distance as far as I can determine needs to load values into memory
because there is no automatic way of knowing the linear distance between
two points without computing it. But a bounding box is surely just a
range. If the lat/lon of the document is within the bounding box as
determined by a series of >=,<= conditions then one would think it matches
the query. Naturally there would need to be some consideration for
"wrapping" values around equatorial/meridian values but again this should
just be an OR clause.
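
Roughly what I have in mind, as a sketch only (the function names are made up for illustration, and this is not a claim about how ES does it). As far as I can tell only longitude actually wraps, at +/-180 degrees, so the OR clause is needed only when the box's western edge is numerically greater than its eastern edge:

```python
def lon_in_range(lon, west, east):
    """True if lon lies between west and east, allowing for wrap-around."""
    if west <= east:
        # Ordinary box: a single closed range.
        return west <= lon <= east
    # Box crosses the 180 degree meridian: two ranges joined by OR.
    return lon >= west or lon <= east


def in_bounding_box(lat, lon, top_left, bottom_right):
    """Check a point against a box given as (lat, lon) corner tuples."""
    top, west = top_left
    bottom, east = bottom_right
    return bottom <= lat <= top and lon_in_range(lon, west, east)
```

For example, in_bounding_box(0, 175, (10, 170), (-10, -170)) takes the OR branch, while the boxes from my sample queries only ever need the plain range.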

Clearly there is something missing in my understanding of how bounding box
queries work and I guess I just wanted to understand how ES had implemented
this. I am sure that the ES peeps know what they're doing and there is
likely to be a very good reason why I'm talking out of my ass.



(David Pilato) #13

Hi Brian,

You can have BigDesk running as a plugin with 1 command and a service restart. It'll take less than 60 seconds to do.

Just a note: you don't need to restart Elasticsearch when you install a site plugin.
So it could take less than 60 seconds :wink:

David



(Travis Bullock) #14

Hi Jason,
Did you end up implementing this using range filters, or was there another
solution? I've just started looking at Elasticsearch and am equally stumped
as to why bounding_box isn't much faster.



(Jason-5) #15

Hey Travis,

Unfortunately, because this was more than 3 months ago, my brain has
determined it is no longer relevant and hence all the information has
seeped out of my ears and into the trash.

That being said, I don't think we ever "solved" the problem. The project
this was for was ultimately canned so it never went into production and
hence never got the care and attention it needed.

If I had continued to develop the system, I think I may have attempted
my own bounding box criteria that did simple numerical ranges. No doubt I
would have also subsequently realized that computing the coordinates of a
"box" as a set of spherical coordinates is harder than it looks and got it
completely wrong.. but I'd have had a go.
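
For what it's worth, here is the sort of thing I would have tried: a sketch under a spherical Earth assumption, using the commonly cited formula for the longitude half-width of a box around a circle. The function name and radius constant are my own, and I make no claim that this matches what ES does internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical model (assumption)


def bounding_box(lat, lon, radius_km):
    """Return (top_left, bottom_right) as (lat, lon) tuples in degrees
    enclosing the circle of radius_km around the given point."""
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    lat_r = math.radians(lat)
    min_lat = lat_r - ang
    max_lat = lat_r + ang
    if min_lat <= -math.pi / 2 or max_lat >= math.pi / 2:
        # The circle contains a pole, so every longitude is inside.
        min_lat = max(min_lat, -math.pi / 2)
        max_lat = min(max_lat, math.pi / 2)
        west, east = -180.0, 180.0
    else:
        # Longitude half-width grows with latitude; it is not simply ang.
        dlon = math.asin(math.sin(ang) / math.cos(lat_r))
        west = math.degrees(math.radians(lon) - dlon)
        east = math.degrees(math.radians(lon) + dlon)
        # Normalise into [-180, 180]; west > east then means the box
        # crosses the 180 degree meridian.
        if west < -180.0:
            west += 360.0
        if east > 180.0:
            east -= 360.0
    return (math.degrees(max_lat), west), (math.degrees(min_lat), east)
```

The pole and meridian special cases are exactly the parts I suspect I would have got wrong the first time around.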


