Difference between geo_point and geo_shape (point)


(Georgi Ivanov) #1

Hi,

I am indexing a fairly large number of positions in ES (around 150M) in
monthly indexes (201312, 201311, etc.).

One document has a timestamp and location.

My queries are like:
Give me all positions inside this bounding box, etc.

I have 2 types of indexes with exactly the same mapping except for the location
fields.
Ex:
loc: {
type: geo_point
}

loc: {
tree: quadtree
type: geo_shape
}

It seems to me that there is a big difference in the speed of the queries
against the two types of indexes.

The index with location of type geo_shape is MUCH faster than the index
with geo_point.
With cold caches the query with geo_point runs for about 26 seconds, whereas
the query with geo_shape runs for about 2 seconds.
Also, the query with the geo_point type loads a huge amount of data into the field
cache (8GB for just one month of data). With geo_shape the field data is much smaller.

The geo_shape mapping uses the default precision and the quadtree tree type.
Both queries have the same logic.

I would like to understand why it is so much faster with geo_shape than
with geo_point.
Can someone shed some light on this matter?

Of course, the index with geo_shape is about 30% bigger in size.

Example query for index type geo_shape
{
"query": {
"bool": {
"must": [
{
"range": {
"ts": {
"from": "2013-11-01",
"to": "2013-12-30"
}
}
},
{
"geo_shape": {
"loc": {
"shape": {
"type": "envelope",
"coordinates": [
[ 1.6754645,53.786 ],
[14.345234, 51.3453 ]
]
}
}
}
}
]
}
},
"aggregations": {
"agg1": {
"terms": {
"field": "e_id"
}
}
},
"size": 0
}

Example query for index type geo_point
{
"query": {
"bool": {
"must": [
{
"range": {
"ts": {
"from": "2013-11-01",
"to": "2013-12-30"
}
}
},
{
"geo_bounding_box" : {
"loc" : {
"top_left" : {
"lat" : 40.73,
"lon" : -74.1
},
"bottom_right" : {
"lat" : 40.01,
"lon" : -71.12
}
}
}
}
]
}
},
"aggregations": {
"agg1": {
"terms": {
"field": "e_id"
}
}
},
"size": 0
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4e721191-8164-40cf-aa3f-d882dec10cad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Alexander Reelsen) #2

Hey,

this is all about storing and computing. First, let's take a look at
geo_point:

  • Index: It is stored as two floats (lat/lon) in the index
  • Query: All geo points are loaded into memory (thus your big fielddata)
    and then in-memory calculations are executed
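
To make the geo_point query path concrete, here is a minimal Python sketch of an in-memory bounding-box scan. It is only an illustration of the idea described above, not Elasticsearch's actual code: the function names are made up, and it assumes the box does not cross the antimeridian. The point is that every stored lat/lon pair has to be in memory and compared against the box, which is why the fielddata grows with the number of documents.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon)


def in_bounding_box(point: Point, top_left: Point, bottom_right: Point) -> bool:
    """Check one point against the box (assumes no antimeridian crossing)."""
    lat, lon = point
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


def scan(points: List[Point], top_left: Point, bottom_right: Point) -> List[int]:
    """Compare every loaded point with the box and return matching doc ids."""
    return [
        doc_id
        for doc_id, point in enumerate(points)
        if in_bounding_box(point, top_left, bottom_right)
    ]


# Hypothetical sample data; the box is the one from the geo_bounding_box query.
points = [(40.5, -73.0), (52.0, 5.0), (40.02, -71.2), (41.0, -74.0)]
hits = scan(points, top_left=(40.73, -74.1), bottom_right=(40.01, -71.12))
print(hits)
```

The cost of this approach is dominated by loading and touching all points, not by the comparison itself.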

Now the geo_shape

  • Index: The shape is converted into terms and then stored in the index
    (thus your big index size)
  • Query: A full-text search is basically used to check if a shape is inside
    of another (do they include the same terms?)
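
A rough Python sketch of the "shape converted into terms" idea, assuming a plain quadtree over the whole lat/lon range with quadrant letters A to D. The letters, function names and sample cities are illustrative assumptions, and the real Lucene spatial implementation differs in detail. Each point yields one term per tree level, and finding everything in a cell becomes a term lookup in an inverted index rather than a scan over all points.

```python
from collections import defaultdict
from typing import Dict, List, Set


def point_terms(lat: float, lon: float, levels: int) -> List[str]:
    """Return one cell term per level: the path of quadrants containing the point."""
    min_lat, max_lat, min_lon, max_lon = -90.0, 90.0, -180.0, 180.0
    path = ""
    terms = []
    for _ in range(levels):
        mid_lat = (min_lat + max_lat) / 2
        mid_lon = (min_lon + max_lon) / 2
        # A = north-west, B = north-east, C = south-west, D = south-east
        north = lat >= mid_lat
        east = lon >= mid_lon
        if north:
            min_lat = mid_lat
        else:
            max_lat = mid_lat
        if east:
            min_lon = mid_lon
        else:
            max_lon = mid_lon
        path += {(True, False): "A", (True, True): "B",
                 (False, False): "C", (False, True): "D"}[(north, east)]
        terms.append(path)
    return terms


# Build a tiny inverted index: term -> set of doc ids.
docs = [(52.37, 4.90), (52.52, 13.40), (40.71, -74.0)]  # hypothetical positions
index: Dict[str, Set[int]] = defaultdict(set)
for doc_id, (lat, lon) in enumerate(docs):
    for term in point_terms(lat, lon, levels=3):
        index[term].add(doc_id)

# "Which documents fall into cell BAC?" is now a single term lookup.
print(sorted(index["BAC"]))
```

More levels mean finer cells but also more terms per document, which matches the larger index size observed with geo_shape.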

Possible speed improvements:

  • geo_point: Use the warmer APIs
  • geo_point: Caching may help, since your query location is always the same.
  • geo_point: Maybe the geohash_cell filter helps you in terms of speed
    (needs a special mapping)
  • geo_shape: Less precision, less index size, you can change that in the
    mapping
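
To get a feeling for the precision tradeoff, the following Python sketch estimates how many quadtree levels are needed for a given precision. It assumes a simplified model in which the level-0 cell spans the equator (about 40,075 km) and each level halves the cell width; the constant and the function name are assumptions for illustration, and Elasticsearch's own calculation may differ.

```python
import math

EARTH_EQUATOR_M = 40_075_017.0  # approximate equatorial circumference in meters


def quadtree_levels(precision_m: float) -> int:
    """Smallest number of halvings so that a cell is no wider than precision_m."""
    return max(1, math.ceil(math.log2(EARTH_EQUATOR_M / precision_m)))


for precision in (1000, 100, 50):
    print(precision, "m ->", quadtree_levels(precision), "levels")
```

In this model, relaxing the precision from 50 m to 1 km saves a few levels per shape, and fewer levels mean fewer terms in the index.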

At the end of the day you are facing a classic tradeoff here. Are you willing to
use more disk, or are you willing to compute more things at query time?

Hope it makes sense as a quick intro...

--Alex

On Wed, Mar 19, 2014 at 9:42 PM, Georgi Ivanov georgi.r.ivanov@gmail.com wrote:

[full quote of post #1 trimmed]


(Georgi Ivanov) #3

Thanks Alex,
That makes perfect sense.
For now I am sticking with the geo_shape type.
Apart from the index size, everything is much smoother with it.

I can recommend geo_shape if one needs geo queries all the time (like me).

George

2014-03-31 9:09 GMT+02:00 Alexander Reelsen alr@spinscale.de:

[full quote of posts #2 and #1 trimmed]


(system) #4