Doing bounding box filtering using a geohash prefix

Gian_Luca_Ortelli · March 6, 2013, 8:57am

Hi,

I read in the docs that you can define the bounding box using two geohash
values for the top-left and bottom-right corners. Is there a way to specify
the bounding box using a single geohash value? Typically, it would be a
value with fewer than 12 characters, used to find all the documents whose
geohash starts with it.

Thanks,
Gianluca

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jilles_van_Gurp · March 6, 2013, 9:18am

No, not really. Geohashes are essentially rectangles themselves, laid out on
a fixed grid, so shortening the geohash prefix increases the size of the
rectangle. The problem is that you can't move them around. So you'd need
multiple geohashes to cover an arbitrary bounding box, unless the bounding
box happens to coincide exactly with a single geohash cell.
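
To make the size relation concrete, here is a minimal sketch (my own illustration, not code from Elasticsearch or any library) that derives the cell dimensions from the prefix length alone. It relies only on the standard geohash layout: 5 bits per character, alternating between longitude and latitude, starting with longitude.

```python
def geohash_cell_size(length):
    """Return (width, height) in degrees of a geohash cell with `length` characters."""
    bits = 5 * length            # each base-32 character carries 5 bits
    lon_bits = (bits + 1) // 2   # bits alternate, longitude first
    lat_bits = bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


# Each character dropped from the prefix makes the cell 32 times larger in area.
for n in (1, 2, 5, 8):
    width, height = geohash_cell_size(n)
    print(f"{n} chars: {width:.6f} x {height:.6f} degrees")
```

A one-character cell is 45 x 45 degrees, a two-character cell is 11.25 x 5.625 degrees, and so on.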

What you can do is calculate the set of geohashes that cover a particular
bounding box (or any polygon), using shorter prefixes (larger cells) to cover
the interior and longer ones (smaller cells) along the boundaries. That is
pretty much what geo_shape does, but you can also do it manually and store
the geohashes in a field. Then you simply use prefix queries to search for
anything that overlaps with the shape.
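
The covering idea can be sketched roughly as follows. This is an illustrative recursion written for this thread, not the geo_shape implementation and not the API of the library mentioned below; the names `children` and `cover` and the `(min_lat, min_lon, max_lat, max_lon)` box layout are assumptions of the sketch.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
WORLD = (-90.0, -180.0, 90.0, 180.0)  # (min_lat, min_lon, max_lat, max_lon)


def children(prefix, box):
    """Yield (geohash, box) for the 32 sub-cells of the cell `prefix` with bounds `box`."""
    for index, char in enumerate(BASE32):
        min_lat, min_lon, max_lat, max_lon = box
        # Bits alternate lon/lat; after an even number of characters the next bit is longitude.
        is_lon = len(prefix) % 2 == 0
        for shift in range(4, -1, -1):
            bit = (index >> shift) & 1
            if is_lon:
                mid = (min_lon + max_lon) / 2
                min_lon, max_lon = (mid, max_lon) if bit else (min_lon, mid)
            else:
                mid = (min_lat + max_lat) / 2
                min_lat, max_lat = (mid, max_lat) if bit else (min_lat, mid)
            is_lon = not is_lon
        yield prefix + char, (min_lat, min_lon, max_lat, max_lon)


def cover(query, max_length, prefix="", box=WORLD):
    """Return geohashes covering `query`, a (min_lat, min_lon, max_lat, max_lon) box."""
    result = []
    for cell, cbox in children(prefix, box):
        # Skip cells that do not overlap the query box (touching edges do not count).
        if (cbox[2] <= query[0] or cbox[0] >= query[2]
                or cbox[3] <= query[1] or cbox[1] >= query[3]):
            continue
        inside = (cbox[0] >= query[0] and cbox[2] <= query[2]
                  and cbox[1] >= query[1] and cbox[3] <= query[3])
        if inside or len(cell) == max_length:
            result.append(cell)  # short hash for the interior, long hash on the boundary
        else:
            result.extend(cover(query, max_length, cell, cbox))
    return result


print(cover((0.0, 0.0, 45.0, 50.0), 2))
```

For the sample box, the cell "s" lies entirely inside and is kept as a single one-character hash, while the strip east of it is covered by two-character hashes starting with "t". Cells at `max_length` that only partly overlap the box are still included, so the cover can extend slightly beyond the query box.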

I've done this before and I have a library to support the geohash
manipulation: https://github.com/jillesvangurp/geotools

Jilles

Gian_Luca_Ortelli · March 6, 2013, 10:05am

Hi,

Luckily, I don't need to cover arbitrary bounding boxes; because of
how my application works, the rectangle defined by a geohash prefix IS my
bounding box.

My problem is with the exact construction of the query. You mention prefix
queries, but I guess I should index the geohash as a normal field to use
them, right? I'm wondering if there is a way to search directly on the
geo_point field, like a variant of the bounding box syntax. Are you aware of
anything like that?

Thanks,
Gianluca

Jilles_van_Gurp · March 6, 2013, 10:13am

I don't have any experience with the geo_point type. I guess in principle
it should be possible to query a Lucene field directly, though I'm not sure
how this would work in Elasticsearch. At the Lucene level, there are no
field types: it's all just terms. You may need to write a plugin to get
to the Lucene field directly.

But it is probably easier to just add the geohash as a string field and
search on that. You can use prefix, term, wildcard queries, etc., and it
should just work.
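
A minimal sketch of that approach, under some assumptions: the field names `location` and `geohash` are made up for illustration, the geohash field is assumed to be indexed as a single untokenized term, and the encoder below is a plain implementation of the standard geohash algorithm rather than anything Elasticsearch-specific. The query body uses the regular `prefix` query from the query DSL.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=12):
    """Encode a latitude/longitude pair as a geohash of `length` characters."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1   # upper half: emit a 1 bit
            rng[0] = mid
        else:
            value <<= 1                # lower half: emit a 0 bit
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:                  # 5 bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


# Hypothetical document: the geohash is stored in its own string field
# next to the geo_point, so ordinary term-level queries can use it.
doc = {
    "location": {"lat": 57.64911, "lon": 10.40744},
    "geohash": geohash_encode(57.64911, 10.40744),
}

# Prefix query matching every document whose geohash lies in cell "u4pru".
query = {"query": {"prefix": {"geohash": "u4pru"}}}

print(doc["geohash"], query)
```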

Jilles

Gian_Luca_Ortelli · March 6, 2013, 11:01am

Thanks Jilles,

I'll go with a normal query then.

Gianluca

David_Smiley · March 6, 2013, 8:09pm

Hi Jilles,

I'm David Smiley, a committer on Lucene/Solr specializing in spatial. I did
much of the spatial implementation that I believe Elasticsearch now uses.
Jilles, are there spatial features that Elasticsearch isn't offering that
require you to work with geohashes and whatnot at the user level, as you
describe? If so, perhaps these use cases are en route to Elasticsearch via
Lucene upstream eventually, and if not, then maybe you should add feature
requests so they get tracked and eventually implemented. I think geohashes
are neat, but they are an implementation detail that you wouldn't even have
to be aware of if the search platform had all the spatial features you need.

p.s. I'll have to look at your "geotools" (not to be confused with the
GeoTools at geotools.org) some time, for possible re-use of relevant
algorithms in Spatial4j (the spatial dependency of Lucene-spatial that I
work on).

Cheers,
David Smiley

Jilles_van_Gurp · March 6, 2013, 10:39pm

Hi David,

I think Elasticsearch is nearly offering what I need, but not quite yet. On
paper it all looks good, with support for all of the relevant
shapes. Part of my problem has been figuring out what exactly there is,
what actually works, how it works, and what the current state is in terms
of maturity, performance, etc. The documentation is a bit of a mixed bag
currently, and I'm not familiar enough with the Elasticsearch code base to
easily make sense of what it is trying to do. That also puts me in an
awkward position with respect to suggesting changes, since I can't really
speak authoritatively on what it does right now. This is not an easy code
base to work with currently.

The geo_shape functionality in Elasticsearch looks promising but at the
same time there have been major changes to it very recently, which worries
me since it indicates the design is a bit in flux right now and it is
unclear where things are going.

My recent experiments with geo_shape and intersects indicate that
there seem to be accuracy problems that need work. I think these are mostly
known issues already. In particular, I'm getting false negatives on some
queries, and other people have reported false positives. Another issue in
Elasticsearch is that I think there needs to be an effort to unify the
geo_shape and geo_point types. I don't see a good reason for having two
types here, especially since each seems to lack features that the other has.

I'm aware that my choice of geotools as a name was a bit unfortunate given
the other project with the same name. I might have to rename it ;-). My
intention with it was to keep things simple and implement some generic
things that I need for geospatial search. In my view this is a
very simple problem: given a shape, cover it with geohashes, index them, and
use wildcard queries to find things again. You don't even need Lucene for
this, and I have used a similar approach with MySQL and CouchDB in the past.

One of my goals was to avoid coming up with yet another Point, Polygon,
etc. class hierarchy and instead imitate the GeoJSON model of simply
using multi-dimensional arrays of doubles to represent points, lines,
polygons, etc. The algorithms are mostly just scavenged and adapted from
different sources. This has the advantage of avoiding a lot of work
translating one object model to another (impedance mismatch) and also
minimizes the memory used. That last point is important for me since I also
use this library for data processing on a pretty large scale.

In any case, if you find something of use please use it. I'll be happy to
assist you integrating/refactoring if needed.

Jilles

Gian_Luca_Ortelli · March 7, 2013, 9:07am

Hi David,

About my use case:

I need to fetch documents from Elasticsearch to display them on a map,
grouping together icons that would otherwise overlap.

I decided to group using the geohash of each document's location: for a
given zoom level, I consider only the first n characters of the
geohashes, and group together documents that share the same prefix. In short,
it's a grid-based clustering algorithm, where the grid cells are defined by
the geohash encoding.
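
As a rough sketch of the grouping step described above (the document structure, the `id` and `geohash` field names, and the sample values are all invented for illustration):

```python
from collections import defaultdict


def cluster_by_prefix(docs, prefix_length):
    """Group document ids by the first `prefix_length` characters of their geohash."""
    clusters = defaultdict(list)
    for doc in docs:
        clusters[doc["geohash"][:prefix_length]].append(doc["id"])
    return dict(clusters)


# Made-up sample documents; only the geohash strings matter here.
docs = [
    {"id": 1, "geohash": "drm3btev3e86"},
    {"id": 2, "geohash": "drm3btevxyz0"},
    {"id": 3, "geohash": "drm3btc00000"},
    {"id": 4, "geohash": "u4pruydqqvj8"},
]

# A longer prefix (higher zoom) gives smaller cells and therefore smaller groups.
print(cluster_by_prefix(docs, 8))
print(cluster_by_prefix(docs, 4))
```

Each key in the result is the geohash prefix of a grid cell, which is exactly the value the client would send back to ask for the members of that group.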

The result is quite good. The missing piece is that the client must be able
to ask for all the documents belonging to a certain group. Since the client
already knows the geohash prefix of the group (i.e., the common prefix of
all the documents belonging to the group), the most straightforward
solution would be to just use that value in the query.

A prefix query on the geohash is a solution; but if I were to translate my
need into a feature request, it would be a variation of the syntax for the
bounding box filter, like below:

{
  "filtered" : {
    "query" : {
      "match_all" : {}
    },
    "filter" : {
      "geo_bounding_box" : {
        "pin.location" : {
          "cell" : "drm3btev"
        }
      }
    }
  }
}
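
Until something like the "cell" syntax above exists, a similar effect can be approximated on the client side by decoding the cell into its corners and passing them to the existing filter. The sketch below is only an illustration: `geohash_bbox` and `cell_filter` are hypothetical helper names, and the output uses the `top_left` / `bottom_right` form of `geo_bounding_box`.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bbox(geohash):
    """Return (min_lat, min_lon, max_lat, max_lon) of the cell named by `geohash`."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True  # bits alternate lon/lat, starting with longitude
    for char in geohash:
        index = BASE32.index(char)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = mid   # 1 bit: keep the upper half
            else:
                rng[1] = mid   # 0 bit: keep the lower half
            use_lon = not use_lon
    return lat_range[0], lon_range[0], lat_range[1], lon_range[1]


def cell_filter(field, geohash):
    """Build a geo_bounding_box filter body covering one geohash cell."""
    min_lat, min_lon, max_lat, max_lon = geohash_bbox(geohash)
    return {
        "geo_bounding_box": {
            field: {
                "top_left": {"lat": max_lat, "lon": min_lon},
                "bottom_right": {"lat": min_lat, "lon": max_lon},
            }
        }
    }


print(cell_filter("pin.location", "drm3btev"))
```

One caveat: points lying exactly on a cell edge may be treated differently by a bounding box filter than by a strict prefix match on the geohash string.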

Gianluca

David_Smiley · March 7, 2013, 8:02pm

Sounds great. I recently started doing something similar and I'll try and
see how this might be baked in. However I think this is something that
would be at the ES or Solr levels, not at the Lucene level, because it gets
into faceting. You mentioned "grouping" but my approach uses faceting.

~ David

Jilles_van_Gurp · March 13, 2013, 9:51am

FYI, I have renamed the geotools project. It is now named
geogeometry (GitHub: jillesvangurp/geogeometry), a set of algorithms and
functions for manipulating geohashes and geometric shapes with geo coordinates.
