Hi David,
I think Elasticsearch is nearly offering what I need but not quite yet. On
paper it looks all good with support for all of the relevant different
shapes. Part of my problem has been figuring out what exactly there is,
what actually works, how it works, and what the current state is in terms
of maturity, performance, etc. The documentation is a bit of a mixed bag
currently and I'm not familiar enough with the Elasticsearch code base to
easily make sense of what it is trying to do. That also puts me in an
awkward position with respect to suggesting changes since I can't really
speak authoritatively on what it does right now. This is not an easy code
base to work with currently.
The geo_shape functionality in Elasticsearch looks promising but at the
same time there have been major changes to it very recently, which worries
me since it indicates the design is a bit in flux right now and it is
unclear where things are going.
My recent experiments with using geo_shape and intersects indicate that
there seem to be accuracy problems that need work. I think mostly these
are known issues already. Particularly, I'm getting false negatives on some
queries and other people have reported false positives. Another issue in
Elasticsearch is that I think there needs to be an effort to unify the
geo_shape and geo_point types. I don't see a good reason for having two
types here, especially since both seem to lack features that the other has.
I'm aware that my choice of geotools as a name was a bit unfortunate given
the other project with the same name. I might have to rename it ;-). My
intention with it was to keep things simple and implement some generic
things that I need to implement geospatial search. In my view this is a
very simple problem: given a shape, cover it with geohashes, index them and
use wildcard queries to find them again. You don't even need Lucene for
this and I have used a similar approach with mysql and couchdb in the past.
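
To illustrate the "no Lucene needed" point, here is a minimal sketch of a prefix lookup over a plain sorted list, roughly what a LIKE 'u173%' query does against an indexed column. It is an assumption-laden toy, not code from the library: the names build_index and prefix_search are hypothetical, the geohash values are made up for illustration, and it handles the simple case of point documents stored under one full-length geohash each.

```python
import bisect


def build_index(entries):
    """Sort (geohash, doc_id) pairs so that equal prefixes sit next to each other."""
    return sorted(entries)


def prefix_search(index, prefix):
    """Return the doc ids whose geohash starts with `prefix`."""
    # (prefix,) sorts before every (geohash, doc_id) whose geohash starts with prefix
    start = bisect.bisect_left(index, (prefix,))
    hits = []
    for geohash, doc_id in index[start:]:
        if not geohash.startswith(prefix):
            break  # sorted order: once the prefix stops matching, we are done
        hits.append(doc_id)
    return hits


# Illustrative (made-up) geohashes; doc ids are plain strings.
index = build_index([
    ("u173zq", "doc-a"),
    ("u33db2", "doc-b"),
    ("u173zr", "doc-c"),
    ("9q8yyk", "doc-d"),
])
print(prefix_search(index, "u173"))  # ['doc-a', 'doc-c']
```

Any store that supports an ordered range scan on a string key can play the role of the sorted list here.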
One of my goals was to avoid coming up with yet another Point, Polygon,
etc. class hierarchy and instead try to imitate the geojson model of simply
using multi dimensional arrays of doubles to represent points, lines,
polygons, etc. The algorithms are mostly just scavenged and adapted from
different sources. This has the advantage of avoiding a lot of work
translating one object model to another (impedance mismatch) and also
minimizes the memory used. That last point is important for me since I use
this library for data processing as well on a pretty large scale.
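
As a rough illustration of working directly on GeoJSON-style nested arrays instead of a class hierarchy, the sketch below runs a standard ray-casting point-in-polygon test on plain lists of [x, y] pairs. It is only a sketch under assumptions: the function names ring_contains and polygon_contains are hypothetical and are not the library's API, and points exactly on an edge are not treated specially.

```python
def ring_contains(ring, point):
    """Ray-casting test against one linear ring given as [[x, y], ...]."""
    x, y = point
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # count edges that a horizontal ray from the point towards +x crosses
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def polygon_contains(polygon, point):
    """polygon is [outer_ring, hole, ...], as in GeoJSON polygon coordinates."""
    outer, holes = polygon[0], polygon[1:]
    return ring_contains(outer, point) and not any(
        ring_contains(hole, point) for hole in holes
    )


# A 10x10 square with a 2x2 hole in the middle, as nested lists of doubles.
square_with_hole = [
    [[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [0.0, 0.0]],
    [[4.0, 4.0], [6.0, 4.0], [6.0, 6.0], [4.0, 6.0], [4.0, 4.0]],
]
print(polygon_contains(square_with_hole, [2.0, 2.0]))  # True
print(polygon_contains(square_with_hole, [5.0, 5.0]))  # False (inside the hole)
```

The data never leaves its parsed GeoJSON form, so there is no conversion step and no per-point object overhead.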
In any case, if you find something of use please use it. I'll be happy to
assist you integrating/refactoring if needed.
Jilles
On Wednesday, March 6, 2013 9:09:41 PM UTC+1, David Smiley wrote:
Hi Jilles,
I'm David Smiley, committer on Lucene/Solr specializing in spatial. I did
much of the spatial implementation that I believe Elasticsearch now uses.
Jilles, are there spatial features that Elasticsearch isn't offering that
require you to work with geohashes & whatnot at the user level as you
describe? If so, perhaps these use-cases are en route to Elasticsearch via
Lucene upstream eventually, and if not then maybe you should add feature
requests so they get tracked and eventually implemented. I think geohashes
are neat but an implementation detail that you wouldn't even have to be
aware of if the search platform has all the spatial features you need.
p.s. I'll have to look at your "geotools" (not to be confused with the
GeoTools at geotools.org) some time, for possible re-use of relevant
algorithms in Spatial4j (the spatial dependency of Lucene-spatial that I
work on).
Cheers,
David Smiley
On Wednesday, March 6, 2013 4:18:08 AM UTC-5, Jilles van Gurp wrote:
No, not really. The way geohashes work is that they are pretty much
rectangular themselves. So shortening the geohash prefix increases the size
of the rectangle. The problem is that you can't move them around. So, you'd
need multiple geohashes to cover an arbitrary bounding box unless the
bounding box happens to overlap exactly with a single geohash.
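
The fixed-grid behaviour described above can be seen by decoding a geohash into its rectangle. The following is a small sketch of the standard geohash decoding (alternating longitude and latitude bits, base-32 alphabet); the function name geohash_bbox is hypothetical and it is not taken from Elasticsearch or any particular library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bbox(geohash):
    """Return (min_lat, min_lon, max_lat, max_lon) of the cell named by `geohash`."""
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # 1 bit: keep the upper half
            else:
                rng[1] = mid  # 0 bit: keep the lower half
            is_lon = not is_lon
    return (lat[0], lon[0], lat[1], lon[1])


print(geohash_bbox("u"))     # (45.0, 0.0, 90.0, 45.0)
print(geohash_bbox("u17"))   # a smaller rectangle inside the one above
print(geohash_bbox("u173"))  # smaller again, and still nested
```

Each cell sits at a fixed position on the grid, so dropping characters only grows the rectangle around the same spot; it never shifts it to line up with an arbitrary bounding box.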
What you can do is calculate the set of geohashes that cover a particular
bounding box (or any polygon) and use shorter prefixes (larger cells) to
cover the interior and longer ones (smaller cells) along the boundaries.
That is pretty much what geo_shape does
but you can also do it manually and associate the geohashes with a field.
Then you simply use prefix queries to search for anything that overlaps
with the shape.
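
A minimal sketch of that covering step for a bounding box is shown below. It is one possible approach (recursive subdivision of the geohash grid), written under assumptions: the names cell_bbox, cover_bbox and matches are hypothetical, max_len is assumed to be at least 1, and this is not how geo_shape or the library mentioned here is actually implemented.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def cell_bbox(geohash):
    """Return (min_lat, min_lon, max_lat, max_lon) of a geohash cell."""
    lat, lon, is_lon = [-90.0, 90.0], [-180.0, 180.0], True
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            rng[0 if (value >> shift) & 1 else 1] = mid
            is_lon = not is_lon
    return (lat[0], lon[0], lat[1], lon[1])


def cover_bbox(bbox, max_len):
    """Geohash prefixes covering bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    result, stack = [], [""]
    while stack:
        prefix = stack.pop()
        c_min_lat, c_min_lon, c_max_lat, c_max_lon = cell_bbox(prefix)
        # drop cells that do not overlap the box (touching edges do not count)
        if (c_max_lat <= min_lat or c_min_lat >= max_lat
                or c_max_lon <= min_lon or c_min_lon >= max_lon):
            continue
        inside = (c_min_lat >= min_lat and c_max_lat <= max_lat
                  and c_min_lon >= min_lon and c_max_lon <= max_lon)
        if prefix and (inside or len(prefix) == max_len):
            result.append(prefix)  # interior cell, or boundary cell at full depth
        else:
            stack.extend(prefix + ch for ch in BASE32)  # refine partial overlap
    return sorted(result)


def matches(doc_geohash, cover):
    """Prefix test: does a document's full-length geohash fall in the cover?"""
    return any(doc_geohash.startswith(prefix) for prefix in cover)


cover = cover_bbox((0.0, 0.0, 10.0, 10.0), max_len=2)
print(cover)
```

Cells fully inside the box are kept at whatever short length they are found at, while cells straddling the edge are split until max_len is reached, so max_len controls how tightly the cover follows the boundary. Storing the resulting prefixes in a field turns the overlap test into the prefix comparison in matches.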
I've done this before and I have a library to support the geohash
manipulation: GitHub - jillesvangurp/geotools
Jilles
On Wednesday, March 6, 2013 9:57:31 AM UTC+1, Gian Luca Ortelli wrote:
Hi,
I read in the docs that you can define the bounding box using 2 geohash
values for the top left and bottom right corners. Is there a way to specify
the bounding box using a single geohash value? Typically, it would be a
value with fewer than 12 chars, to search all the documents containing it as
a prefix.
Thanks,
Gianluca