After weeks of prototyping, I finally got around to giving es a more
serious workout than my simplistic tests. Since I have been posting and
commenting a lot on the geo_shape functionality here, I thought I'd share
some details on my setup.
I have about 30M pois, which are a large part of a bigger data set that we are
going to use. These are simple points with a small amount of metadata (name,
category, source). This excludes the OpenStreetMap data that I posted about
last week. When complete, the full data set will be quite a bit bigger: up to a
couple of hundred million pois, streets, buildings, neighborhoods, etc.
So, on a completely generic, pretty much bog standard es instance running on my
MacBook Pro, I threw my data at it using a little ruby script that uses the
bulk api with 6 threads and 500 documents per request. I gave es 1500MB of
memory (my laptop has 8GB). I'm using a fairly recent snapshot that is about a
week old now (just after Lucene 4.2 was integrated).
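The script itself isn't very interesting, but the gist of it is roughly like
this. This is a simplified sketch rather than the actual script; the localhost
url, the 'pois'/'poi' index and type names, and the assumption of one json
document per line in the input file are just placeholders:

  require 'net/http'
  require 'json'
  require 'uri'
  require 'thread'

  # Simplified sketch of the bulk indexing: 6 worker threads, each posting
  # batches of 500 documents to the bulk api of a local node.
  BULK_URI = URI('http://localhost:9200/_bulk')

  # The bulk api wants newline-delimited json: an action line, then the document.
  def bulk_index(docs, index = 'pois', type = 'poi')
    payload = docs.flat_map do |doc|
      [{ 'index' => { '_index' => index, '_type' => type } }.to_json, doc.to_json]
    end.join("\n") << "\n"
    http = Net::HTTP.new(BULK_URI.host, BULK_URI.port)
    request = Net::HTTP::Post.new(BULK_URI.request_uri, 'Content-Type' => 'application/json')
    request.body = payload
    http.request(request)
  end

  queue = Queue.new
  workers = Array.new(6) do
    Thread.new do
      # Each worker keeps indexing batches until it sees a :done marker.
      while (batch = queue.pop) != :done
        bulk_index(batch)
      end
    end
  end

  # Assuming one json document per line in the input file.
  File.foreach('pois.json').each_slice(500) { |lines| queue << lines.map { |l| JSON.parse(l) } }
  6.times { queue << :done }
  workers.each(&:join)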
I have a mapping that maps the geometry field in my json to geo_shape:
"properties": {
"geometry": {
"tree": "quadtree",
"tree_levels": 20,
"type": "geo_shape",
}
}
I find this particular setting delivers a good enough trade-off between accuracy and index size.
The raw input is about 3GB of json. In my test setup, I have only one shard
currently.
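For completeness, creating the index (one shard plus the mapping above) boils
down to something like this; the 'pois' index and 'poi' type names are again
just placeholders:

  require 'net/http'
  require 'json'
  require 'uri'

  # One shard, and the geometry field mapped to geo_shape with a 20-level quadtree.
  index_config = {
    'settings' => { 'index' => { 'number_of_shards' => 1 } },
    'mappings' => {
      'poi' => {
        'properties' => {
          'geometry' => { 'type' => 'geo_shape', 'tree' => 'quadtree', 'tree_levels' => 20 }
        }
      }
    }
  }

  uri = URI('http://localhost:9200/pois')
  http = Net::HTTP.new(uri.host, uri.port)
  request = Net::HTTP::Put.new(uri.request_uri, 'Content-Type' => 'application/json')
  request.body = index_config.to_json
  puts http.request(request).body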
It took about 1 hour to index the 30M documents. Index size is about 11GB.
That's a bit less than 10K documents per second. I ran a couple of
test queries intersecting a small polygon (50m) at different places in the
world and they all came back in a very reasonable 20-30 ms. Even while it
was still indexing. The latter point I found particularly impressive. So
that is good performance on a cold index that is still being indexed at a
rate of 10K documents per second. On a laptop. Not bad.
This is one of the queries (with a randomly selected coordinate in London):

{
  "from": 0,
  "size": 10,
  "version": true,
  "explain": true,
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "geo_shape": {
          "geometry": {
            "shape": {
              "type": "Polygon",
              "coordinates": [[
                [-0.11039300861473171, 51.51717889117429],
                [-0.11045049097119652, 51.51731495268828],
                [-0.11057272644040506, 51.51743330129237],
                [-0.11074774976296656, 51.51752235220061],
                [-0.11095842843660952, 51.51757338848964],
                [-0.11118413976487317, 51.517581414371904],
                [-0.1114027895503237, 51.51754564421813],
                [-0.11159297482853864, 51.51746957946019],
                [-0.11173607893938753, 51.51736066584657],
                [-0.11181809385543551, 51.51722956460061],
                [-0.1118309913852683, 51.51708910882571],
                [-0.11177350902880348, 51.516953047311716],
                [-0.11165127355959495, 51.516834698707626],
                [-0.11147625023703345, 51.51674564779939],
                [-0.11126557156339048, 51.51669461151036],
                [-0.11103986023512684, 51.516686585628094],
                [-0.1108212104496763, 51.516722355781866],
                [-0.11063102517146137, 51.51679842053981],
                [-0.11048792106061248, 51.51690733415343],
                [-0.1104059061445645, 51.517038435399385],
                [-0.11039300861473171, 51.51717889117429]
              ]]
            },
            "relation": "intersects"
          }
        }
      }
    }
  }
}
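For the curious, the polygon is just a small circle around the chosen point,
approximated with 20 vertices. Something along these lines generates one; this
is a rough sketch, not necessarily exactly how I generated mine, using a simple
equirectangular approximation that is fine at this scale:

  # Build a GeoJSON-style polygon approximating a circle of radius_m meters
  # around lon/lat; the longitude step is corrected for latitude.
  def circle_polygon(lon, lat, radius_m = 50.0, points = 20)
    d_lat = radius_m / 111_320.0  # roughly meters per degree of latitude
    d_lon = radius_m / (111_320.0 * Math.cos(lat * Math::PI / 180))
    ring = (0...points).map do |i|
      angle = 2 * Math::PI * i / points
      [lon + d_lon * Math.cos(angle), lat + d_lat * Math.sin(angle)]
    end
    ring << ring.first  # close the ring, as GeoJSON requires
    { 'type' => 'Polygon', 'coordinates' => [ring] }
  end

Dropping the result into the shape field of the geo_shape filter gives queries
like the one above.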
Overall, I'm pretty happy with this level of performance and index size, and
this totally validates the use of Elasticsearch for geospatial search for us.
So, a big thanks to all who helped me get to this stage here and offline.