I'm currently working on a project that visualizes geospatial data in
Elasticsearch. One of the things I am doing is generating heatmaps with
the geohash grid aggregation. I would like to take this to the extreme
case of gridding down to the individual pixel level to display raster
images of a data set, but I am not concerned with the total doc count of
each geohash. Is there a way (or could one be implemented) to run an
optimized aggregation that simply lists the existing terms (geohashes)
without bothering to aggregate their counts? If that significantly
improved performance, such a feature would be very valuable.
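For reference, a request of the kind described here looks roughly like the
following sketch (index, type, and field names are placeholders;
search_type=count is just the 1.x-era way to skip fetching hits):

  POST /geo_index/_search?search_type=count
  {
    "aggs": {
      "grid": {
        "geohash_grid": {
          "field": "location",
          "precision": 7
        }
      }
    }
  }

At precision 7 each bucket key is a 7-character geohash covering a cell of
roughly 150m x 150m.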
Thanks for that. I had a feeling that might be the case.
Any tips on improving aggregation performance? I'm working with a 20-shard
index loaded on a 20-node cluster. Geohash grid aggregations over the
entire data set (with the size set to unlimited, which is a requirement)
can take as long as 8 seconds and return ~1 million buckets. I am very
happy with that performance, but if there are any tricks to improve it I
would be glad to apply them.
Thanks,
Elliott
On Tuesday, December 30, 2014 11:48:52 AM UTC-5, Adrien Grand wrote:
Hi Elliott,
The overhead of computing the doc counts is actually low; I don't think
you should worry about it.
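One generic trick for heatmap-style queries, assuming the display only ever
shows a bounded map extent, is to aggregate just the visible area with a
geo_bounding_box filter so fewer documents feed the aggregation. A sketch
(names and coordinates are made up):

  POST /geo_index/_search?search_type=count
  {
    "query": {
      "filtered": {
        "filter": {
          "geo_bounding_box": {
            "location": {
              "top_left":     { "lat": 41.0, "lon": -74.1 },
              "bottom_right": { "lat": 40.5, "lon": -73.6 }
            }
          }
        }
      }
    },
    "aggs": {
      "grid": {
        "geohash_grid": { "field": "location", "precision": 7 }
      }
    }
  }

This obviously doesn't help when the entire data set has to be gridded in
one shot, as above.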
Just as a thought, would setting geohash = true or geohash_prefix = true at
index time improve performance?
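For concreteness, those options sit on a 1.x geo_point mapping like this
(index, type, and field names are placeholders):

  PUT /geo_index
  {
    "mappings": {
      "point": {
        "properties": {
          "location": {
            "type": "geo_point",
            "geohash": true,
            "geohash_prefix": true,
            "geohash_precision": 7
          }
        }
      }
    }
  }

geohash_prefix indexes the geohash plus all of its prefixes, so each cell
at every coarser precision becomes a directly searchable term.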
On Monday, January 5, 2015 9:49:01 AM UTC-5, Adrien Grand wrote:
No, it wouldn't. I don't have ideas about how to improve performance. Are
you running only a geohash grid aggregation, or do you also have
sub-aggregations? Also, 1 million buckets is a lot; if it would work for
you to decrease the value of the precision parameter, that could help
with performance.
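To put a number on the precision point: a geohash is base-32, so each
additional character subdivides a cell into 32 children. Dropping the
precision by one level therefore cuts the bucket count by up to a factor
of 32 (exactly 32 only when every child cell is occupied):

  buckets(p - 1) >= buckets(p) / 32
  ~1,000,000 buckets at precision 7  ->  no fewer than ~31,250 at precision 6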
I am only running a geohash grid aggregation, and I already reduce the
precision parameter as much as I can in each case. Any guesses as to where
most of the time is being spent? I could dig through the source...
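Before digging into the source, the nodes hot threads API is a cheap way
to see what the cluster is actually doing while the aggregation runs (the
threads and interval values here are arbitrary):

  GET /_nodes/hot_threads?threads=3&interval=500ms

Calling it mid-query dumps the hottest stack traces per node, which should
indicate whether the time goes to fielddata loading, bucket collection, or
the final reduce.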