"Aggregations" without doc-counts

Hi,

I'm currently working on a project that visualizes geospatial data in
Elasticsearch. One of the things I am doing is generating heatmaps with
the geohash grid aggregation. I would like to take this to the extreme
case of gridding down to the individual pixel level to display raster
images of a data set, but I am not concerned with the total doc count of
each geohash. Is there a way to run (or could one implement) an
optimized aggregation that simply lists the existing terms (geohashes)
and does not bother with aggregating their counts? If this
significantly improved performance, such a feature would be very valuable.

Thanks!

  • Elliott Bradshaw
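
For readers following along, a request of the kind described above might look like the sketch below. It assumes a `geo_point` field named `location` (hypothetical) and the Elasticsearch 1.x behaviour in which an aggregation `size` of 0 returns all buckets; the helper names are illustrative and not part of any client library. The second helper drops `doc_count` on the client side and keeps only the geohash keys.

```python
def build_geohash_grid_request(field, precision, size=0):
    """Build a search body that runs only a geohash_grid aggregation.

    Assumption: in Elasticsearch 1.x an aggregation size of 0 means
    "return every bucket"; other versions may treat it differently.
    """
    return {
        "size": 0,  # no search hits, only the aggregation result
        "aggs": {
            "grid": {  # "grid" is an arbitrary aggregation name
                "geohash_grid": {
                    "field": field,
                    "precision": precision,
                    "size": size,
                }
            }
        },
    }


def occupied_cells(response):
    """Return the geohash key of every bucket, ignoring doc_count."""
    buckets = response["aggregations"]["grid"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hand-written sample response with the shape the aggregation returns.
sample_response = {
    "aggregations": {
        "grid": {
            "buckets": [
                {"key": "dr5ru", "doc_count": 12},
                {"key": "dr5rs", "doc_count": 3},
            ]
        }
    }
}

body = build_geohash_grid_request("location", precision=5)
cells = occupied_cells(sample_response)
```

The body can be sent to the index's `_search` endpoint with any HTTP client; the list of keys is all a raster renderer needs.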

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/834ebcb1-43b3-486d-bd1a-952005a6a66d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Elliott,

The overhead of computing the doc counts is actually low, so I don't think
you should worry about it.

--
Adrien Grand

Adrien,

Thanks for that. I had a feeling that might be the case.

Any tips on improving aggregation performance? I'm working with a 20-shard
index that is loaded on a 20-node cluster. Geohash grid aggregations on
the entire data set (with the size set to unlimited, which is a requirement)
can take as long as 8 seconds (and return ~1 million buckets). I am very
happy with that performance, but if there are any tricks to improve it I
would be glad to try them.

Thanks,

Elliott

Just as a thought, would setting geohash = true or geohash_prefix = true at
index time improve performance?
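
For context, the two options named in this question are mapping parameters of the `geo_point` type in Elasticsearch 1.x. A minimal sketch of such a mapping follows; the field name `location` and the `geohash_precision` value are assumptions chosen for illustration, and these parameters may not exist in other versions.

```python
# Sketch of a 1.x-style geo_point mapping with geohash indexing enabled.
# "location" and the precision of 8 are hypothetical values.
geo_point_mapping = {
    "properties": {
        "location": {
            "type": "geo_point",
            "geohash": True,          # also index the geohash of each point
            "geohash_prefix": True,   # also index every prefix of that geohash
            "geohash_precision": 8,   # length of the indexed geohash
        }
    }
}
```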

No, it wouldn't. I don't have any ideas about how to improve performance.
Are you running only a geohash grid aggregation, or do you also have
sub-aggregations? Also, 1 million buckets is a lot; if decreasing the value
of the precision parameter would work for you, that could help with
performance.
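
To make the link between precision and bucket count concrete, here is a rough sketch of the arithmetic. It relies only on the geohash encoding itself: each character carries 5 bits, interleaved between longitude and latitude, with longitude taking the extra bit when the total is odd. The function names and the pixel-based selection rule are illustrative assumptions, not anything Elasticsearch provides.

```python
import math


def geohash_cell_size_degrees(precision):
    """Return (width, height) in degrees of one geohash cell."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = bits // 2
    return 360.0 / (1 << lon_bits), 180.0 / (1 << lat_bits)


def max_buckets_in_box(precision, lon_span, lat_span):
    """Upper bound on cells overlapping a box of the given span (degrees).

    The +1 per axis covers a box that is not aligned to cell edges.
    """
    width, height = geohash_cell_size_degrees(precision)
    cols = math.ceil(lon_span / width) + 1
    rows = math.ceil(lat_span / height) + 1
    return cols * rows


def precision_for_pixel_grid(lon_span, lat_span, width_px, height_px,
                             max_precision=12):
    """Lowest precision whose cells are no larger than one pixel."""
    for precision in range(1, max_precision + 1):
        width, height = geohash_cell_size_degrees(precision)
        if width <= lon_span / width_px and height <= lat_span / height_px:
            return precision
    return max_precision
```

Each extra character of precision multiplies the number of possible cells by 32, so choosing the lowest precision that still resolves one pixel keeps the bucket count as small as the image allows.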

--
Adrien Grand

I am only running a geohash grid aggregation. I reduce the precision
parameter as much as I can in each case. Any guesses on where most of the
time is being spent? I could dig through the source...
