I'm currently working on a project that visualizes geospatial data in
Elasticsearch. One of the things I am doing is generating heatmaps with
the geohash grid aggregation. I would like to take this to the extreme
case of gridding down to the individual pixel level to display raster
images of a data set, but I am not concerned with the total doc count of
each geohash. Is there a way (or could one be implemented) to run an
optimized aggregation that simply lists the existing terms (geohashes)
without bothering to aggregate their counts? If that significantly
improved performance, such a feature would be very valuable.
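For reference, a request of the kind described here looks roughly like the
following sketch (index, type, and field names are placeholders;
search_type=count is just the 1.x-era way to skip fetching hits):

  POST /geo_index/_search?search_type=count
  {
    "aggs": {
      "grid": {
        "geohash_grid": {
          "field": "location",
          "precision": 7
        }
      }
    }
  }

At precision 7 each bucket key is a 7-character geohash covering a cell of
roughly 150m x 150m.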
Thanks for that. I had a feeling that might be the case.
Any tips on improving aggregation performance? I'm working with a 20-shard
index loaded on a 20-node cluster. Geohash grid aggregations over the
entire data set (with the size set to unlimited, which is a requirement)
can take as long as 8 seconds and return ~1 million buckets. I am very
happy with that performance, but if there are any tricks to improve it I
would be glad to apply them.
Thanks,
Elliott
On Tuesday, December 30, 2014 11:48:52 AM UTC-5, Adrien Grand wrote:
Hi Elliott,
The overhead of computing the doc counts is actually low; I don't think
you should worry about it.
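One generic trick for heatmap-style queries, assuming the display only ever
shows a bounded map extent, is to aggregate just the visible area with a
geo_bounding_box filter so fewer documents feed the aggregation. A sketch
(names and coordinates are made up):

  POST /geo_index/_search?search_type=count
  {
    "query": {
      "filtered": {
        "filter": {
          "geo_bounding_box": {
            "location": {
              "top_left":     { "lat": 41.0, "lon": -74.1 },
              "bottom_right": { "lat": 40.5, "lon": -73.6 }
            }
          }
        }
      }
    },
    "aggs": {
      "grid": {
        "geohash_grid": { "field": "location", "precision": 7 }
      }
    }
  }

This obviously doesn't help when the entire data set has to be gridded in
one shot, as above.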
Just as a thought, would setting geohash = true or geohash_prefix = true at
index time improve performance?
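For concreteness, those options sit on a 1.x geo_point mapping like this
(index, type, and field names are placeholders):

  PUT /geo_index
  {
    "mappings": {
      "point": {
        "properties": {
          "location": {
            "type": "geo_point",
            "geohash": true,
            "geohash_prefix": true,
            "geohash_precision": 7
          }
        }
      }
    }
  }

geohash_prefix indexes the geohash plus all of its prefixes, so each cell
at every coarser precision becomes a directly searchable term.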
On Monday, January 5, 2015 9:49:01 AM UTC-5, Adrien Grand wrote:
No, it wouldn't. I don't have ideas about how to improve performance. Are
you running only a geohash grid aggregation, or do you also have
sub-aggregations? Also, 1 million buckets is a lot; if it would work for
you to decrease the value of the precision parameter, that could help
with performance.
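To put a number on the precision point: a geohash is base-32, so each
additional character subdivides a cell into 32 children. Dropping the
precision by one level therefore cuts the bucket count by up to a factor
of 32 (exactly 32 only when every child cell is occupied):

  buckets(p - 1) >= buckets(p) / 32
  ~1,000,000 buckets at precision 7  ->  no fewer than ~31,250 at precision 6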
I am only running a geohash grid aggregation, and I already reduce the
precision parameter as much as I can in each case. Any guesses as to where
most of the time is being spent? I could dig through the source...
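Before digging into the source, the nodes hot threads API is a cheap way
to see what the cluster is actually doing while the aggregation runs (the
threads and interval values here are arbitrary):

  GET /_nodes/hot_threads?threads=3&interval=500ms

Calling it mid-query dumps the hottest stack traces per node, which should
indicate whether the time goes to fielddata loading, bucket collection, or
the final reduce.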