Is Elasticsearch truly scalable for analytics?

If I understand correctly, Elasticsearch sends the query directly to each shard and collects aggregated results from every shard. As the number of shards increases, the reduce phase on the client node can become overwhelmed.

One would assume that if Elasticsearch supported node-level aggregation, the "reduce" step would become distributed, so the client node would not be overwhelmed in large clusters with many shards. I am wondering whether Elasticsearch supports a node-level reduce. If not, why not? I think this is critical if we want Elasticsearch to be truly scalable for analytics.

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a8f56d70-9738-4768-9637-9159e6955cc2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

What do you mean by "node level reduce"?

On Mon, Dec 15, 2014 at 10:52 PM, Yifan Wang yifan.wang.usa@gmail.com
wrote:


--
Adrien Grand


Elasticsearch supports tribe nodes, so you can combine multiple clusters; you then query the tribe node to access data on all of them.

On Monday, December 15, 2014 9:52:45 PM UTC, Yifan Wang wrote:


ES is already doing aggregations on each node; it is not shipping row-level query data back to the master for aggregation.
In fact, one unpleasant side effect of this is that aggregation results are not guaranteed to be precise, due to the distributed nature of the aggregation, for multi-bucket aggregations ordered by count, such as terms.
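To make that imprecision concrete, here is a toy Python sketch (a simplified model, not Elasticsearch code) of why merging per-shard top-N terms buckets can undercount a term:

```python
from collections import Counter

# Toy model of a distributed terms aggregation: each "shard" holds term
# counts, returns only its local top-N buckets (like shard_size), and the
# client merges those partial lists.
shard1 = Counter({"a": 10, "b": 9, "c": 8})
shard2 = Counter({"a": 10, "c": 9, "b": 8})

def top_n(counts, n):
    # Keep only a shard's n highest-count buckets.
    return Counter(dict(counts.most_common(n)))

exact = shard1 + shard2                       # true global counts: a=20, b=17, c=17
approx = top_n(shard1, 2) + top_n(shard2, 2)  # each shard ships only 2 buckets

print(exact["b"], approx["b"])  # 17 vs 9: "b" is undercounted after the merge
```

Term "b" falls just below the cutoff on shard2, so one shard's contribution is silently dropped before the merge; that is exactly the error mode a larger shard_size mitigates.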


By "node" I am referring to an individual data node. Currently the "reduce" is only done once, on the client node, not on each individual data node.

I am just wondering how scalable the current architecture is for analytics. I would like to hear from anyone who has experience with this.

On Tuesday, December 16, 2014 8:57:28 AM UTC-5, Adrien Grand wrote:


How would ranking work across clusters?

On Tuesday, December 16, 2014 1:31:03 PM UTC-5, Elvar Böðvarsson wrote:


I thought ES only does the "collect" on individual shards and the "reduce" on the client node (the master, if you call it that); nothing is done at the data-node level.

On Tuesday, December 16, 2014 1:31:30 PM UTC-5, AlexR wrote:


If you take a terms aggregation, the heavy lifting of the aggregation is done on each node, and the aggregated results are then combined on the master node. So if you have thousands of nodes and very-high-cardinality nested aggs, the merging part may become a bottleneck, but the cost of doing the actual aggregation is in most cases far higher than the cost of merging results from a reasonable number of shards. So in practice I think it balances pretty well. And of course you are not limited to one master for handling concurrent requests.
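For what it's worth, the client-side merge described above can be sketched in a few lines of Python (a toy model, not Elasticsearch's actual implementation): sum the per-shard bucket counts by key, then keep the overall top `size`.

```python
from collections import Counter

def reduce_terms(shard_results, size):
    """Toy client-side reduce for a terms aggregation: merge per-shard
    bucket counts by key, then keep the overall top `size` buckets."""
    merged = Counter()
    for buckets in shard_results:
        merged.update(buckets)
    return merged.most_common(size)

# Two shards' partial top buckets (term -> doc count):
shard_results = [
    {"error": 40, "warn": 25, "info": 10},
    {"error": 30, "info": 35, "warn": 5},
]
print(reduce_terms(shard_results, 2))  # [('error', 70), ('info', 45)]
```

The merge touches only shard_size-bounded summaries, not the underlying documents, which is why its cost is usually dwarfed by the shard-side aggregation work.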

On Wednesday, December 17, 2014 4:12:44 PM UTC-5, Yifan Wang wrote:


+1 to what AlexR said. There is indeed a widespread but mistaken assumption that shards just forward raw data to the coordinating node; this is not the case.

On Thu, Dec 18, 2014 at 1:09 AM, AlexR roytmana@gmail.com wrote:


--
Adrien Grand


Sorry if I did not make it clear. I know, for sure, that aggregation is done on the node for each shard, but here is the challenge. Say we set shard_size=50,000. ES will aggregate on each shard, create buckets for the matching documents, and then send the top 50,000 buckets to the client node for the reduce. Say we have 50 data nodes, and each node has 32 shards. This means we need to send 50,000 buckets from each shard to the client node for the final aggregation. First, this may add heavy traffic to the network (what if we have 100 nodes?). And second, the client will need to aggregate over the received 50 × 32 × 50,000 buckets. Would this cause any congestion on the client node? However, if we can aggregate on each node first, meaning reduce from 32 buckets to only one bucket, then the client node only has to process 50 buckets. This would significantly reduce the network traffic and improve scalability; in addition, because we could then set a relatively larger shard_size, it would improve the accuracy of the final results, which is another key issue we face with aggregations in a distributed environment.

So my key question is about scalability, particularly for aggregations. It seems to be a challenge in my experience; I just want to hear other people's experience. For heavy analytics applications, this will be key.

Of course, I also understand that adding node-level aggregation may impact overall performance. I am wondering whether anyone has thought about or done anything in this area.

BTW, I like Elasticsearch, but I want to hear from the community on some of these key challenges.
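For concreteness, the arithmetic in the scenario above can be checked with a few lines of Python (the figures are the hypothetical ones from this message, not measurements):

```python
nodes, shards_per_node, shard_size = 50, 32, 50_000

# Today: every shard ships its top buckets straight to the client node.
buckets_per_shard_reduce = nodes * shards_per_node * shard_size  # 80,000,000

# Hypothetical node-level reduce: each node first merges its own 32 shard
# results and ships a single set of buckets to the client node.
buckets_per_node_reduce = nodes * shard_size                     # 2,500,000

print(buckets_per_shard_reduce // buckets_per_node_reduce)  # 32x fewer buckets
```

Under these assumed numbers the client node would receive 32 times fewer buckets with a node-level reduce, which is the scaling concern being raised here.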

On Thursday, December 18, 2014 9:34:07 AM UTC-5, Adrien Grand wrote:


Correction: "meaning reduce from 32 buckets to only one bucket, then the client node only has to process 50 buckets" should read "meaning reduce from 32 sets of 50K buckets to only one set being sent to the client node, so the client node only has to process 50 sets of 50K buckets".

On Thursday, December 18, 2014 10:48:07 AM UTC-5, Yifan Wang wrote:


I think aggregating 32 shards on one node is a bit degenerate; I imagine it's more typical to aggregate across one or two shards per node. Don't get me wrong, you can totally have nodes store and query ~100 shards each without much trouble. If aggregating across a bunch of shards per node were a common thing, I think a node-level reduce step might help. I'm certainly no expert in the reduce code, though.

Nik

On Thu, Dec 18, 2014 at 10:48 AM, Yifan Wang yifan.wang.usa@gmail.com
wrote:


Nik,

I am not an expert in this area either, but with multi-core processors (24, 32, 48 cores) it is not uncommon to have a fairly large number of shards on a node, so 30 shards is not out of the question.
I assumed that ES aggregates shard results on a node prior to shipping them to the master, but I do not know whether that is true. It may very well be that a node sends per-shard aggregations to the master, in which case it is 32 × the per-shard result size for our 32-shard node. Reducing the network payload by 32x (even if it were just 8x), and the work for the master by the same ratio, is no chump change. Somehow I think ES is already doing it :-) but who knows.

Another potential benefit of doing node-level aggregation is that, when aggregating multiple shards on a single node, ES could resolve potential errors by aggregating all buckets and re-calculating buckets not present in every shard at a fairly low cost, while doing so across nodes is costly. On the other hand, it might amplify the error across nodes; I do not know.

On Thursday, December 18, 2014 11:26:37 AM UTC-5, Nikolas Everett wrote:


A node does not send shard aggregations to the master, but to the client
node.

The basic idea of sharding in Elasticsearch is that shards spread over all
the nodes, and the shard count matches or comes close to the maximum number
of nodes. The shard distribution should be undistorted, that means, all
shards should be equal in size, volume, terms distribution etc. So in the
general case, a node has to process just one shard for aggregation, or
better, all nodes do equivalent work in the aggregation process.

There is not much gain in implementing an extra intermediate aggregation
stage per node just because some users put more than one shard per index on
a node and configure weighted indices where some nodes carry more shards
than others. Instead, the best way to achieve better scalability for such
an index is to add more nodes, or to create more indices on more nodes and
combine them with index aliases.

Jörg

On Thu, Dec 18, 2014 at 7:22 PM, AlexR roytmana@gmail.com wrote:

Nick,

I am not an expert in this area either, but with multi-core processors (24,
32, 48 cores) it is not uncommon to have a fairly large number of shards on a
node, so 30 shards is not out of the question.
I assumed that ES aggregates shard results on a node prior to shipping them
to the master, but I do not know if that is true. It may very well be that a
node sends per-shard aggregations to the master, in which case it is 32 x
ShardResultSize for our 32-shard node. Reducing the size of the network
payload by 32x (even if it were just 8x), and the work for the master by the
same ratio, is not chump change. Somehow I think ES is already doing it :-) but who knows.

Another potential benefit of doing node-level aggregation is that, on a
single node aggregating multiple shards, ES could resolve potential errors
by aggregating all buckets and re-calculating buckets not present in every
shard at a fairly low cost, while doing so across nodes is costly. On the
other hand, it may amplify the error across nodes; I do not know.
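As a toy illustration of the accuracy issue being discussed (a simplified model, not ES internals): when each shard reports only its local top-N terms, a term that is frequent overall but never in any shard's local top-N can be undercounted or dropped entirely during the merge.

```python
from collections import Counter

# Three hypothetical shards with per-term doc counts. Each shard returns
# only its local top-2 terms (think shard_size = 2), and the client node
# merges those truncated lists.
shards = [
    Counter({"a": 10, "b": 9, "c": 8}),
    Counter({"b": 10, "d": 9, "c": 8}),
    Counter({"a": 10, "d": 9, "c": 8}),
]

true_counts = sum(shards, Counter())          # exact global counts
merged = Counter()
for shard in shards:
    for term, count in shard.most_common(2):  # shard ships only its top-2
        merged[term] += count

# Term "c" is the true global leader (24 docs) but never made any shard's
# local top-2, so the merged result misses it completely.
print(true_counts.most_common(1))  # [('c', 24)]
print(merged.most_common(1))       # [('a', 20)]
```

Here the distributed top-N merge returns the wrong leading term; raising shard_size shrinks this class of error at the cost of shipping more buckets.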

On Thursday, December 18, 2014 11:26:37 AM UTC-5, Nikolas Everett wrote:

I think aggregating 32 shards on one node is a bit degenerate. I imagine
it's more typical to aggregate across one or two shards per node. Don't get
me wrong, you can totally have nodes store and query ~100 shards each
without much trouble. If aggregating across a bunch of shards per node
were a common thing, I think a node-level reduce step might help. I'm
certainly no expert in the reduce code, though.

Nik

On Thu, Dec 18, 2014 at 10:48 AM, Yifan Wang yifan.w...@gmail.com
wrote:

Sorry if I did not make it clear. For sure I know aggregation is done
on the node for each shard, but here is the challenge. Say we set
shard_size=50,000. ES will aggregate on each shard, create buckets for
the matching documents, and then send the top 50,000 buckets to the client
node for the Reduce. Say we have 50 data nodes, and each node has 32 shards.
This means we need to send 50,000 buckets from each shard to the client node
for the final aggregation. First, this may add heavy traffic to the network
(what if we have 100 nodes?). And second, the client will need to aggregate
over the received 50 x 32 x 50,000 buckets. Would this cause congestion on
the client node? However, if we could aggregate on the node first, reducing
the 32 shard results to a single result per node, then the client node would
only have to process 50 results. This would significantly reduce the network
traffic and improve the scalability; plus, because we could then set a
relatively larger shard_size, it would improve the accuracy of the final
results, which is another key issue we face with aggregations in a
distributed environment.
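Spelled out as a quick back-of-envelope calculation, using the hypothetical numbers above (50 nodes, 32 shards per node, shard_size=50,000) and assuming every shard returns a full shard_size buckets:

```python
# Hypothetical cluster from the post above; not measurements.
nodes = 50
shards_per_node = 32
shard_size = 50_000  # buckets returned per shard

# Today: every shard ships its buckets straight to the client node.
buckets_at_client_now = nodes * shards_per_node * shard_size

# With a node-level reduce: each node would first merge its 32 shard
# results and ship at most shard_size buckets to the client.
buckets_at_client_node_reduce = nodes * shard_size

print(buckets_at_client_now)          # 80 million buckets for the client to merge
print(buckets_at_client_node_reduce)  # 2.5 million buckets for the client to merge
print(buckets_at_client_now // buckets_at_client_node_reduce)  # a 32x reduction
```

The client-side reduce and the network transfer both shrink by a factor of shards_per_node in this model.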

So my key question is about scalability, particularly for aggregations. It
seems to be a challenge in my experience, and I just want to hear other
people's experience. For heavy analytics applications, this will be key.

Of course, I also understand, adding node level aggregation may impact
the overall performance. I am wondering if anyone has thought about or done
anything in this aspect.

BTW, I like Elasticsearch, but want to hear from the community on some
of the key challenges.

On Thursday, December 18, 2014 9:34:07 AM UTC-5, Adrien Grand wrote:

+1 to what AlexR said. I think there is indeed a bad assumption that
shards just forward data to the coordinating node, this is not the case.

On Thu, Dec 18, 2014 at 1:09 AM, AlexR royt...@gmail.com wrote:

if you take a terms aggregation, the heavy lifting of the aggregation
is done on each node, then the aggregated results are combined on the master
node. So if you have thousands of nodes and very high-cardinality nested
aggs, the merging part may become a bottleneck, but the cost of doing the
actual aggregation is in most cases far higher than the cost of merging
results from a reasonable number of shards. So in practice I think it
balances pretty well. Of course, you are not limited to one master to handle
concurrent requests.

On Wednesday, December 17, 2014 4:12:44 PM UTC-5, Yifan Wang wrote:

I thought ES only "Collects" on individual shards and "Reduces" on the
Client Node (master, if you call it that); nothing is done at the data-node level.

On Tuesday, December 16, 2014 1:31:30 PM UTC-5, AlexR wrote:

ES is already doing aggregations on each node; it is not shipping
row-level query data back to the master for aggregation.
In fact, one unpleasant effect of this is that aggregation results are
not guaranteed to be precise, due to the distributed nature of the
aggregation, for multi-bucket aggs ordered by count, such as terms.


--
Adrien Grand




Jörg, if you have a single large index and a cluster with 3 nodes, do you suggest creating just 3 shards even though each node has, say, 16 cores? With just three shards they will be very big, and not much parallelism in computations will occur.
Am I missing something?


Yes, I have 3 nodes and each index has 3 shards, on 32-core machines.

Each shard contains many segments, which can be read and written
concurrently by Lucene. Since Lucene 4, there have been massive
improvements in that area.

Maybe you have observed the effect that many shards on a node for a single
index show a different performance behavior when docs are added over long
periods of time. It simply takes longer before large segment merging begins,
because docs are more widely distributed and use smaller segment sizes for a
longer time. The downside is that huge segment counts may occur (and many
users encounter high file-descriptor numbers). With the right
configuration, you can set up a single shard per index on a node, and
segment merging / segment count is not a real problem.

You are right if you consider shard size as a factor for moving the shard
around (into snapshot/restore), for export, or for recovery when a node
starts up. I think shard sizes over 30 GB are a bit heavy, but this
also depends on the speed of the I/O subsystem. With SSD or RAID 0, I can
operate at I/O rates of over 1 GB/sec for sequential reads. The shard-size
factor has to be balanced out, either by using more than one index, a
higher number of nodes, or a faster I/O subsystem.

Jörg

On Fri, Dec 19, 2014 at 3:42 PM, AlexR roytmana@gmail.com wrote:

Jorg, if you have a single large index and a cluster with 3 nodes do you
suggest to create just 3 shards even though each node has say 16 cores.
With just three shards they will be very big and not much patallelism in
computations will occur.
am I missing something?



Just out of curiosity, are aggregations on multiple shards on a single node
executed serially or in parallel? In my experience, it appears that
they're executed serially (my CPU usage did not change when going from 1
shard to 2 shards per node, but I didn't test this extensively). I'm
interested in maximizing the parallelism of an aggregation without creating
a massive number of nodes.

On Friday, December 19, 2014 at 10:31:45 AM UTC-5, Jörg Prante wrote:

Yes, I have 3 nodes and each index has 3 shards, on 32 core machines.

Each shard contains many segments, which can be read and written
concurrently by Lucene. Since Lucene 4, there have been massive
improvements in that area.

Maybe you have observed the effect that many shards on a node for a single
index show a different performance behavior when docs are added over long
periods of time. It simply takes longer before large segment merging begins,
because docs are more widely distributed and use smaller segment sizes for a
longer time. The downside is that huge segment counts may occur (and many
users encounter high file-descriptor numbers). With the right
configuration, you can set up a single shard per index on a node, and
segment merging / segment count is not a real problem.

You are right if you consider shard size as a factor for moving the shard
around (into snapshot/restore), for export, or for recovery when a node
starts up. I think shard sizes over 30 GB are a bit heavy, but this
also depends on the speed of the I/O subsystem. With SSD or RAID 0, I can
operate at I/O rates of over 1 GB/sec for sequential reads. The shard-size
factor has to be balanced out, either by using more than one index, a
higher number of nodes, or a faster I/O subsystem.

Jörg

On Fri, Dec 19, 2014 at 3:42 PM, AlexR <royt...@gmail.com <javascript:>>
wrote:

Jörg, if you have a single large index and a cluster with 3 nodes, do you
suggest creating just 3 shards even though each node has, say, 16 cores?
With just three shards they will be very big, and not much parallelism in
computations will occur.
Am I missing something?



On Wed, Jan 14, 2015 at 4:16 PM, Elliott Bradshaw ebradshaw1@gmail.com
wrote:

Just out of curiosity, are aggregations on multiple shards on a single
node executed serially or in parallel? In my experience, it appears that
they're executed serially (my CPU usage did not change when going from 1
shard to 2 shards per node, but I didn't test this extensively). I'm
interested in maximizing the parallelism of an aggregation without creating
a massive number of nodes.

Requests are processed serially per shard, but several shards can be
processed at the same time. So if you have an index that consists of N
primaries, it can run on up to N processors of your cluster in parallel.
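A rough sketch of that scheduling shape (a toy model in Python, not Elasticsearch code): each shard's request is one serial task, and independent shards run concurrently on a pool.

```python
from concurrent.futures import ThreadPoolExecutor

def aggregate_shard(shard_docs):
    # Serial work for a single shard: count terms in that shard's docs.
    counts = {}
    for term in shard_docs:
        counts[term] = counts.get(term, 0) + 1
    return counts

# Three hypothetical shards; each runs serially, but on its own worker.
shards = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"]]

with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    partials = list(pool.map(aggregate_shard, shards))

# Coordinating-node reduce over the per-shard partial results.
final = {}
for partial in partials:
    for term, count in partial.items():
        final[term] = final.get(term, 0) + count
print(final)  # {'a': 3, 'b': 2, 'c': 3}
```

So parallelism comes from having multiple shards in flight, not from splitting a single shard's aggregation across cores.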

--
Adrien Grand


Adrien,

I get the feeling that you're a pretty heavy contributor to the aggregation
module. In your experience, would a shard per cpu core strategy be an
effective performance solution in a pure aggregation use case? If this
could proportionally reduce the aggregation time, would a node local reduce
(in which all shard aggregations on a given node are reduced prior to being
sent to the client node) be a good follow on strategy for further
enhancement?
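The node-local reduce proposed here is hypothetical (it is not something Elasticsearch does), but its shape can be sketched: merge the per-shard partials on each node first, so the client merges one result per node instead of one per shard.

```python
from collections import Counter
from functools import reduce

# Hypothetical two-level reduce: two nodes, two shards each, with
# per-shard term counts as the partial aggregation results.
cluster = {
    "node1": [Counter(a=3, b=1), Counter(a=1, c=2)],
    "node2": [Counter(b=4), Counter(c=1, a=2)],
}

# Level 1: node-local reduce -- one merged result per node goes on the wire.
node_partials = [reduce(lambda x, y: x + y, shards)
                 for shards in cluster.values()]

# Level 2: client reduce over #nodes results instead of #shards results.
final = reduce(lambda x, y: x + y, node_partials)

print(len(node_partials))  # 2 messages to the client instead of 4
print(final)               # Counter({'a': 6, 'b': 5, 'c': 3})
```

For exact metrics like plain counts the two-level tree gives the same answer; as the next reply points out, for truncated top-N results each extra reduce stage is another place where approximation can creep in.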

On Wednesday, January 14, 2015 at 10:56:03 AM UTC-5, Adrien Grand wrote:

On Wed, Jan 14, 2015 at 4:16 PM, Elliott Bradshaw <ebrad...@gmail.com
<javascript:>> wrote:

Just out of curiosity, are aggregations on multiple shards on a single
node executed serially or in parallel? In my experience, it appears that
they're executed serially (my CPU usage did not change when going from 1
shard to 2 shards per node, but I didn't test this extensively). I'm
interested in maximizing the parallelism of an aggregation without creating
a massive number of nodes.

Requests are processed serially per shard, but several shards can be
processed at the same time. So if you have an index that consists of N
primaries, this would run on N processors of your cluster in parallel.

--
Adrien Grand


If you introduce an extra reduction phase (for multiple shards on the same
node), you introduce further potential for inaccuracies in the final results.
Consider the roles of 'size' and 'shard_size' in the "terms" aggregation [1]
and the effects they have on accuracy. You'd arguably need a 'node_size'
setting to also control the size of this new intermediate collection. Every
stage that reduces the volume of data processed can introduce an
approximation, with the potential for inaccuracies upstream when merging.

[1] Elasticsearch Reference, "terms" aggregation (size and shard_size)
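For concreteness, here is a sketch of how that worst-case error can be bounded (roughly the reasoning behind the terms aggregation's doc_count_error_upper_bound): a term missing from a shard's truncated response can have occurred at most as often as the smallest count that shard did return, so a larger shard_size tightens the bound.

```python
from collections import Counter

def error_upper_bound(shard_results, shard_size):
    """Worst-case undercount for any merged term across these shards."""
    bound = 0
    for shard in shard_results:
        top = shard.most_common(shard_size)
        if len(top) == shard_size:   # the shard may have truncated its list
            bound += top[-1][1]      # smallest count it did return
    return bound

shards = [Counter(a=10, b=8, c=6), Counter(b=9, d=7, a=5)]
print(error_upper_bound(shards, shard_size=2))  # 8 + 7 = 15
print(error_upper_bound(shards, shard_size=3))  # 6 + 5 = 11: tighter bound
```

An extra node-level stage with its own truncation ('node_size') would add another term of this kind to the bound, which is exactly the accuracy concern raised above.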

On Wednesday, January 14, 2015 at 5:44:47 PM UTC, Elliott Bradshaw wrote:

Adrien,

I get the feeling that you're a pretty heavy contributor to the
aggregation module. In your experience, would a shard per cpu core
strategy be an effective performance solution in a pure aggregation use
case? If this could proportionally reduce the aggregation time, would a
node local reduce (in which all shard aggregations on a given node are
reduced prior to being sent to the client node) be a good follow on
strategy for further enhancement?

On Wednesday, January 14, 2015 at 10:56:03 AM UTC-5, Adrien Grand wrote:

On Wed, Jan 14, 2015 at 4:16 PM, Elliott Bradshaw ebrad...@gmail.com
wrote:

Just out of curiosity, are aggregations on multiple shards on a single
node executed serially or in parallel? In my experience, it appears that
they're executed serially (my CPU usage did not change when going from 1
shard to 2 shards per node, but I didn't test this extensively). I'm
interested in maximizing the parallelism of an aggregation without creating
a massive number of nodes.

Requests are processed serially per shard, but several shards can be
processed at the same time. So if you have an index that consists of N
primaries, this would run on N processors of your cluster in parallel.

--
Adrien Grand
