Is ElasticSearch truly scalable for analytics?

Mark,

Understood, but what about cases where size is set to unlimited?
Inaccuracies are not a concern in that case, correct?

On Wednesday, January 14, 2015 at 1:09:48 PM UTC-5, Mark Harwood wrote:

If you introduce an extra reduction phase (for multiple shards on the same
node), you introduce further potential for inaccuracies in the final results.
Consider the role of 'size' and 'shard_size' in the "terms" aggregation [1]
and the effects they have on accuracy. You'd arguably need a 'node_size'
setting to also control the size of this new intermediate collection. Every
stage that reduces the volume of data processed can introduce an
approximation, with the potential for inaccuracies when merging further
upstream.

[1] Elasticsearch Reference: "terms" aggregation ('size' and 'shard_size')
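A minimal sketch of how 'size' and 'shard_size' appear on a terms aggregation request, using Python's requests library against a local node (the index and field names are invented, and the 'node_size' knob discussed above is hypothetical):

import requests

# Each shard collects its local top `shard_size` terms; the coordinating
# node merges those per-shard lists and keeps the final top `size`.
# Terms a shard trimmed below its shard_size cutoff never reach the merge,
# which is where the inaccuracy creeps in.
body = {
    "size": 0,  # no hits, aggregations only
    "aggs": {
        "top_tags": {
            "terms": {
                "field": "tag",
                "size": 10,         # final number of buckets returned
                "shard_size": 100,  # per-shard candidate list; larger = more accurate
                # a 'node_size' for the proposed node-level merge would be
                # analogous, but no such parameter exists
            }
        }
    }
}
resp = requests.post("http://localhost:9200/myindex/_search", json=body)
print(resp.json()["aggregations"]["top_tags"]["buckets"])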

On Wednesday, January 14, 2015 at 5:44:47 PM UTC, Elliott Bradshaw wrote:

Adrien,

I get the feeling that you're a pretty heavy contributor to the
aggregation module. In your experience, would a shard-per-CPU-core
strategy be an effective performance solution in a pure aggregation use
case? If this could proportionally reduce the aggregation time, would a
node-local reduce (in which all shard aggregations on a given node are
reduced prior to being sent to the client node) be a good follow-on
strategy for further enhancement?

On Wednesday, January 14, 2015 at 10:56:03 AM UTC-5, Adrien Grand wrote:

On Wed, Jan 14, 2015 at 4:16 PM, Elliott Bradshaw ebrad...@gmail.com
wrote:

Just out of curiosity, are aggregations on multiple shards on a single
node executed serially or in parallel? In my experience, it appears that
they're executed serially (my CPU usage did not change when going from 1
shard to 2 shards per node, but I didn't test this extensively). I'm
interested in maximizing the parallelism of an aggregation without creating
a massive number of nodes.

Requests are processed serially per shard, but several shards can be
processed at the same time. So if you have an index that consists of N
primaries, this would run on N processors of your cluster in parallel.

--
Adrien Grand
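Following that advice, one way to raise aggregation parallelism without adding nodes is simply to create the index with more primaries, roughly one per core. A hedged sketch as a plain HTTP call (index name and shard count are illustrative):

import requests

# An aggregation fans out to every primary shard, and each shard is
# processed serially, so an index with 8 primaries can keep up to
# 8 processors busy in parallel.
requests.put(
    "http://localhost:9200/myindex",
    json={"settings": {"number_of_shards": 8, "number_of_replicas": 1}},
)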


On Wednesday, January 14, 2015 at 1:36:53 PM UTC-5, Mark Harwood wrote:

Understood, but what about cases where size is set to unlimited?
Inaccuracies are not a concern in that case, correct?

Correct. But if we only consider the scenarios where the key sets are
complete and accuracy is not put at risk by merging (i.e. there is no "top
N" type filtering in play), how many of these sorts of use cases generate
sufficiently large trees of results where a node-level merging would be
beneficial?
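For context, "size set to unlimited" here refers to the 1.x-era behaviour where a terms aggregation with "size": 0 returns every bucket, so the key sets arriving at each merge are complete (index and field names invented; this inner size-0 semantic was removed in later Elasticsearch versions):

import requests

# With "size": 0 inside the terms agg (Elasticsearch 1.x semantics) every
# shard ships its full bucket set, so the merge cannot drop or miscount
# any term. The outer "size": 0 just suppresses hits.
body = {
    "size": 0,
    "aggs": {"all_tags": {"terms": {"field": "tag", "size": 0}}},
}
resp = requests.post("http://localhost:9200/myindex/_search", json=body)
print(len(resp.json()["aggregations"]["all_tags"]["buckets"]))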


On Wednesday, January 14, 2015 at 7:47:07 PM UTC+1, Elliott Bradshaw wrote:

How hard would it be to implement such a feature? Even if there are
only a handful of use cases, it could prove very helpful in these,
particularly since very large trees are the ones that will struggle the
most with bandwidth issues.


Adding a 'node reduce phase' to aggregations is something I'm very
interested in, and am also investigating for the project I'm currently
working on.

"If you introduce an extra reduction phase (for multiple shards on the same
node) you introduce further potential for inaccuracies in the final
results."

This is true if you only reduce the top-k items per shard, but I was
thinking of reducing the complete set of buckets locally. This takes a bit
more CPU and memory, but my guess is that this is negligible compared to
the work already being done by the aggregation framework. If you reduce
the buckets on the node before sending them to the coordinator, it will
actually increase the accuracy of aggregations!
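As a minimal sketch of what such a complete local reduce could look like, with shard-level buckets simulated as plain dicts (nothing here is an actual Elasticsearch API):

from collections import defaultdict

def node_reduce(shard_buckets):
    """Merge the *complete* bucket sets of all shards on one node.

    Because no shard truncates its list before this merge, the summed
    counts are exact; the coordinating node then receives one merged
    result per node instead of one per shard.
    """
    merged = defaultdict(int)
    for buckets in shard_buckets:
        for key, count in buckets.items():
            merged[key] += count
    return dict(merged)

# Three shards on the same node, each with its full (untruncated) buckets.
on_node = [{"a": 10, "b": 8}, {"a": 6, "c": 9}, {"b": 5, "c": 1}]
print(node_reduce(on_node))  # {'a': 16, 'b': 13, 'c': 10}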

"how many of these sorts of use cases generate sufficiently large trees of
results where a node-level merging would be beneficial"

It is primarily beneficial for bigger installations with lots of shards
per machine, say 40 machines with ~100 shards each. In the current
strategy, where every node sends 100 results (one per shard), a lot of
bandwidth is consumed on the coordinating node: it receives 4,000
responses when it could do with 40 (one per machine).

I acknowledge it is a highly specialised use case that not very many
people run into, but it is a case I'm currently working on.

"How hard would it to be to implement such a feature?"

I have been looking into this, and it is not trivial. This needs to be
implemented in/around the SearchService. This is the place I found to be
implementing the different search strategies, eg. DFS. Unlike the rest of
Elasticsearch it does seem to not consist of modules that implement
different search strategies.

Regarding the accuracy of top-k lists: I think the above, both the 'node
reduce phase' and making the search strategy pluggable, will be the
groundwork for implementations of TJA or TPUT strategies as discussed in
an old issue [1] about the accuracy of facets.

The order of steps to take before reaching the ultimate goal would be:

  1. Make search strategies (e.g. query-then-fetch, DFS-query-then-fetch)
     more modular.
  2. Make a search strategy with a 'node reduce phase' for the
     aggregations. Start with a complete reduce on the node; if that takes
     too much memory/time, you can use TJA or TPUT locally on the node to
     get a reliable top-k list.
  3a. Make a search strategy that executes TJA on the cluster, coordinated
      by the coordinating node.
  3b. Make a separate strategy that executes TPUT on the cluster,
      coordinated by the coordinating node.

I would say that 3a and 3b are 'easy' if doing a complete reduce in step 2
does not consume too many resources.

Adding strategies for both TJA and TPUT gives ultimate control to the
user: TPUT is not suited to reliably sorting on sums where the field might
contain negative values, but it has better latency than TJA.
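To give a feel for what a TPUT strategy involves, here is a minimal single-process sketch of the three-phase uniform threshold idea, with shards simulated as plain dicts of non-negative scores (none of this is Elasticsearch API, and the non-negativity assumption is exactly why TPUT cannot sort reliably on sums over fields that may be negative):

from collections import defaultdict

def tput_top_k(shards, k):
    # Phase 1: each shard reports its local top-k; sum those partial scores.
    n = len(shards)
    partial = defaultdict(float)
    for shard in shards:
        for key, score in sorted(shard.items(), key=lambda kv: -kv[1])[:k]:
            partial[key] += score
    tau1 = sorted(partial.values(), reverse=True)[k - 1]  # lower bound on true k-th score

    # Phase 2: with threshold T = tau1 / n, each shard reports every item
    # scoring >= T locally; an item unreported by all shards sums to < n*T = tau1.
    T = tau1 / n
    partial, seen_by = defaultdict(float), defaultdict(int)
    for shard in shards:
        for key, score in shard.items():
            if score >= T:
                partial[key] += score
                seen_by[key] += 1
    tau2 = sorted(partial.values(), reverse=True)[k - 1]
    # Prune: a shard that did not report an item contributes < T to it, so
    # partial sum + (unreported shards * T) upper-bounds the true total.
    candidates = [key for key, s in partial.items()
                  if s + (n - seen_by[key]) * T >= tau2]

    # Phase 3: fetch exact totals for the surviving candidates only.
    exact = {key: sum(shard.get(key, 0) for shard in shards) for key in candidates}
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

shards = [{"a": 5, "b": 3}, {"a": 4, "c": 2}, {"b": 1, "c": 6}]
print(tput_top_k(shards, 2))  # [('a', 9), ('c', 8)]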

I would love to get an opinion from Adrien concerning the feasibility of
such an approach.

-- Nils

[1] https://github.com/elasticsearch/elasticsearch/issues/1305


I would also be very interested in node-level shard result reduction, not for scalability but for precision reasons. I would like an option for a node to do complete aggregations on its shards so that the results are exact rather than approximate. There are many use cases where the corpus of data is relatively small, fits on one powerful node, and exactness is a must. With 48-core servers and SSD drives, such a node can process a good deal of data and produce exact results, which is essential for traditional datamart-like apps. Having this option would allow this class of apps to be built, and in a multi-node setup it would provide better precision too.
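In the meantime, one way to force exact terms results even across multiple shards, at the cost of memory and bandwidth, is to raise 'shard_size' until every shard returns its complete candidate list. A sketch with invented index and field names:

import requests

# If shard_size is at least the number of distinct terms on each shard,
# every shard ships complete information and the merged top-10 is exact.
body = {
    "size": 0,
    "aggs": {
        "exact_terms": {
            "terms": {"field": "category", "size": 10, "shard_size": 100000}
        }
    }
}
resp = requests.post("http://localhost:9200/datamart/_search", json=body)
print(resp.json()["aggregations"]["exact_terms"]["buckets"])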


Regarding the accuracy of top-k lists....

This is perhaps an over-simplification. We deal with far more complex
scenarios than a simple, single top-k list: whole aggregation trees with
multiple layers of aggs (geo, time, nested, parent/child, percentiles,
cardinalities, etc.) which can embed multiple top-k terms aggs, or be
contained by one. Today all aggs work in one pass over local data to
produce a merge-able summary output. If you introduce the idea of pausing
all of this local computation mid-stream and then resuming it once you've
centrally determined what "top k" is across the cluster, at various points
in the agg tree, then coordinating all of these updates gets impossibly
complex.
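To make the shape of such a tree concrete, here is a hypothetical three-layer request in 1.x-era syntax, a date histogram embedding a terms agg embedding percentiles (all index and field names invented):

import requests

# One pass over local data fills every layer of this tree on each shard;
# the per-shard summaries are then merged on the coordinating node.
body = {
    "size": 0,
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "timestamp", "interval": "day"},
            "aggs": {
                "top_services": {
                    "terms": {"field": "service", "size": 10},
                    "aggs": {
                        "latency_pcts": {"percentiles": {"field": "latency_ms"}}
                    }
                }
            }
        }
    }
}
resp = requests.post("http://localhost:9200/logs/_search", json=body)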

I acknowledge it is a highly specialised use case that not very many
people run into, but it is a case I'm currently working on.

To be fair, multi-level merging is a capability that might also apply to
analytics in federated architectures, where proxy servers might act as the
front to nodes in remote clusters.

I was thinking of reducing the complete set of buckets locally

I'm unclear on your approach to the "reduce":

  1. Take the summary outputs of multiple agg pipelines computed in parallel
    and merge them in the same way coordinating nodes do or
  2. Take the raw inputs (doc streams) from all shards held on a node and
    feed them through a single aggregation pipeline to get one combined output

The problems being 1) loses accuracy and 2) loses any parallelism because
agg pipelines are single threaded and must process doc streams serially.
Because you claimed accuracy would be better I guess you mean option 2?
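A small sketch of why option 1 loses accuracy when the per-shard lists are truncated (invented counts, plain Python, no Elasticsearch involved): each shard reports only its local top-2 terms, and the merged count for a term sitting just below one shard's cutoff comes out wrong.

from collections import Counter

# Per-shard term counts; "b" is strong overall but only third on shard 2.
shard1 = Counter({"a": 10, "b": 8, "c": 2})
shard2 = Counter({"c": 9, "a": 6, "b": 5})

def local_top(shard, size):
    """Simulate a shard truncating its buckets to the local top `size`."""
    return Counter(dict(shard.most_common(size)))

merged_truncated = local_top(shard1, 2) + local_top(shard2, 2)  # option 1
merged_complete = shard1 + shard2  # complete merge, as in a full node reduce

print(merged_truncated["b"])  # 8  -- shard 2's 5 hits for "b" were cut off
print(merged_complete["b"])   # 13 -- the true count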
