Could a custom Aggregator be used for general purpose Map/Reduce or bulk update?


(Daniel Winterstein) #1

Hello,

So by writing a plugin you can create a custom aggregation.[1]

I'd like to explore what we could do with that.
Why? I'm looking for ways round a costly scan-and-update-each-document
algorithm.

Do Aggregators run in a parallel fashion, with your aggregation being run
against all shards at once?
Or do they go through the shards sequentially?

How does an Aggregator run only-once for each matching document?
I.e. if we have a shard replicated on 3 nodes... Does the aggregation pick
one node for that shard?
Or does it build up in memory a set of seen documents to avoid duplicating?

What happens if you make calls to Elasticsearch from within an Aggregator,
such as updating a document?

What about updating the context document that the Aggregator is looking at
then-and-there -- could that be done efficiently from within the Aggregator?
If so, would you, and could you, override something in the Aggregator class
so it runs over every copy of every document?

Thank you for any help with these questions!

Best regards,

- Daniel

1:
https://groups.google.com/forum/#!searchin/elasticsearch/aggregations/elasticsearch/0UYLbyeWiw4/RlSnJtgDj0AJ

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a635a37d-7acd-4ca3-af00-dea882ed27ae%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

I'll try to answer some of your questions, though I must admit I am not very
familiar with the aggregation source code yet (still exploring).

Aggregations work like a search: they are "embedded" in the search actions
and operate over the result set of a search. They run on each shard, just
like the search actions do.

Duplication is avoided because the cluster knows where the shards and
replicas are located, and the participating nodes are picked before the
action starts executing.
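
To make the two points above concrete, here is a toy model in plain Python. It is not Elasticsearch code: the cluster layout, the node names, the `tag` field and every function name are hypothetical, and the copy-selection rule (take the first node listed) is only a stand-in for whatever routing the cluster really applies. It only illustrates the idea that one copy of each shard is chosen up front, each shard produces a partial bucket count, and the partial results are merged.

```python
from collections import Counter

# Hypothetical layout: shard id -> nodes holding a copy of that shard.
SHARD_COPIES = {
    0: ["node-a", "node-b", "node-c"],
    1: ["node-b", "node-c", "node-a"],
}

# Hypothetical shard contents; every copy of a shard holds the same documents.
SHARD_DOCS = {
    0: [{"tag": "x"}, {"tag": "y"}],
    1: [{"tag": "x"}, {"tag": "x"}],
}


def pick_copies(shard_copies):
    """Choose exactly one node per shard before any work starts."""
    return {shard: nodes[0] for shard, nodes in shard_copies.items()}


def aggregate_shard(docs, field):
    """Per-shard partial result: count documents per field value (a bucket)."""
    return Counter(doc[field] for doc in docs if field in doc)


def run_aggregation(shard_copies, shard_docs, field):
    """Visit each shard once (via its chosen copy) and merge the partial counts."""
    plan = pick_copies(shard_copies)
    total = Counter()
    for shard in plan:
        total.update(aggregate_shard(shard_docs[shard], field))
    return plan, dict(total)


plan, buckets = run_aggregation(SHARD_COPIES, SHARD_DOCS, "tag")
print(plan)
print(buckets)
```

Because the plan holds one entry per shard, each document is counted once no matter how many replicas exist, and no set of "seen" documents is needed.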

An in-memory structure is indeed built up, but it is built per hit, as the
aggregation organizes results into "buckets".

Updating documents seems possible, but right now I cannot picture how
modified content could be fed back into an internal ES bulk service. From my
understanding, it would be easier to create an aggregation result first,
return it to the client, and use this result as content in a new subsequent
action, i.e. bulk indexing.
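
A minimal client-side sketch of that two-step approach, in Python, might look like the following. It only builds the request bodies and makes no network calls. The index name `summaries`, the fields `user` and `event_count`, the aggregation name `by_value` and both helper functions are hypothetical, and it assumes each bucket key happens to be the id of a document to update, which will not hold for every use case. The body shapes follow the search and bulk REST formats as commonly documented; older Elasticsearch versions also expect a `_type` in the bulk action metadata.

```python
import json


def build_agg_request(field, agg_name="by_value"):
    """Search body asking only for a terms aggregation, with no hits returned."""
    return {"size": 0, "aggs": {agg_name: {"terms": {"field": field}}}}


def buckets_to_bulk(response, index, target_field, agg_name="by_value"):
    """Turn aggregation buckets into a newline-delimited bulk update body.

    Assumption: each bucket key is the id of a document in `index`.
    """
    lines = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        action = {"update": {"_index": index, "_id": str(bucket["key"])}}
        partial_doc = {"doc": {target_field: bucket["doc_count"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(partial_doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


# Hand-written stand-in for the aggregation part of a search response.
sample_response = {
    "aggregations": {
        "by_value": {
            "buckets": [
                {"key": "alice", "doc_count": 3},
                {"key": "bob", "doc_count": 1},
            ]
        }
    }
}

request_body = build_agg_request("user")
bulk_body = buckets_to_bulk(sample_response, "summaries", "event_count")
print(json.dumps(request_body))
print(bulk_body, end="")
```

The aggregation work stays on the shards, and the client sends one bulk request per batch of buckets instead of scanning and updating each document individually.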

I think you do not need to override existing code. The code is really a
masterpiece, well prepared for extending the framework with exciting
features...

Jörg

On Fri, Jun 6, 2014 at 12:03 AM, Daniel Winterstein <
daniel.winterstein@gmail.com> wrote:



