Parallel document processing across nodes?

sdeck51 · April 16, 2018, 2:54pm

Hi,

I've been playing around with writing an Elasticsearch plugin and have added a few endpoints that do things ES doesn't currently do, such as text summarization. I was wondering if there is an interface or design pattern for processing multiple documents in parallel across ES nodes. I'm comfortable getting a single document and working on its text field(s), but how would I go about implementing an algorithm that processes documents on multiple nodes in parallel (or is that even possible)?

Right now I can just call the multiple get request internally, but I assume that grabs the documents from different nodes and then does the processing on a single node, which doesn't sound very efficient. (Please correct me if I'm wrong; I'd love to have a better understanding of what's happening internally with the internal API.)

Thanks

Igor_Motov · April 16, 2018, 8:32pm

Search in Elasticsearch is distributed by nature. The query phase of a search runs in parallel, one thread per shard, up to the number of threads available in the search thread pool. The fetch phase is also done in parallel on each shard. I am not sure how your search plugin operates, so it is difficult for me to suggest anything with respect to your plugin design. I can just mention that if your plugin is implemented as a script field, it would be part of the fetch phase and would therefore be executed in parallel.
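To make the per-shard parallelism concrete, here is a rough sketch of the scatter-gather idea in plain Python. It is only a simulation under stated assumptions, not Elasticsearch code: the in-memory `SHARDS` list, the `summarize` stand-in, and the `process_shard` and `scatter_gather` helpers are all hypothetical names made up for illustration. The point is that each shard processes only its own documents, and a coordinating step merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards": each maps a document id to its text field.
# In a real cluster these would live on different nodes.
SHARDS = [
    {
        "doc1": "Elasticsearch distributes search across shards.",
        "doc2": "Each shard is searched by one thread.",
    },
    {"doc3": "The fetch phase also runs on each shard."},
    {
        "doc4": "A coordinating node merges the partial results.",
        "doc5": "Script fields are computed during fetch.",
    },
]


def summarize(text: str, max_words: int = 4) -> str:
    """Stand-in for per-document work (keeps only the first few words)."""
    return " ".join(text.split()[:max_words])


def process_shard(shard: dict) -> dict:
    """Runs 'on the shard': touches only the documents stored there."""
    return {doc_id: summarize(text) for doc_id, text in shard.items()}


def scatter_gather(shards: list, max_threads: int = 2) -> dict:
    """Fan out one task per shard, bounded by a thread pool, then merge."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map yields one partial result per shard, in shard order.
        for partial in pool.map(process_shard, shards):
            merged.update(partial)
    return merged


results = scatter_gather(SHARDS)
for doc_id in sorted(results):
    print(doc_id, "->", results[doc_id])
```

The `max_threads` bound plays the role of the thread pool limit mentioned above: with more shards than threads, the extra shard tasks simply wait for a free worker.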

sdeck51 wrote: "Right now I can just call the multiple get request internally,"

I am not sure why you implemented it as a plugin, then. Wouldn't it be easier to implement it as an application outside of Elasticsearch?
