Extra processing during every indexing action

I would like a custom behavior that performs some action or processing before or after every index request to ES, and then updates a field of the inserted document once that processing is done. I've looked into the Watcher component in ES and into custom plugin development for this, but I couldn't find suitable options or methods. Any help would be appreciated.

What you're asking is a bit too generic to answer outright. Can you give some more insight into what specifically you're trying to do?

There are a number of potential options that may fit, depending on what that custom action or processing is. One option may be ingest node (https://www.elastic.co/guide/en/elasticsearch/reference/5.3/ingest.html), which has a number of processors already built in (https://www.elastic.co/guide/en/elasticsearch/reference/5.3/ingest-processors.html). If none of the existing processors does what you need, we may add one in the future, and it's nice to hear what people are trying to do. Otherwise, it may be possible to build your own ingest processor (see https://www.elastic.co/blog/writing-your-own-ingest-processor-for-elasticsearch for information on that). It may also be possible to use a component upstream of Elasticsearch, e.g. Logstash, to do the processing before indexing.
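As a rough illustration of the ingest node option, here is a minimal sketch that only builds the HTTP requests (it does not talk to a cluster). It assumes 5.x-style `/index/type/id` document paths; the pipeline id `mark-unprocessed` and the helper function names are hypothetical. It registers a pipeline with the built-in `set` processor so every document sent through it gets `isProcessed: false`, and routes an index request through that pipeline using the `pipeline` query parameter.

```python
import json

# Hypothetical pipeline id; any name works.
PIPELINE_ID = "mark-unprocessed"


def pipeline_definition():
    """Body for PUT _ingest/pipeline/<id>: one built-in `set` processor
    that adds isProcessed=false to every document run through the pipeline."""
    return {
        "description": "Flag new documents as not yet processed",
        "processors": [
            {"set": {"field": "isProcessed", "value": False}},
        ],
    }


def register_pipeline_request(pipeline_id=PIPELINE_ID):
    """Return (method, path, body) for registering the pipeline."""
    return "PUT", f"/_ingest/pipeline/{pipeline_id}", json.dumps(pipeline_definition())


def index_request(index, doc_type, doc_id, doc, pipeline_id=PIPELINE_ID):
    """Return (method, path, body) for indexing one document through the
    pipeline (assumes 5.x-style /index/type/id paths)."""
    path = f"/{index}/{doc_type}/{doc_id}?pipeline={pipeline_id}"
    return "PUT", path, json.dumps(doc)


if __name__ == "__main__":
    for method, path, body in (
        register_pipeline_request(),
        index_request("docs", "doc", "1", {"title": "example"}),
    ):
        print(method, path, body)
```

Note that an ingest pipeline only transforms the document before it is indexed; it can set the initial flag, but it cannot run work after indexing or flip the flag later.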


@shanec: We would like to have a field like isProcessed, initially set to false. After indexing, we would like to save the original file that was indexed to remote storage, do some other processing, and then update that field to true. We would like to do this for all indexing in the cluster. Is ingest node the right option here, or are there other suitable options?

@shanec: Is there any way we can set an ingest processor/pipeline config for all indices? Or at least set it as an index setting at creation time so that it is used on every index request?

@dadoonet: Any help or suggestions on this?

If you're actually trying to do something like move a file around, I'd recommend doing it outside of Elasticsearch, in an upstream process. Otherwise you'd have to give every Elasticsearch node access to wherever the files come from and go to, including dealing with Java Security Manager issues, etc. Elasticsearch isn't really meant for that. So I'd do that business logic in an upstream process and then use the _update API to set the isProcessed field once you've finished whatever you need to do.
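A minimal sketch of that upstream flow, assuming 5.x-style `/index/type/id` paths. The `send` callable (whatever HTTP client you use), `store_remote` (your remote-storage upload), and the helper names are all hypothetical placeholders; only the `_update` request shape with a partial `doc` comes from the Elasticsearch API. The document is indexed with `isProcessed: false`, the business logic runs outside Elasticsearch, and only then is the flag set to true.

```python
import json
from typing import Callable

# Hypothetical transport: (method, path, json_body) -> anything.
Send = Callable[[str, str, str], object]


def update_request(index, doc_type, doc_id):
    """Return (method, path, body) for a partial update via the _update API
    that sets isProcessed=true (assumes 5.x-style paths)."""
    path = f"/{index}/{doc_type}/{doc_id}/_update"
    return "POST", path, json.dumps({"doc": {"isProcessed": True}})


def handle_document(index, doc_type, doc_id, doc, original_file,
                    store_remote: Callable[[str, bytes], None], send: Send):
    """Upstream flow: index, do the business logic, then mark as processed."""
    # 1. Index the document with the flag initially false.
    send("PUT", f"/{index}/{doc_type}/{doc_id}",
         json.dumps({**doc, "isProcessed": False}))
    # 2. Business logic outside Elasticsearch (hypothetical upload step).
    store_remote(doc_id, original_file)
    # 3. Only reached if step 2 did not raise: flip the flag.
    send(*update_request(index, doc_type, doc_id))
```

Because the update is sent only after `store_remote` returns, a failed upload leaves `isProcessed` at false, so those documents can be found with a query on that field and retried.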

OK, thank you for your suggestion. 🙂
