In a Logstash pipeline, documents are consumed from RabbitMQ one after another. Sometimes I need to perform a lookup on already-imported documents via the elasticsearch filter's query_template.
However, the lookup doesn't see the latest state, because a refresh is required first! I have tried to explicitly trigger a refresh via the http filter plugin on each node of the cluster before the lookup. My current workaround is two pipelines:
- continuously importing into an index, where RabbitMQ is configured as the input
- periodically performing the lookup, where the Elasticsearch index (above) is configured as the input. If the lookup returns results, the enriched document is saved to a new index; otherwise it is examined again in the next iteration.
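The second (lookup) pipeline can be sketched roughly like this; hosts, index names, the schedule, and the template path are placeholders for illustration, not the actual config:

```
# Lookup pipeline - minimal sketch, all names are placeholders
input {
  elasticsearch {
    hosts    => ["localhost:9200"]
    index    => "incoming-docs"       # index fed by the RabbitMQ pipeline
    schedule => "* * * * *"           # poll periodically (cron syntax)
  }
}

filter {
  elasticsearch {
    hosts          => ["localhost:9200"]
    query_template => "/etc/logstash/lookup_template.json"
    # copy a field from the matched document into the current event
    fields => { "matched_field" => "enriched_field" }
  }
}

output {
  # only documents that got a lookup hit are written to the new index;
  # the rest remain in the source index for the next scheduled run
  if [enriched_field] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "enriched-docs"
    }
  }
}
```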
That works for now. Please let me know if you have any advice or better insight.
It sounds relatively similar to the enrich ingest pipeline — maybe that's an alternative? It runs within Elasticsearch and won't need Logstash or a queue; though it has some limitations around updating the lookup index. It might still be a better fit?
Also I'm not sure I follow this part on the _refresh API:
> While not recommended for production because of the performance overhead, this should do the right thing.
I have tried an enrich policy, but the elasticsearch filter fits this use case better, because retrieved results can be conditionally handled in a custom way. The _refresh API call doesn't execute synchronously (i.e. wait for the refresh to complete) within the same Logstash pipeline, so the latest state cannot be collected sequentially in the subsequent filter stages. As you wrote, it's not recommended to try to use Elasticsearch in a transactional manner.
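For context, the explicit-refresh attempt mentioned earlier can be sketched with the http filter like this (host, index name, and template path are placeholders); note this only issues the HTTP call per event, it does not guarantee the lookup that follows sees the refreshed state:

```
filter {
  # attempt to force a refresh before the lookup - placeholder names
  http {
    url  => "http://localhost:9200/incoming-docs/_refresh"
    verb => "POST"
  }

  elasticsearch {
    hosts          => ["localhost:9200"]
    query_template => "/etc/logstash/lookup_template.json"
    fields => { "matched_field" => "enriched_field" }
  }
}
```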