I have a task:
- Read a very large volume of logs that an agent sends to Elasticsearch in real time (I use fluent-bit)
- However, I need to modify each record to add my own tags, based on specific conditions on each document (a function add_tags(doc) written in Python). So I need to schedule a job that does the following:
- Get all documents not yet tagged from Elasticsearch
- Run them through the add_tags method to add a tags field to each document
- Write the documents back to Elasticsearch
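
The job steps above could be sketched roughly as follows. This is only an illustration under several assumptions: it uses the official Python client's `helpers.scan` and `helpers.bulk`, it treats "not yet tagged" as "has no tags field", and the `add_tags` body here is a made-up placeholder that returns a list of tags (the real function is not shown in this post). The names `UNTAGGED_QUERY`, `build_update_actions` and `run_job` are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Assumption: an untagged document is one without a "tags" field.
UNTAGGED_QUERY: Dict[str, Any] = {
    "query": {"bool": {"must_not": {"exists": {"field": "tags"}}}}
}


def add_tags(doc: Dict[str, Any]) -> List[str]:
    """Placeholder for the real add_tags(doc); this rule is invented."""
    message = str(doc.get("message", ""))
    return ["error"] if "ERROR" in message else ["info"]


def build_update_actions(hits: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Turn search hits into partial-update actions for the bulk helper."""
    for hit in hits:
        yield {
            "_op_type": "update",   # update in place instead of reindexing
            "_index": hit["_index"],
            "_id": hit["_id"],
            "doc": {"tags": add_tags(hit["_source"])},
        }


def run_job(client: Any, index: str) -> None:
    """One scheduled run; needs the elasticsearch package and a live cluster."""
    # Imported here so the helper functions above work without the package.
    from elasticsearch import helpers

    hits = helpers.scan(client, index=index, query=UNTAGGED_QUERY)
    helpers.bulk(client, build_update_actions(hits))
```

Writing to a new index instead would mean dropping `_op_type` (the bulk helper defaults to an index operation), pointing `_index` at the new index name, and sending the full source with the tags added.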
In the last step I need to write the documents back to ES. If I update the existing documents, will that hurt performance?
Or should I insert them into a new index instead?