Reason for creating a custom document_id?

I've seen various Logstash filter plugins customizing the document_id field in pipeline filters (e.g. the HELK project). I'm trying to make sense of the reason for doing this. Doesn't a unique document_id get applied automatically?

I'm thinking of the document_id as akin to a unique key in an RDBMS table, just a way to identify an individual record. Perhaps I just haven't discovered what the document_id is useful for beyond allowing the system to refer to a particular document.

New Elastic Stack user here. I apologize for such a beginner question.

If you ingest the same set of documents more than once (because they have been updated), you want each new version to overwrite the existing document, so it has to have the same document_id. If you do not set it, a new unique id is generated for every event, resulting in a second copy of the document.
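To make the difference concrete, here is a rough Python sketch that uses a plain dict as a stand-in for an index. It is not Elasticsearch or Logstash code. The helper names (`index_document`, `derived_id`) and the sample fields (`host`, `event_id`, `status`) are made up for illustration, and hashing a few key fields is just one assumed way to get a repeatable id.

```python
import hashlib
import uuid


def index_document(index, doc, doc_id=None):
    """Store doc in a dict standing in for an index (hypothetical helper).

    If no id is supplied, a random one is generated, mimicking auto-assigned ids.
    """
    if doc_id is None:
        doc_id = uuid.uuid4().hex
    index[doc_id] = doc
    return doc_id


def derived_id(doc, key_fields):
    """Build a deterministic id by hashing the fields that identify the record."""
    joined = "|".join(str(doc[field]) for field in key_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The same logical record, ingested twice: once as-is, once after an update.
original = {"host": "web01", "event_id": 4624, "status": "open"}
updated = {"host": "web01", "event_id": 4624, "status": "closed"}

# No custom id: every ingest gets a fresh id, so both versions are kept.
auto_index = {}
index_document(auto_index, original)
index_document(auto_index, updated)

# Custom id derived from the identifying fields: the update overwrites.
custom_index = {}
for doc in (original, updated):
    index_document(custom_index, doc, derived_id(doc, ["host", "event_id"]))

print(len(auto_index))    # 2 documents, one per ingest
print(len(custom_index))  # 1 document, holding the updated version
```

The id is derived only from the fields that identify the record, not from the ones that change (`status` here), so a re-ingested update lands on the same id.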
