Tailing Elasticsearch changelog


We have an Elasticsearch cluster storing documents that contain various counters (for example, the number of sessions in a given hour). These documents are created with some initial counts and later updated using scripted upserts, e.g. something like:

Document format:

    "_id": "1",
    "timestamp": "2017-10-12T10:00:00Z",
    "views": 1


Update script:

    ctx._source.views += params.views;
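
For readers unfamiliar with scripted upserts, here is a minimal sketch of how such a request body for the Update API might be assembled. The helper name `build_upsert_body` is hypothetical, and the `source` key is an assumption about the cluster version: older 5.x releases use `inline` instead.

```python
import json


def build_upsert_body(views, timestamp):
    """Build a request body for an Update API call with a script and an upsert document."""
    return {
        "script": {
            "lang": "painless",
            # Assumption: older 5.x releases expect the key "inline" instead of "source".
            "source": "ctx._source.views += params.views;",
            "params": {"views": views},
        },
        # Indexed as-is when no document with the given id exists yet.
        "upsert": {"timestamp": timestamp, "views": views},
    }


body = build_upsert_body(1, "2017-10-12T10:00:00Z")
print(json.dumps(body, indent=2))
```

If the document does not exist, the `upsert` part is indexed as the initial version; otherwise the script runs against the existing source.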

We want to plug in something like a listener so that every time a document is inserted or updated, we can publish the new value of that document to Kafka (to replicate it somewhere else). For example, if one create is followed by two updates, we would publish 3 documents to Kafka:

    // Create: doc with id=1
    {"_id": "1", "timestamp": "2017-10-12T10:00:00Z", "views": 1}
    // Update: inc +1
    {"_id": "1", "timestamp": "2017-10-12T10:00:00Z", "views": 2}
    // Update: inc +3
    {"_id": "1", "timestamp": "2017-10-12T10:00:00Z", "views": 5}

Is there an existing mechanism or plugin that we can use for this? If not, where in the Elasticsearch source code would you recommend we look to add it?


We added sequence numbers in 6.0, which is the first step toward enabling this sort of thing.
For now, on versions before 6.0, you need to DIY.
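
One possible DIY shape, offered only as a sketch and not as something this answer prescribes, is to publish from the application that performs the update. Everything below is hypothetical: `InMemoryStore` stands in for the cluster so the example runs without Elasticsearch, and `publish` stands in for a Kafka producer's send call.

```python
import json


class InMemoryStore:
    """Stand-in for Elasticsearch so the sketch runs without a cluster."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, initial, views):
        # Mimics the upsert: create with `initial`, otherwise increment views.
        if doc_id not in self.docs:
            self.docs[doc_id] = dict(initial)
        else:
            self.docs[doc_id]["views"] += views
        return dict(self.docs[doc_id])


def update_and_publish(store, publish, doc_id, timestamp, views):
    """Apply one upsert, then hand the resulting document to `publish`."""
    doc = store.upsert(doc_id, {"timestamp": timestamp, "views": views}, views)
    message = {"_id": doc_id, **doc}
    publish(json.dumps(message))
    return message


published = []  # a real setup would send to a Kafka topic instead
store = InMemoryStore()
for increment in (1, 1, 3):
    update_and_publish(store, published.append, "1", "2017-10-12T10:00:00Z", increment)

for line in published:
    print(line)
```

This only captures writes that go through the wrapper; updates made by other clients would be missed, which is the gap a server-side change feed would close.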

The increment is just used to illustrate the situation. In reality, the scripts perform much more complex updates of the document. We are interested in some sort of listener that is called or notified every time a document is updated. This listener would publish the new version of the document to Kafka.

Yep, that still applies 🙂
