Hi there,
I'll start by confessing that we're still using an ancient version of ES (1.7.2). Please don't laugh.
We plan to upgrade really soon.
We receive about 30k events every minute with the following data:
- timestamp
- some id
- status for this id
- some additional fields
The events may include new ids, but typically they refer to existing ones.
We need to index the data in ES so that we can calculate the duration of each status for each id. That means we need to keep the timestamp of a new event only if the id is new or if the status for this id has changed.
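To make the rule concrete, here is a minimal Python sketch of it, independent of ES. The names (`apply_event`, `State`) and the integer timestamps are only illustrative assumptions, and the sketch assumes events arrive in timestamp order.

```python
from typing import Dict, Tuple

# Maps id -> (current status, timestamp at which that status began).
State = Dict[str, Tuple[str, int]]


def apply_event(state: State, event_id: str, status: str, timestamp: int) -> bool:
    """Apply one event; return True if the stored timestamp was set or reset."""
    current = state.get(event_id)
    # Keep the new timestamp only for a new id or a changed status.
    if current is None or current[0] != status:
        state[event_id] = (status, timestamp)
        return True
    # Same status as before: the original start timestamp is preserved.
    return False
```

With this rule, the duration of the current status for an id is simply the time elapsed since the stored timestamp.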
We tackled this with a simple Groovy script that updates the documents. However, when we set the replication factor to 1, indexing time rose so much that our bulk updates (which also include other events) started getting rejected. But we cannot work with replication, because in that case we lose data (and, worse, it affects the indexing time of other events in the same bulk).
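For reference, a scripted update of this kind can be expressed roughly as below. This is a sketch, not our exact script: the field names `status` and `status_since`, the helper `bulk_update_lines`, and the index and type values passed to it are placeholders. It builds the two lines of a bulk `update` action in the ES 1.x request format, with an `upsert` document for new ids and `ctx.op = "none"` to skip the write when the status is unchanged.

```python
import json

# Groovy script body; `status` and `status_since` are hypothetical field names.
SCRIPT = (
    'if (ctx._source.status != status) { '
    'ctx._source.status = status; ctx._source.status_since = ts '
    '} else { ctx.op = "none" }'
)


def bulk_update_lines(index: str, doc_type: str, doc_id: str,
                      status: str, ts: int) -> str:
    """Return the action line and body line for one bulk update item."""
    action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
    body = {
        "script": SCRIPT,
        "lang": "groovy",
        "params": {"status": status, "ts": ts},
        # Used as the initial document when the id does not exist yet.
        "upsert": {"status": status, "status_since": ts},
    }
    # The bulk API expects newline-delimited JSON, one object per line.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"
```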
We prefer not to change the queue_size or the bulk_size, and we cannot have two separate bulks (one for this event type and one for the other event types).
I was wondering if you have any better suggestions for modeling this problem, or any ideas on how to tackle the replication performance issue.
Is there any way to postpone the replication (we're fine with that)? Or to make it more efficient?
Thanks a lot!