We use Kafka and Flume to ingest partial updates on a big document. Flume is pulling a batch of partial updates and uses the Bulk API to ingest into elastic, in multi tasking environment.
In a given batch or across the batches, there can be partial updates that will hit the same document to create version conflicts.
If version conflict occurs, each conflict will be resolved by reading the document again and applying the update on separate threads. when number of conflicts are high, conflict resolution is slowing down that eventually resolution goes upto 10 minutes.
How to fix this?
Option 1: Split the document into pieces so reduce the version conflicts. But, query will be complex by looking into 4-5 different nested documents each time
Option 2: Ingest each partial update in time series fashion, that will bring version conflicts to zero. But again query complexity will be high as these documents should be available right after ingestion. (Batch process to aggregate the time series documents will not be possible under a new index name to query as this will take minutes)
Option 3: Send the same partial updates to the same kafka partition and Flume to consume the batch from the same partition assuming all partial updates may come in the same bath. Then it will have to iterate thru the messages and aggregate the partial updates into one and ingest into Elastic as one big partial update, reducing the version conflicts to zero (as in the same batch all partial updates are aggregated into one)
Option 4: Decrease the batch size to a small number so conflicts can be reduced. But, that will reduce the throughput.
Option 5: ???