Does a bulk indexing request compact / unique?

rex-remind · January 14, 2021, 6:51pm

I'm curious if a bulk indexing API request compacts / uniques when ingested by Elasticsearch, i.e. will a create, then delete, then create, then update of a single document id collapse into just 1 create?

If not, why not?
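
For concreteness, here is a minimal sketch of what such a bulk request body could look like as newline-delimited JSON. The index name `my-index`, the document id `1`, the field values and the helper `bulk_body` are all made up for illustration; only the action/source line layout follows the bulk API format.

```python
import json

def bulk_body(actions):
    """Render (action, metadata, source) triples as an NDJSON bulk body.

    Hypothetical helper for illustration: each action gets a metadata line,
    followed by a source line for every action except delete.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Assumed index name and document id, not taken from the thread.
meta = {"_index": "my-index", "_id": "1"}
actions = [
    ("create", meta, {"field": "a"}),
    ("delete", meta, None),
    ("create", meta, {"field": "b"}),
    ("update", meta, {"doc": {"field": "c"}}),
]
body = bulk_body(actions)
```

All four actions target the same `_id`, which is the case the question is about.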

DavidTurner · January 14, 2021, 7:04pm

No, it doesn't. The actions on an individual document execute independently. With your example:

create
delete
create
update

If the document already existed then the first step, a create, would fail but the other actions would work normally. In principle they could be collapsed, particularly in this example since it includes a delete, but more generally sequences of actions don't seem to collapse cleanly often enough for this to be an optimisation that's worth implementing.
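
A toy in-memory model can illustrate the "independent execution" described here. This is only a sketch, not how Elasticsearch is implemented: the function `apply_bulk` and the dict-based store are hypothetical, and the per-item status codes are chosen to mirror what a bulk response typically reports (409 for a create on an existing document, 404 for a delete or update on a missing one).

```python
from typing import Any, Dict, List, Optional, Tuple

# (action, document id, body); body is None for delete.
Action = Tuple[str, str, Optional[Dict[str, Any]]]

def apply_bulk(store: Dict[str, Dict[str, Any]], actions: List[Action]) -> List[int]:
    """Apply each action in order against `store` and return one status per action.

    A failing action does not stop the ones that follow it.
    """
    statuses = []
    for action, doc_id, body in actions:
        if action == "create":
            if doc_id in store:
                statuses.append(409)  # conflict: document already exists
            else:
                store[doc_id] = dict(body or {})
                statuses.append(201)
        elif action == "delete":
            if doc_id in store:
                del store[doc_id]
                statuses.append(200)
            else:
                statuses.append(404)
        elif action == "update":
            if doc_id in store:
                store[doc_id].update(body or {})  # partial update merges fields
                statuses.append(200)
            else:
                statuses.append(404)
        else:
            raise ValueError(f"unsupported action: {action}")
    return statuses

actions = [
    ("create", "1", {"field": "a"}),
    ("delete", "1", None),
    ("create", "1", {"field": "b"}),
    ("update", "1", {"field": "c"}),
]

# Document "1" already exists, so only the first create fails.
existing = {"1": {"field": "old"}}
statuses_existing = apply_bulk(existing, actions)

# Starting from an empty store, every action succeeds.
empty: Dict[str, Dict[str, Any]] = {}
statuses_empty = apply_bulk(empty, actions)
```

In both runs the final document is the same, but the per-action results differ, which is part of why the sequence is not equivalent to a single create.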

rex-remind · January 14, 2021, 7:19pm

When working with Flink (or possibly any other stateful streaming system), which produces continuous retracts and inserts from many joins, I've found that it may update the same document many times in a single bulk request.

DavidTurner · January 14, 2021, 7:51pm

Are there any cases where it's not possible to collapse multiple updates on the client? I think you're asking about doing this within Elasticsearch itself, but I'd expect the client to have more information about what the documents mean and therefore be able to do a better job of collapsing the sequence of operations before putting it into a bulk request at all.
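
As a rough sketch of what collapsing on the client might look like, the following assumes the stream contains only full-document `index` actions and `delete` actions, so that the last action per document id determines the final state. `create` and partial `update` actions would need more careful merging and are rejected here. The function name `collapse_ops` and the tuple layout are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# (action, document id, source); source is None for delete.
Op = Tuple[str, str, Optional[Dict[str, Any]]]

def collapse_ops(ops: Iterable[Op]) -> List[Op]:
    """Keep only the last action per document id.

    Assumes every action is a full overwrite ("index") or a "delete",
    which makes last-write-wins safe for the final document state.
    """
    last: Dict[str, Op] = {}
    for op in ops:
        action, doc_id, _ = op
        if action not in ("index", "delete"):
            raise ValueError(f"cannot safely collapse action: {action}")
        # Remove and re-insert so output order follows each id's final action.
        last.pop(doc_id, None)
        last[doc_id] = op
    return list(last.values())

ops = [
    ("index", "1", {"v": 1}),
    ("delete", "1", None),
    ("index", "2", {"v": 1}),
    ("index", "1", {"v": 3}),
]
collapsed = collapse_ops(ops)
```

Here four actions shrink to two, one per document id, before anything is put into a bulk request.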

rex-remind · January 14, 2021, 8:20pm

Flink's Elasticsearch connector is fairly opaque and non-configurable, but at the end of the pipeline it just wraps the ES Java client with mostly default settings. It might be possible to fork it, but then there's a concern about diverging from the upstream source, and it's also not the smallest codebase to work with. Doing this from Elasticsearch itself seems like it would be easier in this case.

DavidTurner · January 14, 2021, 8:45pm

I think the same arguments apply to not doing this server-side in Elasticsearch too - you can't configure anything like this today, and it's certainly not a small or simple codebase either. IMO it's better to push this kind of highly-parallelisable work out to the edges as much as possible.

