There is no way to call setPipeline on the UpdateRequest API, so I tried the approach below, and sadly it doesn't work.
The existing document is updated, but without the pipeline transformations applied.
Can the experts please suggest a solution or alternatives?
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;

// The IndexRequest carries the ingest pipeline.
IndexRequest request = new IndexRequest(indexConfig.getIndexName(), indexConfig.getIndexType(), docId)
        .source(source);
request.setPipeline(indexConfig.getPipeline());
if (appConfig.isUpdateRequest()) {
    // Update path: the IndexRequest only supplies the upsert document;
    // its pipeline setting is ignored, so updates skip the pipeline.
    UpdateRequest upsertRequest = new UpdateRequest(indexConfig.getIndexName(), indexConfig.getIndexType(), docId)
            .doc(source)
            .upsert(request);
    bulkProcessor.add(upsertRequest);
} else {
    bulkProcessor.add(request);
}
We perform bulk indexing all the time. First we do an initial bulk indexing with pipelines. After that we do delta indexing, again in bulk mode, and here we need the same pipelines applied so that the resulting documents match the ones produced during initial indexing.
No support for pipelines on bulk update means I have to either call update by query with the pipeline after the index update, or remove the pipeline altogether and move the pipeline logic into application code, which is bad.
Initial indexing (bulk / insert / IndexRequest) - We pull all entities to be indexed from an application REST endpoint.
Delta indexing (bulk / update / UpdateRequest with docAsUpsert) - Here we pull all entities created or modified as of a given point in time. The response may contain nothing, or as many as a million entities. In this scenario, we have to update documents if they already exist or create new ones if they don't (a minimal sketch of this request follows below).
So we have to apply the same pipelines on both routes. That way the field values stay intact.
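For reference, a minimal sketch of the delta-indexing request using docAsUpsert, assuming the same indexConfig, docId, and source variables as in the snippet above:

import org.elasticsearch.action.update.UpdateRequest;

// docAsUpsert: the same document is used both as the partial update
// and, if the document is missing, as the full upsert document.
UpdateRequest delta = new UpdateRequest(indexConfig.getIndexName(), indexConfig.getIndexType(), docId)
        .doc(source)
        .docAsUpsert(true);
bulkProcessor.add(delta);
// Note: there is still no setPipeline here -- that is exactly the gap.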
What's happening now is:
Initial indexing - the field named 'status' is converted to lowercase via a lowercase processor. The values are active & inactive.
Delta indexing - as the pipeline is not applied, the value in the status field ends up as ACTIVE / INACTIVE / Active / Inactive etc.
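For context, a minimal sketch of such a pipeline, created here with the low-level REST client; the pipeline id status-lowercase and the RestClient instance named restClient are assumptions for illustration:

import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

// Define a pipeline with a single lowercase processor on the status field.
Request putPipeline = new Request("PUT", "/_ingest/pipeline/status-lowercase");
putPipeline.setJsonEntity(
        "{ \"description\": \"lowercase the status field\"," +
        "  \"processors\": [ { \"lowercase\": { \"field\": \"status\" } } ] }");
restClient.performRequest(putPipeline);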
Yes, we can work around this, but I strongly feel you should consider supporting pipelines on bulk UpdateRequest.
Discussed during fix-it Friday, and this looks like a useful enhancement, but there are corner cases which would make it very tricky to support (for example, when the index name or routing is changed during ingestion, or when a node isn't allowed to run ingest). Therefore I'm closing this issue; we can re-evaluate it at a later time if this is still useful and the technical concerns can be fixed easily.
A workaround could be to use update by query instead, as that supports ingest pipelines, at the price of slowness...
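If update by query is acceptable, a minimal sketch with the high-level REST client might look like the following; the client variable and the match-all query are assumptions, and you should verify that setPipeline is available in your client version:

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.UpdateByQueryRequest;

// Re-run the ingest pipeline over documents that were updated without it.
UpdateByQueryRequest ubq = new UpdateByQueryRequest(indexConfig.getIndexName());
ubq.setQuery(QueryBuilders.matchAllQuery()); // or narrow this to the delta set
ubq.setPipeline(indexConfig.getPipeline());  // documents are re-indexed through the pipeline
BulkByScrollResponse response = client.updateByQuery(ubq, RequestOptions.DEFAULT);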
Another workaround would be to simulate the pipeline yourself by calling the _simulate ingest endpoint and then sending the result using the update API.
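A rough sketch of that approach with the low-level REST client; sourceJson (the document source as a JSON string) and restClient are assumptions, and parsing of the simulate response is elided:

import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;

// 1) Run the document through the pipeline without indexing it.
Request simulate = new Request("POST",
        "/_ingest/pipeline/" + indexConfig.getPipeline() + "/_simulate");
simulate.setJsonEntity("{ \"docs\": [ { \"_source\": " + sourceJson + " } ] }");
Response simResponse = restClient.performRequest(simulate);
// 2) Parse docs[0].doc._source from simResponse and use that transformed
//    JSON as the doc/upsert body of the UpdateRequest sent to bulkProcessor.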