Different bulk helper behaviour between the Java and JavaScript clients

Hey,

I recently checked out the Node.js client and would like to confirm a behaviour that differs from, for example, the Java client.

I have a use-case where a JavaScript app runs permanently and continuously receives new data from a data source, like a Kafka consumer.

A perfect use-case for the bulk helpers, or so I thought. My assumption of how all bulk helpers work was the following, based on how the Java BulkIngester behaves in a streaming use-case (rough sketch after the list):

  1. Configure the ingester with ingestion criteria such as number of documents, request size, and flush interval
  2. Add documents till eternity
  3. Profit!
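In code, I pictured something like this. This is completely hypothetical: `BulkIngester`, `maxOperations` and friends are borrowed from the Java API and do not exist in the JS client; it's just the model I had in my head:

```js
// Hypothetical push-based API, mirroring Java's BulkIngester.
// None of these names exist in @elastic/elasticsearch.
const ingester = new BulkIngester({
  client,
  maxOperations: 1000,       // flush after 1000 documents...
  maxSize: 5 * 1024 * 1024,  // ...or after 5 MB of payload...
  flushInterval: 1000        // ...or after 1 s, whichever comes first
})

// Add documents till eternity, e.g. from a Kafka consumer callback:
consumer.on('message', msg => {
  ingester.add({ index: { _index: 'my-index' } }, msg.value)
})
```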

However, after reading the JavaScript client's bulk helper docs, I am not so sure any more.

First, the bulk helper has no setting for the number of documents, only for request size (`flushBytes`) and a flush interval (`flushInterval`). Is that intended, and why does it differ from the Java client? Is it because you assume the passed data is finite (like an array, a Buffer or a ReadableStream)?
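For reference, here is roughly what the docs show, as I read them (`my-index` and the datasource are placeholders):

```js
const { Client } = require('@elastic/elasticsearch')
const client = new Client({ node: 'http://localhost:9200' })

// Pulls from a finite datasource and resolves once it has been drained.
const result = await client.helpers.bulk({
  datasource: [{ message: 'hello' }], // array, Buffer, stream or generator
  onDocument () {
    return { index: { _index: 'my-index' } }
  },
  flushBytes: 5000000,  // flush once ~5 MB of payload accumulated (default)
  flushInterval: 30000  // time-based flush, default 30 s
})
```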

Second, there is no method to keep adding new documents over time. Would this work with an endless generator, like a Channel from the queueable package?

Documentation for this use-case seems non-existent, even though I'd consider it pretty common in streaming scenarios. Then again, maybe my mental model of the JavaScript client is simply wrong and I am using it incorrectly.

Any help greatly appreciated.

Have a good week!

--Alex

Historically, all bulk helpers have been pull based, optimized for reading from very large data sources as fast as possible.

A separate use-case is continuously producing data to Elasticsearch at variable rates and intervals.

Java's BulkIngester implements such a push-based model, similar to how Beats allows data to be pushed over channels forever.

.NET has Elastic.Ingest.Elasticsearch (see `src/Elastic.Ingest.Elasticsearch/README.md` in the elastic/elastic-ingest-dotnet repository on GitHub) to provide a push-based model for ingesting data.

AFAIK this is not available for JavaScript.


queueable could work to present a channel as a stream. I'm not sure how the internals handle edge cases when no data is yielded within the flush timeout, though.
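A minimal sketch of the idea, assuming the helper accepts an async generator as `datasource` (the docs state this for finite generators; I haven't verified the endless case). The hand-rolled `makeChannel` below just stands in for queueable's Channel; the index name and document shape are made up:

```js
const { Client } = require('@elastic/elasticsearch')

// Stand-in for queueable's Channel: push() buffers a document and wakes
// the consumer; the async generator drains the buffer, then waits for the
// next push. A single consumer is assumed.
function makeChannel () {
  const buffer = []
  let wakeUp = null
  async function * iterate () {
    while (true) {
      while (buffer.length > 0) yield buffer.shift()
      await new Promise(resolve => { wakeUp = resolve })
    }
  }
  return {
    push (doc) {
      buffer.push(doc)
      if (wakeUp) {
        const resolve = wakeUp
        wakeUp = null
        resolve()
      }
    },
    iterator: iterate()
  }
}

const client = new Client({ node: 'http://localhost:9200' })
const channel = makeChannel()

// The returned promise only settles when the datasource ends, so with an
// endless generator this effectively runs for the lifetime of the process.
client.helpers.bulk({
  datasource: channel.iterator,
  onDocument () {
    return { index: { _index: 'my-index' } } // placeholder index name
  },
  flushInterval: 5000 // lean on time-based flushing during quiet periods
}).catch(err => console.error('bulk helper failed:', err))

// Elsewhere, e.g. inside a Kafka consumer callback:
channel.push({ '@timestamp': new Date().toISOString(), message: 'hello' })
```

Whether `flushInterval` reliably flushes a partial batch while the generator sits idle is exactly the edge case mentioned above, so this would need testing before relying on it.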