Different bulk helper behaviour between the Java and JavaScript clients

Hey,

I recently checked out the Node.js client and would like to confirm a behaviour that differs from, for example, the Java client.

I have a use-case where a JavaScript app runs permanently and continuously receives new data from a data source, like a Kafka consumer.

A perfect use-case for the bulk helpers, or so I thought. My assumption of how all bulk helpers work was the following, based on how the Java BulkIngester behaves in a streaming use-case (rough sketch after the list):

  1. Configure the ingester with ingestion criteria such as number of documents, request size, and flush interval
  2. Add documents till eternity
  3. Profit!
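In code, I pictured something like this. This is completely hypothetical: `BulkIngester`, `maxOperations` and friends are borrowed from the Java API and do not exist in the JS client; it's just the model I had in my head:

```js
// Hypothetical push-based API, mirroring Java's BulkIngester.
// None of these names exist in @elastic/elasticsearch.
const ingester = new BulkIngester({
  client,
  maxOperations: 1000,       // flush after 1000 documents...
  maxSize: 5 * 1024 * 1024,  // ...or after 5 MB of payload...
  flushInterval: 1000        // ...or after 1 s, whichever comes first
})

// Add documents till eternity, e.g. from a Kafka consumer callback:
consumer.on('message', msg => {
  ingester.add({ index: { _index: 'my-index' } }, msg.value)
})
```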

However, after reading the JavaScript client's bulk helper docs, I am not so sure any more.

First, the bulk helper has no setting for the number of documents, only for request size (`flushBytes`) and a flush interval (`flushInterval`). Is that intended, and why does it differ from the Java client? Is it because you assume the passed data is finite (like an array, a Buffer or a ReadableStream)?
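For reference, here is roughly what the docs show, as I read them (`my-index` and the datasource are placeholders):

```js
const { Client } = require('@elastic/elasticsearch')
const client = new Client({ node: 'http://localhost:9200' })

// Pulls from a finite datasource and resolves once it has been drained.
const result = await client.helpers.bulk({
  datasource: [{ message: 'hello' }], // array, Buffer, stream or generator
  onDocument () {
    return { index: { _index: 'my-index' } }
  },
  flushBytes: 5000000,  // flush once ~5 MB of payload accumulated (default)
  flushInterval: 30000  // time-based flush, default 30 s
})
```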

Second, there is no method to keep adding new documents over time. Would this work with an endless generator, like a Channel from the queueable package?

Documentation for this use-case seems non-existent, even though I'd consider it pretty common in streaming scenarios. Then again, maybe my mental model of the JavaScript client is simply wrong and I am using it incorrectly.

Any help greatly appreciated.

Have a good week!

--Alex

Historically, all bulk helpers have been pull based, optimized for reading from very large data sources as fast as possible.

A separate use-case is continuously producing data to Elasticsearch at variable rates and intervals.

Java's BulkIngester implements such a push-based model, similar to how Beats allows data to be pushed over channels forever.

.NET has Elastic.Ingest.Elasticsearch (see `src/Elastic.Ingest.Elasticsearch/README.md` in the elastic/elastic-ingest-dotnet repository on GitHub) to provide a push-based model for ingesting data.

AFAIK this is not available for JavaScript.


queueable could work to present a channel as a stream. I'm not sure how the internals handle edge cases when no data is yielded within the flush timeout, though.
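A minimal sketch of the idea, assuming the helper accepts an async generator as `datasource` (the docs state this for finite generators; I haven't verified the endless case). The hand-rolled `makeChannel` below just stands in for queueable's Channel; the index name and document shape are made up:

```js
const { Client } = require('@elastic/elasticsearch')

// Stand-in for queueable's Channel: push() buffers a document and wakes
// the consumer; the async generator drains the buffer, then waits for the
// next push. A single consumer is assumed.
function makeChannel () {
  const buffer = []
  let wakeUp = null
  async function * iterate () {
    while (true) {
      while (buffer.length > 0) yield buffer.shift()
      await new Promise(resolve => { wakeUp = resolve })
    }
  }
  return {
    push (doc) {
      buffer.push(doc)
      if (wakeUp) {
        const resolve = wakeUp
        wakeUp = null
        resolve()
      }
    },
    iterator: iterate()
  }
}

const client = new Client({ node: 'http://localhost:9200' })
const channel = makeChannel()

// The returned promise only settles when the datasource ends, so with an
// endless generator this effectively runs for the lifetime of the process.
client.helpers.bulk({
  datasource: channel.iterator,
  onDocument () {
    return { index: { _index: 'my-index' } } // placeholder index name
  },
  flushInterval: 5000 // lean on time-based flushing during quiet periods
}).catch(err => console.error('bulk helper failed:', err))

// Elsewhere, e.g. inside a Kafka consumer callback:
channel.push({ '@timestamp': new Date().toISOString(), message: 'hello' })
```

Whether `flushInterval` reliably flushes a partial batch while the generator sits idle is exactly the edge case mentioned above, so this would need testing before relying on it.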