If there is a large gap between the times that batches of documents arrive, you may want to increase xpack.ml.inference_model.time_to_live to cover that gap so that the model is never evicted from the cache.
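As a rough sketch, that is a static node setting in elasticsearch.yml on each node running ingest pipelines (the 30m value here is purely illustrative; pick something longer than the largest gap you expect between batches):

```yaml
# elasticsearch.yml on each ingest node.
# 30m is an example value; choose something larger than the longest
# expected pause between batches so the cached model survives it.
xpack.ml.inference_model.time_to_live: 30m
```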
I allocated 2GB for JVM Heap on every node.
I was also not sure whether 2GB was enough for the lang_ident_model_1 model; I didn't find any recommendation about it in the documentation.
The only inference processor in the pipeline was the lang_ident_model_1 processor.
And from the time_in_millis metric of GET _nodes/stats/ingest, that processor was indeed the bottleneck: it took 95% of the pipeline's total time.
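Roughly what I am running (the field names here are placeholders for my actual mapping, and the processor options may differ slightly from my real pipeline), plus the stats call I used to narrow down the per-processor timings:

```
PUT _ingest/pipeline/lang-detect
{
  "processors": [
    {
      "inference": {
        "model_id": "lang_ident_model_1",
        "inference_config": { "classification": { "num_top_classes": 1 } },
        "field_map": { "body": "text" },
        "target_field": "_ml.lang_ident"
      }
    }
  ]
}

GET _nodes/stats/ingest?filter_path=nodes.*.ingest.pipelines.lang-detect
```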
@rueian, I am wondering if the inference time includes the time it took to load the model initially and put it into the cache.
Once you have one pipeline running and have sent some docs through, I wonder if you could open another pipeline (this one will use the model that is already in cache from the previous pipeline) and see what the throughput is there?
Again, there will always be some slowdown; I just want to get a sense of how much there is.
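Something along these lines (all the names below are made up for the example): a second pipeline that references the same model, plus a small bulk request routed through it, so the model should already be warm in the cache when it runs:

```
PUT _ingest/pipeline/lang-detect-warm
{
  "processors": [
    {
      "inference": {
        "model_id": "lang_ident_model_1",
        "inference_config": { "classification": {} },
        "field_map": { "body": "text" },
        "target_field": "_ml.lang_ident"
      }
    }
  ]
}

POST my-index/_bulk?pipeline=lang-detect-warm
{ "index": {} }
{ "body": "this is an english sentence" }
{ "index": {} }
{ "body": "dies ist ein deutscher satz" }
```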
I just tested with a bigger cluster. Each node had 8 cores and 14GB of JVM heap, and they finally achieved indexing 3000 documents per second.
I also noticed that the JVM heap usage kept changing dramatically, even though the only work the cluster was doing was indexing those documents with the pipeline. Was that usage expected, or was it a sign that the model kept being moved in and out of the cache?
@rueian it could be dropping in and out of cache, but it should only do that if its TTL was reached or it was kicked out of cache by another model (since this is your only model, I don't think the latter is occurring).
The JVM heap utilization doesn't seem to have a cadence of 5m. It could very well be the inference action itself: encoding the text, pushing it through the model, and then decoding. That creates some short-lived double[] values that can increase JVM heap usage, and then young GC cleans them out quickly without blocking.
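One way to sanity-check that it is just young GC churn (the filter_path fields below come from the standard node stats output) is to watch the heap and GC counters while indexing; steadily climbing young collection counts with flat old collection counts would match the short-lived-garbage explanation:

```
GET _nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors
```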
The way to verify whether the model is actually being evicted is to look at the cluster logs and see if there are messages to that effect from ModelLoadingService (FQDN org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService).
To see which docs are hitting the cache or not, you could turn on TRACE for that particular class, but that will write multiple log messages PER document, so it would not only create a ton of log messages but could also impact performance.
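For reference, a sketch of flipping that logger with the cluster settings API, and resetting it back to the default as soon as you have what you need:

```
PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService": "TRACE"
  }
}

PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService": null
  }
}
```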