Elasticsearch trained model inference not working

After importing the text embedding model sentence-transformers/msmarco-MiniLM-L12-cos-v5 into Elasticsearch using eland, model inference doesn't work.

eland command:

eland_import_hub_model --url http://localhost:9200 --hub-model-id sentence-transformers/msmarco-MiniLM-L12-cos-v5 --task-type text_embedding --start
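
For reference, the deployment state can be inspected with the trained model stats API, using the model ID Elasticsearch assigned on import:

GET /_ml/trained_models/sentence-transformers__msmarco-minilm-l12-cos-v5/_stats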

Infer request:

POST /_ml/trained_models/sentence-transformers__msmarco-minilm-l12-cos-v5/deployment/_infer
{
  "docs": [
    {
      "text_field": "How is the weather in Jamaica?"
    }
  ]
}
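
Note: the deployment/_infer path used above is flagged as deprecated in the logs below; the equivalent request on the current path would be:

POST /_ml/trained_models/sentence-transformers__msmarco-minilm-l12-cos-v5/_infer
{
  "docs": [
    {
      "text_field": "How is the weather in Jamaica?"
    }
  ]
}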

Error:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Error in inference process: [inference canceled as process is stopping]"
            }
        ],
        "type": "status_exception",
        "reason": "Error in inference process: [inference canceled as process is stopping]"
    },
    "status": 500
}

Docker logs from the Elasticsearch container, covering the deployment, crash, and restart:

2024-05-16 14:47:29 {"@timestamp":"2024-05-16T12:47:29.207Z", "log.level": "INFO", "message":"[sentence-transformers__msmarco-minilm-l12-cos-v5] Starting model deployment of model [sentence-transformers__msmarco-minilm-l12-cos-v5]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[f059c181c171][ml_utility][T#3]","log.logger":"org.elasticsearch.xpack.ml.inference.deployment.DeploymentManager","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch"}
2024-05-16 14:47:55 {"@timestamp":"2024-05-16T12:47:55.102Z", "log.level": "WARN",  "data_stream.dataset":"deprecation.elasticsearch","data_stream.namespace":"default","data_stream.type":"logs","elasticsearch.event.category":"api","event.code":"deprecated_route_POST_/_ml/trained_models/{model_id}/deployment/_infer","message":"[POST /_ml/trained_models/{model_id}/deployment/_infer] is deprecated! Use [POST /_ml/trained_models/{model_id}/_infer] instead." , "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"deprecation.elasticsearch","process.thread.name":"elasticsearch[f059c181c171][transport_worker][T#4]","log.logger":"org.elasticsearch.deprecation.rest.RestController","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch"}
2024-05-16 14:47:55 {"@timestamp":"2024-05-16T12:47:55.413Z", "log.level":"ERROR", "message":"[sentence-transformers__msmarco-minilm-l12-cos-v5] pytorch_inference/336 process stopped unexpectedly: Fatal error: 'si_signo 11, si_code: 1, si_errno: 0, address: 0xffffb27ca140, library: /lib/aarch64-linux-gnu/libc.so.6, base: 0xffffb26bd000, normalized address: 0x10d140', version: 8.13.4 (build 8480947324d752)\n", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[f059c181c171][ml_native_inference_comms][T#4]","log.logger":"org.elasticsearch.xpack.ml.process.AbstractNativeProcess","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch"}
2024-05-16 14:47:55 {"@timestamp":"2024-05-16T12:47:55.414Z", "log.level":"ERROR", "message":"[sentence-transformers__msmarco-minilm-l12-cos-v5] inference process crashed due to reason [[sentence-transformers__msmarco-minilm-l12-cos-v5] pytorch_inference/336 process stopped unexpectedly: Fatal error: 'si_signo 11, si_code: 1, si_errno: 0, address: 0xffffb27ca140, library: /lib/aarch64-linux-gnu/libc.so.6, base: 0xffffb26bd000, normalized address: 0x10d140', version: 8.13.4 (build 8480947324d752)\n]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[f059c181c171][ml_native_inference_comms][T#4]","log.logger":"org.elasticsearch.xpack.ml.inference.deployment.DeploymentManager","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch"}
2024-05-16 14:47:55 {"@timestamp":"2024-05-16T12:47:55.414Z", "log.level": "INFO", "message":"Inference process [sentence-transformers__msmarco-minilm-l12-cos-v5] failed due to [[sentence-transformers__msmarco-minilm-l12-cos-v5] pytorch_inference/336 process stopped unexpectedly: Fatal error: 'si_signo 11, si_code: 1, si_errno: 0, address: 0xffffb27ca140, library: /lib/aarch64-linux-gnu/libc.so.6, base: 0xffffb26bd000, normalized address: 0x10d140', version: 8.13.4 (build 8480947324d752)\n]. This is the [1] failure in 24 hours, and the process will be restarted.", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[f059c181c171][ml_native_inference_comms][T#4]","log.logger":"org.elasticsearch.xpack.ml.inference.deployment.DeploymentManager","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch"}
2024-05-16 14:47:55 {"@timestamp":"2024-05-16T12:47:55.414Z", "log.level": "INFO", "message":"[sentence-transformers__msmarco-minilm-l12-cos-v5] Starting model deployment of model [sentence-transformers__msmarco-minilm-l12-cos-v5]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[f059c181c171][ml_native_inference_comms][T#4]","log.logger":"org.elasticsearch.xpack.ml.inference.deployment.DeploymentManager","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch"}
2024-05-16 14:47:55 {"@timestamp":"2024-05-16T12:47:55.414Z", "log.level": "WARN", "message":"path: /_ml/trained_models/sentence-transformers__msmarco-minilm-l12-cos-v5/deployment/_infer, params: {model_id=sentence-transformers__msmarco-minilm-l12-cos-v5}, status: 500", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[f059c181c171][ml_native_inference_comms][T#5]","log.logger":"rest.suppressed","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch","error.type":"org.elasticsearch.ElasticsearchStatusException","error.message":"Error in inference process: [inference canceled as process is stopping]","error.stack_trace":"org.elasticsearch.ElasticsearchStatusException: Error in inference process: [inference canceled as process is stopping]\n\tat org.elasticsearch.ml@8.13.4/org.elasticsearch.xpack.ml.inference.deployment.AbstractPyTorchAction.onFailure(AbstractPyTorchAction.java:114)\n\tat org.elasticsearch.ml@8.13.4/org.elasticsearch.xpack.ml.inference.deployment.InferencePyTorchAction.processResult(InferencePyTorchAction.java:182)\n\tat org.elasticsearch.ml@8.13.4/org.elasticsearch.xpack.ml.inference.deployment.InferencePyTorchAction.lambda$doRun$3(InferencePyTorchAction.java:151)\n\tat org.elasticsearch.server@8.13.4/org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:171)\n\tat org.elasticsearch.ml@8.13.4/org.elasticsearch.xpack.ml.inference.pytorch.process.PyTorchResultProcessor.lambda$notifyAndClearPendingResults$3(PyTorchResultProcessor.java:145)\n\tat java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603)\n\tat org.elasticsearch.ml@8.13.4/org.elasticsearch.xpack.ml.inference.pytorch.process.PyTorchResultProcessor.notifyAndClearPendingResults(PyTorchResultProcessor.java:144)\n\tat org.elasticsearch.ml@8.13.4/org.elasticsearch.xpack.ml.inference.pytorch.process.PyTorchResultProcessor.process(PyTorchResultProcessor.java:137)\n\tat org.elasticsearch.ml@8.13.4/org.elasticsearch.xpack.ml.inference.deployment.DeploymentManager.lambda$startDeployment$2(DeploymentManager.java:180)\n\tat org.elasticsearch.server@8.13.4/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\n"}
2024-05-16 14:48:00 {"@timestamp":"2024-05-16T12:48:00.161Z", "log.level": "INFO", "message":"[.ds-.logs-deprecation.elasticsearch-default-2024.05.16-000001] creating index, cause [initialize_data_stream], templates [.deprecation-indexing-template], shards [1]/[1]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[f059c181c171][masterService#updateTask][T#8]","log.logger":"org.elasticsearch.cluster.metadata.MetadataCreateIndexService","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch"}
2024-05-16 14:48:00 {"@timestamp":"2024-05-16T12:48:00.167Z", "log.level": "INFO", "message":"adding data stream [.logs-deprecation.elasticsearch-default] with write index [.ds-.logs-deprecation.elasticsearch-default-2024.05.16-000001], backing indices [], and aliases []", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[f059c181c171][masterService#updateTask][T#8]","log.logger":"org.elasticsearch.cluster.metadata.MetadataCreateDataStreamService","elasticsearch.cluster.uuid":"eEqzvIFnQx6Lt08J4545wQ","elasticsearch.node.id":"-It6EisaTT2HVi05RqZGdA","elasticsearch.node.name":"f059c181c171","elasticsearch.cluster.name":"elasticsearch"}

I have the same issue.

@Tan_H and @likealam, which version of Elastic are you using? I'm not sure if it helps, but my colleague shared some tips on diagnosing ML issues in this blog that might be worth checking.
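
You can grab the exact version from the root endpoint:

GET /

or from the command line (assuming no security on the cluster, matching the plain-HTTP URL in your eland command):

curl http://localhost:9200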

Hi @carly.richmond, I'm running the latest version of the stack, from GitHub - deviantony/docker-elk: The Elastic stack (ELK) powered by Docker and Compose.

Thanks for confirming, @Tan_H. I see you are using images published by a developer in the community, so it's difficult to say whether it's an issue with the container configuration.

Did you make any changes to your local configuration as well? If so, it would be useful to share your container configuration so we can take a look.
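
Assuming you're on Compose v2, the resolved configuration can be dumped from the docker-elk directory with:

docker compose config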

If the troubleshooting blog isn't useful, I would also recommend reaching out to the developer.
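
In the meantime, you could also try stopping and restarting the deployment manually to see whether the crash reproduces (model ID taken from your logs):

POST /_ml/trained_models/sentence-transformers__msmarco-minilm-l12-cos-v5/deployment/_stop
POST /_ml/trained_models/sentence-transformers__msmarco-minilm-l12-cos-v5/deployment/_start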

Hope that helps!