ELSER v2 model inference crashing

Hi Team,

Before filing an issue, I was hoping to get some assistance with testing inference on the ELSER v2 model, in case I'm missing something in the docs.

Elasticsearch version: 8.17.2
ML Nodes: 1
Instance Type: Amazon r7gd.xlarge instance (ARM64 arch, 4 vCPUs, 32 GB RAM)
OS: Ubuntu 22.04.4 LTS

First question: do the ELSER/E5 models support ARM64-based architectures? I didn't find any documentation explicitly stating it either way, but a similar topic reported issues deploying these models on comparable EC2 ARM64 instances.

Second, here are the steps I ran to set up the inference endpoint and the ELSER v2 model, and the error I'm getting:

  • After spinning up and provisioning my node with the appropriate roles and ML settings, I started the free trial (_license/start_trial?acknowledge=true)
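For reference, the exact call was just:

curl -X POST "<myhost>/_license/start_trial?acknowledge=true"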

  • Querying _ml/trained_models?pretty, the only installed model I have is lang_ident_model_1, for language identification
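That's just a plain GET:

curl -X GET "<myhost>/_ml/trained_models?pretty"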

  • I created an inference endpoint using the inference API at _inference/sparse_embedding/my-elser-model
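The request I ran was roughly the following (the allocation/thread counts shown are just the values I picked for this single-node test):

curl -X PUT "<myhost>/_inference/sparse_embedding/my-elser-model" -H 'Content-Type: application/json' -d'
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
'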

  • The inference endpoint was created successfully, and I can see the model being imported and started in the logs, though there are a few OjAlgoUtils warnings along the lines of "ojAlgo includes a small set of predefined hardware profiles, none of which were deemed suitable for the hardware you're currently using":

{"type": "server", "timestamp": "2025-03-03T13:30:01,349Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "current.health=\"GREEN\" message=\"Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.ml-inference-native-000002][0]]]).\" previous.health=\"YELLOW\" reason=\"shards started [[.ml-inference-native-000002][0]]\"", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA" , "current.health":"GREEN", "message":"Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.ml-inference-native-000002][0]]]).", "previous.health":"YELLOW", "reason":"shards started [[.ml-inference-native-000002][0]]"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,637Z", "level": "INFO", "component": "o.e.x.m.p.a.TransportLoadTrainedModelPackage", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "[.elser_model_2] finished model import after [5] seconds", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,948Z", "level": "INFO", "component": "stdout", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "ojAlgo includes a small set of predefined hardware profiles,", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,949Z", "level": "INFO", "component": "stdout", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "none of which were deemed suitable for the hardware you're currently using.", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,949Z", "level": "INFO", "component": "stdout", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "A default hardware profile, that is perfectly usable, has been set for you.", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,949Z", "level": "INFO", "component": "stdout", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "You may want to set org.ojalgo.OjAlgoUtils.ENVIRONMENT to something that", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,949Z", "level": "INFO", "component": "stdout", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "better matches the hardware/OS/JVM you're running on, than the default.", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,950Z", "level": "INFO", "component": "stdout", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "Additionally it would be appreciated if you contribute your hardware profile:", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,950Z", "level": "INFO", "component": "stdout", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "https://github.com/optimatika/ojAlgo/issues", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:05,950Z", "level": "INFO", "component": "stdout", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "Architecture=aarch64 Threads=4 Memory=16517169152", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:30:06,112Z", "level": "INFO", "component": "o.e.x.m.i.d.DeploymentManager", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "[my-elser-model] Starting model deployment of model [.elser_model_2]", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
  • Re-querying _ml/trained_models?pretty, I can now see .elser_model_2 listed:
{
      "model_id" : ".elser_model_2",
      "model_type" : "pytorch",
      "model_package" : {
        "packaged_model_id" : "elser_model_2",
        "model_repository" : "https://ml-models.elastic.co",
        "minimum_version" : "11.0.0",
        "size" : 438123914,
        "sha256" : "2e0450a1c598221a919917cbb05d8672aed6c613c028008fedcd696462c81af0",
        "metadata" : { },
        "tags" : [ ],
        "vocabulary_file" : "elser_model_2.vocab.json"
      },
      "created_by" : "api_user",
      "version" : "12.0.0",
      "create_time" : 1741008600166,
      "model_size_bytes" : 0,
      "estimated_operations" : 0,
      "license_level" : "platinum",
      "description" : "Elastic Learned Sparse EncodeR v2",
      "tags" : [
        "elastic"
      ],
      "metadata" : { },
      "input" : {
        "field_names" : [
          "text_field"
        ]
      },
      "inference_config" : {
        "text_expansion" : {
          "vocabulary" : {
            "index" : ".ml-inference-native-000002"
          },
          "tokenization" : {
            "bert" : {
              "do_lower_case" : true,
              "with_special_tokens" : true,
              "max_sequence_length" : 512,
              "truncate" : "first",
              "span" : -1
            }
          }
        }
      },
      "location" : {
        "index" : {
          "name" : ".ml-inference-native-000002"
        }
      }
    }
  • Running _infer against the model results in the following error, as well as the Elasticsearch service crashing and restarting:
curl -X POST "<myhost>/_ml/trained_models/.elser_model_2/_infer" -H 'Content-Type: application/json' -d'
{
  "docs": [{ "text_field": "This is a test sentence" }]
}
'

{"type": "server", "timestamp": "2025-03-03T13:32:47,656Z", "level": "ERROR", "component": "o.e.x.m.p.AbstractNativeProcess", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "[my-elser-model] pytorch_inference/18123 process stopped unexpectedly: ", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:32:47,730Z", "level": "ERROR", "component": "o.e.x.m.i.d.DeploymentManager", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "[my-elser-model] inference process crashed due to reason [[my-elser-model] pytorch_inference/18123 process stopped unexpectedly: ]", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:32:47,731Z", "level": "INFO", "component": "o.e.x.m.i.d.DeploymentManager", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "Inference process [my-elser-model] failed due to [[my-elser-model] pytorch_inference/18123 process stopped unexpectedly: ]. This is the [1] failure in 24 hours, and the process will be restarted.", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:32:47,732Z", "level": "INFO", "component": "o.e.x.m.i.d.DeploymentManager", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "[my-elser-model] Starting model deployment of model [.elser_model_2]", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:32:47,656Z", "level": "ERROR", "component": "o.e.x.m.i.p.p.PyTorchResultProcessor", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "[my-elser-model] Error processing results", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA" ,
"stacktrace": ["org.elasticsearch.xcontent.XContentEOFException: [3:1] Unexpected end of file",
"at org.elasticsearch.xcontent.provider.json.JsonXContentParser.nextToken(JsonXContentParser.java:62) ~[?:?]",
"at org.elasticsearch.xpack.ml.process.ProcessResultsParser$ResultIterator.hasNext(ProcessResultsParser.java:70) ~[?:?]",
"at org.elasticsearch.xpack.ml.inference.pytorch.process.PyTorchResultProcessor.process(PyTorchResultProcessor.java:105) ~[?:?]",
"at org.elasticsearch.xpack.ml.inference.deployment.DeploymentManager.lambda$startDeployment$2(DeploymentManager.java:180) ~[?:?]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:956) ~[elasticsearch-8.17.2.jar:?]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]",
"at java.lang.Thread.run(Thread.java:1575) ~[?:?]",
"Caused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Array (start marker at [Source: (FileInputStream); line: 2, column: 1])",
" at [Source: (FileInputStream); line: 3, column: 1]",
"at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:585) ~[?:?]",
"at com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:535) ~[?:?]",
"at com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:552) ~[?:?]",
"at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd2(UTF8StreamJsonParser.java:3135) ~[?:?]",
"at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3105) ~[?:?]",
"at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:716) ~[?:?]",
"at org.elasticsearch.xcontent.provider.json.JsonXContentParser.nextToken(JsonXContentParser.java:59) ~[?:?]",
"... 7 more"] }
{"type": "server", "timestamp": "2025-03-03T13:32:49,752Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "stopping ...", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:32:49,753Z", "level": "INFO", "component": "o.e.c.f.AbstractFileWatchingService", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "shutting down watcher thread", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
{"type": "server", "timestamp": "2025-03-03T13:32:49,780Z", "level": "ERROR", "component": "o.e.x.m.p.l.CppLogMessageHandler", "cluster.name": "dev-purple", "node.name": "purple-master-0", "message": "[controller/17854] [CDetachedProcessSpawner.cc@193] Child process with PID 18123 was terminated by signal 9", "cluster.uuid": "tFP7DsTIR-OYrg24tRT_WA", "node.id": "dyWTELgFRsueu3XzQfrmtA"  }
  • It's worth calling out that the .ml-inference-native-000002 index referenced by inference_config.text_expansion.vocabulary.index is empty (doc-count check below) ... is that expected?
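For context, the "empty" observation comes from a doc-count check along these lines, which reports 0 documents:

curl -X GET "<myhost>/.ml-inference-native-000002/_count"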

I can provide further logs if that would help troubleshoot the issue. I was also planning to test on a non-ARM64 architecture to see if the problem still arises.
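In particular, if it's useful I can pull the model/deployment stats and node memory stats, e.g.:

curl -X GET "<myhost>/_ml/trained_models/.elser_model_2/_stats?pretty"
curl -X GET "<myhost>/_nodes/stats/os,process?pretty"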

Thanks in advance,
Bryan W.

Just a quick update with some further findings:

In /var/log/kern.log, I can see that the pytorch_inference process (truncated to pytorch_inferen in the kernel log) was killed by the OOM killer:

kernel: [18892.437612] Out of memory: Killed process 18473 (pytorch_inferen) total-vm:26257684kB, anon-rss:14254212kB, file-rss:2048kB, shmem-rss:0kB, UID:10001 pgtables:28244kB oom_score_adj:667

The node has 32 GB of memory, 50% of which (16 GB) is given to the Elasticsearch JVM process, so between the JVM and the ~13.6 GB resident set pytorch_inference had reached when it was killed, the node was running close to its physical memory limit.
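If exact numbers help, the heap actually in use can be confirmed via the nodes info API, e.g.:

curl -X GET "<myhost>/_nodes/jvm?pretty&filter_path=nodes.*.jvm.mem"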

I had already set vm.max_map_count=262144 on the node.

I find this surprising since I'm only running a single, simple inference request and the node is doing nothing else (I spun it up purely for ML testing). I did find a closed GitHub issue referencing pytorch_inference OOM problems on x86_64 architectures; it's linked to a merged PR tagged for the 8.16 release.

Any guidance would be appreciated. Are there any configuration or ML settings I should try?
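For example, I wasn't sure whether the ML memory settings (xpack.ml.use_auto_machine_memory_percent / xpack.ml.max_machine_memory_percent) are even relevant here; I was considering trying something along the lines of:

curl -X PUT "<myhost>/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": true
  }
}
'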

I tested the same steps on a comparable Amazon EC2 x86_64-based instance without any issues ... so it seems the problem is limited to ARM64-based architectures.

I'm not sure if this is a known issue or just not very well documented at the moment.