Eland import has IndexError

Hi, I am trying to load an NLP model into Elastic 8.8.2 using the guide here.
Running the following command gives the error stack below:

eland_import_hub_model --url http://localhost:9200 --hub-model-id FacebookAI/xlm-roberta-large-finetuned-conll03-english --task-type ner

I have successfully imported several other models, and since this is an XLM-RoBERTa model I expected it to be compatible. Does anyone know if this is a known issue, and whether there is a workaround?

2024-02-13 04:49:11,540 INFO : Establishing connection to Elasticsearch
2024-02-13 04:49:11,543 INFO : Connected to cluster named 'elasticsearch' (version: 8.8.2)
2024-02-13 04:49:11,544 INFO : Loading HuggingFace transformer tokenizer and model 'FacebookAI/xlm-roberta-large-finetuned-conll03-english'
Some weights of the model checkpoint at FacebookAI/xlm-roberta-large-finetuned-conll03-english were not used when initializing XLMRobertaForTokenClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
WARNING:2024-02-13 04:49:26 998352:998352 init.cpp:111] function cbapi.getCuptiStatus() failed with error CUPTI_ERROR_NOT_INITIALIZED (15)
WARNING:2024-02-13 04:49:26 998352:998352 init.cpp:112] CUPTI initialization failed - CUDA profiler activities will be missing
INFO:2024-02-13 04:49:26 998352:998352 init.cpp:114] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti
STAGE:2024-02-13 04:49:26 998352:998352 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2024-02-13 04:49:26 998352:998352 ActivityProfilerController.cpp:300] Completed Stage: Collection
Traceback (most recent call last):
  File "/home/user/nlp_test/venv/bin/eland_import_hub_model", line 8, in <module>
    sys.exit(main())
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/eland/cli/eland_import_hub_model.py", line 254, in main
    tm = TransformerModel(
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 657, in __init__
    self._config = self._create_config(es_version)
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 790, in _create_config
    per_allocation_memory_bytes = self._get_per_allocation_memory(
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 855, in _get_per_allocation_memory
    self._traceable_model.model(*inputs_1)
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1402, in forward
    outputs = self.roberta(
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 839, in forward
    embedding_output = self.embeddings(
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 131, in forward
    position_embeddings = self.position_embeddings(position_ids)
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/home/user/nlp_test/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Hi!

Support for this kind of model (XLM-RoBERTa tokenization) was actually added in 8.9, so you would need to upgrade Elasticsearch to 8.9 or later. See the PR here: [ML] add support for xlm_roberta tokenized models by benwtrent · Pull Request #94089 · elastic/elasticsearch · GitHub
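If you script imports against several clusters, a quick version pre-check can save you a confusing traceback like the one above. Here is a minimal sketch in Python. It assumes the 8.9 cutoff stated in this reply, and the helper names (parse_version, supports_xlm_roberta) are hypothetical, not part of eland:

```python
# Hypothetical helpers, not part of eland. Assumes xlm_roberta
# tokenization requires Elasticsearch 8.9 or later (per the PR above).
MIN_XLM_ROBERTA_VERSION = (8, 9)


def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a string like '8.8.2' or '8.9.0-SNAPSHOT'."""
    # Drop any pre-release suffix, then keep only major and minor.
    major, minor = version.split("-")[0].split(".")[:2]
    return int(major), int(minor)


def supports_xlm_roberta(version: str) -> bool:
    """Compare as integer tuples so that '8.12' correctly sorts after '8.9'."""
    return parse_version(version) >= MIN_XLM_ROBERTA_VERSION


print(supports_xlm_roberta("8.8.2"))   # False
print(supports_xlm_roberta("8.12.1"))  # True
```

The version string itself can be read with the official Python client, for example Elasticsearch("http://localhost:9200").info()["version"]["number"]. Comparing integer tuples rather than raw strings matters here, since "8.12" < "8.9" as a string comparison.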

The currently supported models are listed, and will be kept up to date, on this doc page.
