Loading BERT Model

Cole_Crawford · March 15, 2023, 12:39am

I am trying to add ANN semantic search to an Elasticsearch index of scientific documents. To that end, I am setting up an NLP pipeline on Elasticsearch to vectorize documents on ingest. I would like to test the allenai/scibert_scivocab_uncased model from Hugging Face, but importing it with eland_import_hub_model fails with this error:

Traceback (most recent call last):
  File "/usr/local/bin/eland_import_hub_model", line 197, in <module>
    tm = TransformerModel(args.hub_model_id, args.task_type, args.quantize)
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 551, in __init__
    self._traceable_model = self._create_traceable_model()
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 661, in _create_traceable_model
    model = _DPREncoderWrapper.from_pretrained(self._model_id)
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 385, in from_pretrained
    if is_compatible():
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 379, in is_compatible
    has_architectures = len(config.architectures) == 1
TypeError: object of type 'NoneType' has no len()

I'm surprised by this, because it makes the model look incompatible with Elasticsearch's ML capabilities even though it should meet the criteria (BERT architecture) listed on the "Compatible third party NLP models" page of the Machine Learning in the Elastic Stack [8.6] documentation. Has anyone experienced something similar?

I'm also happy to hear other model recommendations. My corpus is a set of drug labels, so I was thinking of using MedBERT or SciBERT.

dkyle · April 3, 2023, 1:11pm

Hi @Cole_Crawford

The eland_import_hub_model script does some work to figure out how to configure the model correctly for upload to Elasticsearch. The code path you hit is missing a None check. Once I added it, I was able to upload SciBERT with the following command:

eland_import_hub_model \
      --url http://localhost:9200/ \
      -u XXXX -p XXXX \
      --hub-model-id allenai/scibert_scivocab_uncased \
      --task-type text_embedding --insecure

Update the --url and auth (-u, -p) parameters to match your cluster.

The fix is in this PR: "[NLP] Prevent TypeError with None check" by davidkyle (elastic/eland, Pull Request #525).
Thanks for reporting the issue.
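
For readers hitting the same traceback, here is a minimal sketch of the kind of None check described above. It is not the code from the PR: the helper name has_single_architecture and the SimpleNamespace stand-ins for a model config are assumptions made for illustration. The only detail taken from the traceback is that len(config.architectures) fails when architectures is None.

```python
from types import SimpleNamespace


def has_single_architecture(config):
    """Return True only when config.architectures lists exactly one entry.

    Hypothetical helper, not eland's actual implementation.
    """
    architectures = getattr(config, "architectures", None)
    # Check for None first so len() is never called on a missing value.
    return architectures is not None and len(architectures) == 1


# Stand-ins for model configs (assumed shapes, for illustration only).
config_without = SimpleNamespace(architectures=None)
config_with = SimpleNamespace(architectures=["BertModel"])

print(has_single_architecture(config_without))
print(has_single_architecture(config_with))
```

With a guard like this, a config that does not declare any architectures is treated as "not this model type" and no TypeError is raised, so the import can fall through to the next compatibility check.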

