Loading BERT Model

I am trying to add ANN semantic search to an Elasticsearch index of scientific documents. To that end, I am setting up an NLP pipeline in Elasticsearch to vectorize documents on ingest. I would like to test allenai/scibert_scivocab_uncased from Hugging Face, but importing it with eland_import_hub_model fails with this error:

Traceback (most recent call last):
  File "/usr/local/bin/eland_import_hub_model", line 197, in <module>
    tm = TransformerModel(args.hub_model_id, args.task_type, args.quantize)
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 551, in __init__
    self._traceable_model = self._create_traceable_model()
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 661, in _create_traceable_model
    model = _DPREncoderWrapper.from_pretrained(self._model_id)
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 385, in from_pretrained
    if is_compatible():
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 379, in is_compatible
    has_architectures = len(config.architectures) == 1
TypeError: object of type 'NoneType' has no len()

This surprises me, because the error makes it look as if the model is not compatible with Elasticsearch's ML capabilities, even though it uses the BERT architecture and so should meet the criteria listed in "Compatible third party NLP models" (Machine Learning in the Elastic Stack [8.6] documentation). Has anyone experienced something similar?

I'm also happy to take other model recommendations. My corpus is a set of drug labels, so I was thinking of using MedBERT or SciBERT.

Hi @Cole_Crawford

eland_import_hub_model does some work to figure out how to configure the model correctly for upload to Elasticsearch. The code path you hit is missing a None check: as the traceback shows, config.architectures is None for this model, so calling len() on it raises the TypeError. Once I added the check, I was able to upload SciBERT with the following command:

eland_import_hub_model \
      --url http://localhost:9200/ \
      -u XXXX -p XXXX \
      --hub-model-id allenai/scibert_scivocab_uncased \
      --task-type text_embedding --insecure

Update the --url and authentication (-u, -p) parameters for your cluster.

The fix is in PR #525 on elastic/eland, "[NLP] Prevent TypeError with None check" by davidkyle.
Thanks for reporting the issue.
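
For anyone curious what a None check of this kind looks like, here is a minimal sketch. It is not the code from the PR: the function name has_single_architecture and the SimpleNamespace stand-in configs are hypothetical, used only to illustrate guarding len() against a missing architectures list like the one in the traceback.

```python
from types import SimpleNamespace


def has_single_architecture(config) -> bool:
    """Return True only when config.architectures lists exactly one entry.

    Hypothetical helper for illustration; not the actual eland implementation.
    """
    architectures = getattr(config, "architectures", None)
    # Check for None before calling len(); len(None) is what raised the TypeError.
    return architectures is not None and len(architectures) == 1


# Stand-in config objects (assumed shapes; real configs come from Hugging Face transformers).
config_without_architectures = SimpleNamespace(architectures=None)
config_with_one_architecture = SimpleNamespace(architectures=["BertModel"])

print(has_single_architecture(config_without_architectures))
print(has_single_architecture(config_with_one_architecture))
```

With a guard like this, a model whose config does not declare its architectures is treated as "not a match" for that code path instead of crashing the import.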
