I am trying to add ANN semantic search to an Elasticsearch index of scientific documents. To that end, I am setting up an NLP pipeline in Elasticsearch to vectorize documents on ingest. I would like to test allenai/scibert_scivocab_uncased from Hugging Face, but importing it with eland_import_hub_model fails with this error:
Traceback (most recent call last):
  File "/usr/local/bin/eland_import_hub_model", line 197, in <module>
    tm = TransformerModel(args.hub_model_id, args.task_type, args.quantize)
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 551, in __init__
    self._traceable_model = self._create_traceable_model()
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 661, in _create_traceable_model
    model = _DPREncoderWrapper.from_pretrained(self._model_id)
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 385, in from_pretrained
    if is_compatible():
  File "/usr/local/lib/python3.10/site-packages/eland/ml/pytorch/transformers.py", line 379, in is_compatible
    has_architectures = len(config.architectures) == 1
TypeError: object of type 'NoneType' has no len()
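Reading the last frame, my guess is that config.architectures is None for this model, so len() fails before any real compatibility check happens. Here is a minimal sketch of that guess. It uses a hypothetical stand-in object instead of the real Hugging Face config, and has_single_architecture is a name I made up, not something in eland. I have not confirmed what the model's config actually contains.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the model config object. These are assumptions
# for illustration only, not the real config loaded by eland.
config_without_architectures = SimpleNamespace(architectures=None)
config_with_architectures = SimpleNamespace(architectures=["BertModel"])


def reproduces_type_error(config) -> bool:
    """Run the check as it appears in the traceback; report whether it raises."""
    try:
        _ = len(config.architectures) == 1
    except TypeError:
        return True
    return False


def has_single_architecture(config) -> bool:
    """None-safe variant of the same check (my own sketch, not eland code)."""
    architectures = getattr(config, "architectures", None) or []
    return len(architectures) == 1


if __name__ == "__main__":
    print(reproduces_type_error(config_without_architectures))
    print(has_single_architecture(config_without_architectures))
```

If this guess is right, the error says nothing about whether the BERT weights themselves are usable. It only means the architectures field is missing from the config.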
I'm surprised by this, because it makes the model look incompatible with Elasticsearch's ML capabilities even though it should meet the criteria (BERT architecture) listed under "Compatible third party NLP models" in the Machine Learning in the Elastic Stack [8.6] documentation. Has anyone experienced something similar?
I'm also happy to hear other model recommendations. My corpus is a set of drug labels, so I was thinking of using MedBERT or SciBERT.