We are exploring the new ELSER features and how to integrate them into our application. We have a two-stage ingestion approach that adds the text during the second phase, so an ingest pipeline is not ideal: from reading the docs it looks like we would need to reindex to get the pipeline to fire on existing documents (unless you can get a pipeline to fire when a document is updated?).
Is it possible to use the model directly, either via an API or in Python (like other Hugging Face models)?
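For what it's worth, deployed trained models (ELSER included) can be called directly via the `_ml/trained_models/<model_id>/_infer` endpoint, outside of any pipeline. A minimal sketch of building such a request — the host and model ID here are assumptions for your deployment:

```python
import json

# Assumed host and model ID -- adjust for your cluster/deployment.
ES_HOST = "http://localhost:9200"
MODEL_ID = ".elser_model_2"

def build_infer_request(texts):
    """Build the URL and JSON body for a direct _infer call on a deployed model."""
    url = f"{ES_HOST}/_ml/trained_models/{MODEL_ID}/_infer"
    body = {"docs": [{"text_field": t} for t in texts]}
    return url, json.dumps(body)

url, body = build_infer_request(["what is ELSER?"])
# POST `body` to `url`, e.g. with requests.post(url, data=body,
# headers={"Content-Type": "application/json"}, auth=...)
```

The response contains the predicted token/weight map per input doc, which you could write back to your documents yourself.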
Alternatively, I guess we could invoke the simulate API on the pipeline and then update our documents with the results - however, that feels like a bit of a hack.
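If you did go the simulate route, the request shape would look roughly like this (the pipeline ID and field name are assumptions); the inference output comes back under each doc's `_source` in the simulate response, ready to be written back with a bulk update:

```python
import json

PIPELINE_ID = "elser-pipeline"  # hypothetical pipeline name

def build_simulate_request(docs):
    """Build path and body for POST /_ingest/pipeline/<id>/_simulate."""
    path = f"/_ingest/pipeline/{PIPELINE_ID}/_simulate"
    body = {"docs": [{"_source": d} for d in docs]}
    return path, json.dumps(body)

path, body = build_simulate_request([{"text": "late-arriving text"}])
```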
Thanks for the reply, I have got this all working.
What are the plans to increase the size of the text that can be processed (the current limit is 512 tokens)? Has engineering considered chunking the text and then accumulating the feature scores (e.g. average or max)?
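Chunking plus pooling is also something you could do client-side in the meantime: split the text into chunks, run each chunk through the model, then combine the per-chunk token/weight maps. A sketch of the pooling step only — the per-chunk dicts below are stand-in values, not real ELSER output:

```python
from collections import defaultdict

def pool_max(chunk_features):
    """Element-wise max over per-chunk {token: weight} maps."""
    pooled = {}
    for feats in chunk_features:
        for token, weight in feats.items():
            pooled[token] = max(weight, pooled.get(token, 0.0))
    return pooled

def pool_avg(chunk_features):
    """Element-wise average over chunks, treating missing tokens as 0."""
    sums = defaultdict(float)
    for feats in chunk_features:
        for token, weight in feats.items():
            sums[token] += weight
    n = len(chunk_features)
    return {token: s / n for token, s in sums.items()}

chunks = [{"espresso": 1.25, "coffee": 0.75}, {"coffee": 1.0, "bean": 0.5}]
pool_max(chunks)  # {'espresso': 1.25, 'coffee': 1.0, 'bean': 0.5}
pool_avg(chunks)  # {'espresso': 0.625, 'coffee': 0.875, 'bean': 0.25}
```

Max pooling tends to preserve a strong signal from any one chunk, while averaging dilutes tokens that appear in only a few chunks - which behaviour you want depends on your retrieval use case.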
I understand engineering is, and will be, continuing to add features and capabilities, but I cannot comment on them at this time; we don't communicate future features on this forum.