ELSER - use the model outside of a pipeline

Hi all,

We are exploring the new ELSER features and how to integrate them into our application. We have a two-stage ingestion approach that adds the text during the second phase. An ingest pipeline is therefore not ideal: from what I have read, it looks like we would need to reindex to get the pipeline to fire on documents (unless you can get a pipeline to fire when a document is updated?).

Is it possible to use the model directly, either via an API or in Python (like other Hugging Face models)?

Alternatively, I guess we could invoke the simulate API on the pipeline and then update our documents with the results. However, that feels like a bit of a hack.

Thanks in advance.

-P

Hi @paulmaker, welcome to the community...

Can you explain Phase 2 in a little more detail?

My initial thought is that you could set a default pipeline for the index that checks whether your text field exists and, if so, runs the ELSER inference processor:

Phase 1: the text field does not exist, so the processor does not execute.

Phase 2: the text field exists, so the inference processor runs.
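A rough sketch of that conditional default pipeline in Python, in case it helps. The pipeline id `elser-if-text`, the `text` field name, the `ml` target field, and the model id `.elser_model_1` are all assumptions for illustration; swap in whatever your deployment uses. The `client` is assumed to be an `elasticsearch.Elasticsearch` instance from the 8.x Python client.

```python
# Sketch only: ids, index name, and field names below are assumptions.
ELSER_MODEL_ID = ".elser_model_1"  # assumed model id; use the one you deployed
TEXT_FIELD = "text"                # hypothetical field added in phase 2
PIPELINE_ID = "elser-if-text"      # hypothetical pipeline id

pipeline_body = {
    "description": "Run ELSER only when the text field is present",
    "processors": [
        {
            "inference": {
                # Painless condition: phase 1 documents have no text, so skip them
                "if": f"ctx.{TEXT_FIELD} != null",
                "model_id": ELSER_MODEL_ID,
                "target_field": "ml",
                # Map the document field onto the input name the model expects
                "field_map": {TEXT_FIELD: "text_field"},
                "inference_config": {
                    "text_expansion": {"results_field": "tokens"}
                },
            }
        }
    ],
}

# Index setting that makes the pipeline run for index requests by default
index_settings = {"index": {"default_pipeline": PIPELINE_ID}}


def apply_default_pipeline(client, index_name):
    """Create the pipeline and attach it to the index as its default."""
    client.ingest.put_pipeline(id=PIPELINE_ID, **pipeline_body)
    client.indices.put_settings(index=index_name, settings=index_settings)
```

With this in place, re-indexing the full document once the text field has been added should trigger the inference processor, while phase 1 documents pass through untouched.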

That is just an initial thought, but I think it would address the point below, since an update is really just a soft delete and index:

"unless you can get a pipeline to fire when a document is updated?"

Give it a try and report back...

Oh, I just saw that there is a direct API, so it looks like you can call the model directly...

So now you have two options.
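For the direct route, a minimal sketch using the infer trained model API through the 8.x Python client might look like the following. The model id `.elser_model_1`, the `ml.tokens` destination field, and the assumed response shape are not confirmed anywhere in this thread, so check them against your cluster version. `client` is assumed to be an `elasticsearch.Elasticsearch` instance.

```python
def expand_text(client, text, model_id=".elser_model_1"):
    """Call the model directly and return its token weights for one text.

    model_id is an assumption; pass the id of the model you have deployed.
    """
    response = client.ml.infer_trained_model(
        model_id=model_id,
        docs=[{"text_field": text}],  # ELSER expects its input as "text_field"
    )
    # Assumed response shape:
    # {"inference_results": [{"predicted_value": {token: weight, ...}}]}
    return response["inference_results"][0]["predicted_value"]


def store_expansion(client, index_name, doc_id, tokens):
    """Write the token weights back onto an existing document.

    The "ml.tokens" location is a hypothetical choice for illustration.
    """
    client.update(index=index_name, id=doc_id, doc={"ml": {"tokens": tokens}})
```

In a two-stage flow, phase 2 could call `expand_text` on the new text and then `store_expansion` on the same document, which avoids going through the simulate API.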

Thanks for the reply, I have got this all working.

What are the plans to increase the size of the text that can be processed (the current limit is 512 tokens)? Has engineering considered chunking the text and then accumulating the feature scores (average of max)?
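To make the idea concrete, here is a minimal client-side sketch of what I mean. It is only one possible reading of "accumulate": it splits on whitespace as a crude stand-in for the model's tokenizer, keeps the maximum weight per token across chunks, and all function names are hypothetical. `expand` is any callable that maps a chunk of text to a token-to-weight dict, such as a wrapper around the direct inference API.

```python
def chunk_words(text, max_words=300):
    """Split text into chunks of at most max_words whitespace-separated words.

    Word count is only a rough proxy for the 512-token model limit.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def pool_max(token_maps):
    """Merge per-chunk token weights, keeping the maximum weight per token."""
    pooled = {}
    for tokens in token_maps:
        for token, weight in tokens.items():
            if token not in pooled or weight > pooled[token]:
                pooled[token] = weight
    return pooled


def expand_long_text(text, expand, max_words=300):
    """Expand each chunk with the supplied callable and max-pool the results."""
    return pool_max(expand(chunk) for chunk in chunk_words(text, max_words))
```

Averaging instead of taking the maximum would be a small change to `pool_max`; I have not measured which works better for relevance.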

I understand engineering is continuing, and will continue, to add features and capabilities, but I cannot comment on them at this time. We don't communicate future features on this forum.
