I was wondering if it is possible to create a text embedding pipeline where the input is a list of texts (e.g. ["text1", "text2", "text3", ...]) and the output is a list of their individual embeddings (e.g. [[embedding vector 1], [embedding vector 2], [embedding vector 3], ...]).
NOTE: The list of texts will be of variable length for each document.
If not, can someone suggest alternatives?
M.
Hi @Amphagory,
Welcome back! Are you wanting to generate an embedding using a pre-loaded model in Elastic? It's not something I've tried, but I believe it could be done using a foreach processor invoking an inference processor. I'm not sure how well it will scale, so it might be worth testing it out on a small subset first.
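Something along these lines might be a starting point. It's a rough, untested sketch: my_text_list and my_embedding_model are placeholders for your array field and deployed model ID, and it assumes each array element is an object with a text key (if the array holds bare strings, you'd need a small script processor to wrap them first). With an Eland-deployed text embedding model, the vector should end up under each element's ml.predicted_value.

```
PUT _ingest/pipeline/my-list-embedding-pipeline
{
  "description": "Sketch: embed every element of a text array with a deployed model",
  "processors": [
    {
      "foreach": {
        "description": "Loop over the array field; _ingest._value refers to the current element",
        "field": "my_text_list",
        "processor": {
          "inference": {
            "description": "Run the deployed embedding model on the current element's text",
            "model_id": "my_embedding_model",
            "field_map": {
              "_ingest._value.text": "text_field"
            },
            "target_field": "_ingest._value.ml"
          }
        }
      }
    }
  ]
}
```

You could dry-run it against a couple of sample documents with POST _ingest/pipeline/my-list-embedding-pipeline/_simulate before wiring it into your real ingest, which would also give you an early read on how it scales.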
Hope that helps!
Hello!
I think you hit the nail on the head. I would rather use a pre-loaded model in Elastic, but I'm open to other solutions.
I'll look into the two processors you mention and try adding them to my ingest pipeline to see if it works.
Mike