I was wondering if it is possible to create a text embedding pipeline where the input is a list of texts (e.g. ["text1", "text2", "text3", ...]) and the output is a list of their individual embeddings (e.g. [[embedding vector 1], [embedding vector 2], [embedding vector 3], ...]).
NOTE: The list of texts will be of variable length for each document.
If not, can someone suggest alternatives?
M.
Hi @Amphagory,
Welcome back! Are you wanting to generate an embedding using a pre-loaded model in Elastic? It's not something I've tried, but I believe it could be done using a foreach processor invoking an inference processor. I'm not sure how well it will scale, so it might be worth testing it out on a small subset first.
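Something along these lines might be a starting point. It's a rough, untested sketch: my_text_list and my_embedding_model are placeholders for your array field and deployed model ID, and it assumes each array element is an object with a text key (if the array holds bare strings, you'd need a small script processor to wrap them first). With an Eland-deployed text embedding model, the vector should end up under each element's ml.predicted_value.

```
PUT _ingest/pipeline/my-list-embedding-pipeline
{
  "description": "Sketch: embed every element of a text array with a deployed model",
  "processors": [
    {
      "foreach": {
        "description": "Loop over the array field; _ingest._value refers to the current element",
        "field": "my_text_list",
        "processor": {
          "inference": {
            "description": "Run the deployed embedding model on the current element's text",
            "model_id": "my_embedding_model",
            "field_map": {
              "_ingest._value.text": "text_field"
            },
            "target_field": "_ingest._value.ml"
          }
        }
      }
    }
  ]
}
```

You could dry-run it against a couple of sample documents with POST _ingest/pipeline/my-list-embedding-pipeline/_simulate before wiring it into your real ingest, which would also give you an early read on how it scales.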
Hope that helps!
Hello!
I think you hit the nail on the head. I would rather use a pre-loaded model in Elastic, but I'm open to other solutions.
I'll look into the two processors you mention and try adding them to my ingest pipeline to see if it works.
Mike