If we have hundreds of TBs of data in the cluster and want to apply the model to all of it, is reindexing with the pipeline (and deleting the old indexes) the best approach?
And if so, if we need to iteratively experiment with different models side by side, would each new model mean reindexing the whole cluster with an updated ingest pipeline?
You would indeed need to reindex when adding new types of embeddings.
I'd recommend using a subset of your data in a separate index for experimenting and comparing results rather than reindexing TBs of data with multiple models.
You can also add multiple models (processors) with different NLP techniques in the same pipeline so you can generate multiple embeddings for the same data in a single reindex.
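As a rough sketch of what that experiment setup could look like (assuming an unsecured local cluster at localhost:9200, hypothetical deployed model IDs `model_a` and `model_b`, a source index called `my-source-index`, and a document field called `content` — adjust all of these to your setup, and note the exact inference processor options vary by Elasticsearch version):

```python
import requests

ES = "http://localhost:9200"  # assumption: unsecured local cluster

# One pipeline, two inference processors: each candidate model writes its
# embedding to its own target field so results can be compared side by side.
pipeline = {
    "description": "Embeddings from two candidate models for comparison",
    "processors": [
        {
            "inference": {
                "model_id": "model_a",                   # hypothetical deployed model ID
                "target_field": "embedding_a",
                "field_map": {"content": "text_field"},  # map our field to the model's input field
            }
        },
        {
            "inference": {
                "model_id": "model_b",                   # hypothetical deployed model ID
                "target_field": "embedding_b",
                "field_map": {"content": "text_field"},
            }
        },
    ],
}
requests.put(f"{ES}/_ingest/pipeline/embedding-experiment", json=pipeline).raise_for_status()

# Reindex only a sample of the source data through that pipeline into a
# separate experiment index, rather than touching the full dataset.
reindex = {
    "max_docs": 10000,  # keep the experiment small
    "source": {"index": "my-source-index"},
    "dest": {"index": "embedding-experiment", "pipeline": "embedding-experiment"},
}
requests.post(f"{ES}/_reindex?wait_for_completion=false", json=reindex).raise_for_status()
```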
Once you settle on a model, you can then reindex your entire original data, and set up the ingest pipeline so the processor is applied automatically to whatever new data comes in from that point onwards.
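A minimal sketch of that last step, assuming a hypothetical final pipeline named `embedding-final` and a destination index `my-new-index` that already exists with an appropriate mapping: setting `index.default_pipeline` means new documents are enriched automatically, and the same default pipeline is applied during the full reindex.

```python
import requests

ES = "http://localhost:9200"  # assumption: unsecured local cluster

# Make the chosen pipeline the index default so any new documents written
# from now on are run through the inference processor automatically.
requests.put(
    f"{ES}/my-new-index/_settings",
    json={"index": {"default_pipeline": "embedding-final"}},
).raise_for_status()

# Reindex the full original data; with no explicit dest pipeline, the
# destination's default_pipeline is applied to the reindexed documents too.
resp = requests.post(
    f"{ES}/_reindex?wait_for_completion=false",
    json={
        "source": {"index": "my-source-index"},
        "dest": {"index": "my-new-index"},
    },
)
resp.raise_for_status()
print(resp.json()["task"])  # poll GET _tasks/<task id> to track progress
```

Running the reindex with `wait_for_completion=false` is worth it at this scale: the request returns a task ID immediately instead of holding the connection open for the duration of the job.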