Embedding Images in Data Ingestion

Hello,
Recently I attended a Vector & AI workshop where I delved into the intricacies of vector embeddings. In particular, a demo on image relevance search caught my attention. My goal now is to embed images directly in our data pipeline during the ingestion process.

My main question is about the logistics of embedding images during ingestion. As I set up an inference pipeline, a fundamental question arises: how do I ingest the image in the first place? Is encoding the bytes to base64, the way the attachment processor handles HTML files and PDFs, the right approach? For example, something like the sketch below.
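
Here is a minimal sketch of what I mean by base64 ingestion, assuming a local cluster; the index and field names are placeholders I made up for illustration:

```python
import base64
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Read the raw image bytes and encode them as a base64 string,
# the same shape of input the attachment processor expects for
# HTML files and PDFs.
with open("photo.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# "images" and "image_data" are hypothetical names for this example.
es.index(index="images", document={"image_data": encoded})
```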

Initially, I turned to the guidance in the "How to implement image similarity search in Elasticsearch" blog post. However, the method it suggests embeds the images with a Python script running outside the cluster, which doesn't fit our cloud/self-managed deployment setup. We want to embed images directly during data ingestion, without an additional backend processing step (roughly the shape sketched below).
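
For reference, the external step we're trying to avoid looks roughly like this. The CLIP model choice and the field names are my assumptions, not a quote from the post:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer
from elasticsearch import Elasticsearch

# A CLIP model that produces 512-dim embeddings for images and text.
model = SentenceTransformer("clip-ViT-B-32")
es = Elasticsearch("http://localhost:9200")

# Embed the image outside the cluster, in a separate Python process.
vector = model.encode(Image.open("photo.jpg")).tolist()

# The target index would need a dense_vector mapping of matching
# dimension; "images" and "image_embedding" are placeholder names.
es.index(index="images", document={
    "file_name": "photo.jpg",
    "image_embedding": vector,
})
```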

So my question stands: is there a viable way to embed images during ingestion without relying on an external backend? Any insights or alternative approaches would be greatly appreciated as we work toward a smoother integration of image embedding in our data pipeline.

Thanks,
Chenko

I received an answer to this in another webinar. It was along the lines of: this is not currently supported, and there's no commitment yet on whether it will be.

My understanding is that there's nothing in the works yet. That said, part of the needed infrastructure is already there (the _inference API). I believe this will happen eventually, but I wouldn't expect anything in the short term.
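
For what it's worth, here's roughly what that existing infrastructure looks like for text today: an ingest pipeline with an inference processor calling a text embedding model deployed in the cluster. The model_id is an assumption (any deployed text_embedding model would do); an equivalent processor that accepts raw image bytes is exactly the piece that's missing.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# An ingest pipeline whose inference processor embeds a text field at
# ingest time. The model_id below is a hypothetical eland-deployed model.
es.ingest.put_pipeline(
    id="text-embedding-pipeline",
    processors=[
        {
            "inference": {
                "model_id": "sentence-transformers__all-minilm-l6-v2",
                "target_field": "text_embedding",
                "field_map": {"description": "text_field"},
            }
        }
    ],
)

# Documents indexed through this pipeline get embedded inside the
# cluster, with no external backend -- which is what the original
# question is asking for, but for images.
es.index(
    index="images",
    pipeline="text-embedding-pipeline",
    document={"description": "a photo of a red bicycle"},
)
```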

