Inference on an image

Hello all,

Is it possible to get a vector embedding for an image from an inference endpoint using the sentence-transformers__clip-vit-b-32-multilingual-v1 model? And if so, what format should the input be?

The documentation for the Inference API tells me that the input should be a string, so I suspect this is not possible. But perhaps a base64-encoded image might work?
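For context, preparing an image as a base64 string is straightforward; the hypothetical helper below (my own illustration, not part of any Elastic API) shows what such an input would look like. The key caveat is that, to a text-only endpoint, the result is still just a string of ASCII characters, not pixels, unless the service explicitly decodes it.

```python
import base64

def image_to_base64(path: str) -> str:
    """Read an image file and return its base64 text representation.

    Hypothetical helper for illustration: a text-only inference
    endpoint would receive this as an ordinary string, with no
    guarantee it is interpreted as image data.
    """
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```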

I'm not sure how I would test this myself: if I give it a base64-encoded string it will return a vector, but I have no way of knowing whether the vector represents the image or just the text of the string.
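One rough way to check this, sketched below with made-up placeholder vectors (not real endpoint output): embed two different photos of the same scene and compare the results with cosine similarity. If the endpoint actually decoded the images, the two vectors should be very similar; if it embedded the base64 text, which is essentially unrelated gibberish for the two files, the similarity should be much lower.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for two endpoint responses
# (hypothetical values, not actual model output):
vec_photo_1 = [0.12, 0.80, 0.33, 0.05]
vec_photo_2 = [0.10, 0.78, 0.35, 0.07]

print(cosine(vec_photo_1, vec_photo_2))  # high similarity expected for image embeddings
```

This only gives indirect evidence, but a near-random similarity between embeddings of visually similar images would strongly suggest the base64 string was treated as text.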

No, it's not possible yet. I hope to see that happen at some point, as I'd love to run inference on my audio files directly within Elastic instead of writing my own Python code :wink:

Thanks for your quick reply @dadoonet !
Sounds like I'll have to create my own endpoint for this specific use case :nerd_face: