Confusing results from multilingual-e5-small

I am currently working with multilingual-e5-small for semantic search. However, the retrieved results often seem only loosely related to the search query.

My actual question is how to evaluate the model more accurately against the indexed embeddings. How is the search done with the tokens?

When comparing embeddings obtained during training with those produced at indexing time via the pipeline, they differ even for the same text, which is surprising.

With elser_v2, there is an interesting explanation of this topic for some of the raised issues (see "Improving text expansion performance using token pruning" on Elastic Search Labs, elastic.co), but for other models it is somewhat unclear.

I am not currently working with the language supported by elser_v2.


Inputs to the E5 family of models should be prefixed with either "query: " or "passage: ", because this is how the model was trained (see the FAQ on the Hugging Face model card).

Elasticsearch automatically adds the "passage: " prefix to inputs as they are ingested and the "query: " prefix to search inputs. This explains the different embedding values you are seeing.
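You can see the effect of the prefixes with a quick local check. This is a minimal sketch using the sentence-transformers package, not the exact code path Elasticsearch runs internally:

```python
# Minimal sketch: embed the same text with and without the E5 prefixes
# and compare. Assumes sentence-transformers is installed locally.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-small")

text = "blue cotton shirt"
plain = model.encode(text, normalize_embeddings=True)
passage = model.encode("passage: " + text, normalize_embeddings=True)
query = model.encode("query: " + text, normalize_embeddings=True)

# The vectors are similar but not identical, which is why embeddings
# generated without a prefix differ from what Elasticsearch indexes.
print(util.cos_sim(plain, passage))
print(util.cos_sim(query, passage))
```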

How is the search done with the tokens?

Search is performed in the vector index: the query text is converted to an embedding, and then the vector index is used to find the stored embeddings that are closest to that query embedding.
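In practice the flow looks roughly like the sketch below. The index and field names are made up for illustration, and it assumes an 8.x cluster where a dense_vector field has been populated with E5 embeddings:

```python
# Rough sketch of the query-time flow: embed the query text (with the
# "query: " prefix) and run a kNN search against the dense_vector field.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("intfloat/multilingual-e5-small")

query_vector = model.encode("query: shirts", normalize_embeddings=True)

response = es.search(
    index="products",          # hypothetical index name
    knn={
        "field": "embedding",  # hypothetical dense_vector field
        "query_vector": query_vector.tolist(),
        "k": 10,
        "num_candidates": 100,
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```

The documents returned are simply the k stored vectors closest to the query vector, ranked by similarity.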

Regarding the use of the 'query' and 'passage' prefixes, I wasn't aware that Elasticsearch already adds these automatically—perhaps I missed that part of the documentation.

Now, about the issue with the vectors Elasticsearch produces for search, I have a specific scenario in mind. Take, for example, a product catalog:

When searching for "shirts", it sometimes returns "sneakers", a term that has no relation to the search. I believe this is related to this item in the model's FAQ:

2. Why are my reproduced results slightly different from reported in the model card?
Different versions of transformers and pytorch could cause negligible but non-zero performance differences

Given that it can return seemingly "random" results like this...

For lexical text search, Elasticsearch scores relevance with algorithms like BM25 (a refinement of TF-IDF), which makes it easy to understand why a document was considered relevant.

But with vectors, how does it calculate the score so that "sneakers" don't appear in my search for "shirts"?

How can Elasticsearch resolve this in its semantic search?
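To make the question concrete, here is a small local sketch of the comparison I have in mind, computed with sentence-transformers; the product texts are made-up examples, not my real index:

```python
# Compare a "shirts" query against two made-up product texts to see how
# close or far their E5 embeddings actually are.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-small")

query = model.encode("query: shirts", normalize_embeddings=True)
passages = model.encode(
    ["passage: blue cotton shirt", "passage: running sneakers"],
    normalize_embeddings=True,
)

# Cosine similarity of the query against each product text. As far as I
# understand, Elasticsearch maps cosine similarity to a score as
# (1 + cosine) / 2, and a kNN search always returns the k closest
# documents even when none of them are very close.
for label, score in zip(["shirt", "sneakers"], util.cos_sim(query, passages)[0]):
    print(label, float(score))
```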
