I had a few questions after reading this article: https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch
The article seems to assume that every document in the corpus is already summarized by a "title" or a "question" that the document is relevant to.
-
What if I wanted to use embedding-based search on a corpus where documents do not have titles? Do I then compare the query vector to a vector for the whole document? Or do I compare the query vector to the vector of each sentence in each document? Would these sentence-level comparisons still be effective when documents are at least a paragraph long (rather than concise questions or titles)?
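To make the two options concrete, here is a minimal sketch of what I mean by "whole document" versus "per sentence" comparison. Everything in it is my own assumption and not from the article: `embed` is a toy bag-of-words stand-in for a real sentence-embedding model, and `score_whole_document` / `score_best_sentence` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(doc):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def score_whole_document(query, doc):
    """Option 1: one vector for the entire document."""
    return cosine(embed(query), embed(doc))


def score_best_sentence(query, doc):
    """Option 2: one vector per sentence; keep the best-matching sentence."""
    q = embed(query)
    return max((cosine(q, embed(s)) for s in split_sentences(doc)), default=0.0)
```

With a real model the numbers would differ, but the structure of the two strategies would be the same: either one vector per document, or one vector per sentence with the document score taken as the maximum over its sentences.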
-
Furthermore, what if the document is relevant to the query, but no single sentence in the document answers the query on its own? For example, the query is "Queen Elizabeth birth day" and the document is "Queen Elizabeth is the queen of England .... few irrelevant sentences ... SHE was born in xxxx". This document answers the query. However, only two sentences taken together contain the answer: one sentence gives the birth date, and another sentence, placed far away from it, tells us who the person is whose birth date is given.
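Here is a small illustration of the concern, under loud assumptions: I use plain token overlap as a crude proxy for embedding similarity, I simplify the query to "queen elizabeth born" so that the overlap is visible lexically, and the middle sentence is a made-up filler. `coverage` is a hypothetical helper, not part of any library.

```python
import re


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(query, text):
    """Fraction of the query's tokens that also appear in text."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


# Simplified query (assumption): "born" instead of "birth day".
query = "queen elizabeth born"

sentences = [
    "Queen Elizabeth is the queen of England.",
    "The weather was mild that year.",  # made-up irrelevant filler
    "SHE was born in xxxx.",
]

# Each sentence on its own covers only part of the query ...
per_sentence = [coverage(query, s) for s in sentences]

# ... while the document as a whole covers all of it.
whole_document = coverage(query, " ".join(sentences))
```

In this toy setup no individual sentence matches the full query, yet the document taken as a whole does, which is exactly the case I am unsure how per-sentence vectors would handle.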