Hi,
I am currently working on a Google-like search engine for my website. I have crawled the site's pages and indexed the data in Elasticsearch. The indexed documents have the following fields:
string Id
string Url
string Title
string Body
string MetaKeywords
string MetaDescription
List<string> H1
List<string> H2
List<string> H3
List<string> H4
List<string> H5
List<string> H6
int Depth // the page's depth in the website: usually, the smaller this number, the more relevant the page
I am looking for a query that takes an array of words as input and returns the most relevant pages.
I would be very grateful if your answer used techniques that improve the results, such as boosting, or anything else you know of.
Also, I would slightly prefer the words to be combined with "and". For example, if the input is ["foo", "bar", "baz"],
the returned results should contain all of these words and not just one of them (or at least most of the words). I suppose this is closer to how current search engines behave, and it should prevent unrelated pages from being shown.
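
For concreteness, here is a rough sketch of the kind of query body this could translate to, written as a plain Python dict so it can be sent with any client. It is only one possible approach under several assumptions: the field names are taken to be indexed exactly as listed above (some clients lowercase or camel-case them), the boost weights and the depth scale are arbitrary guesses, and build_search_query is a hypothetical helper name. It combines a multi_match query with per-field boosts, minimum_should_match for the "all or most words" behavior, and a function_score decay on Depth.

```python
import json

# Assumed field names and boost weights; both are illustrative guesses,
# not tuned values. Adjust the names to match the actual index mapping.
BOOSTED_FIELDS = [
    "Title^5",
    "H1^4",
    "H2^3",
    "H3^2",
    "MetaKeywords^2",
    "MetaDescription^2",
    "Body",
]


def build_search_query(words, minimum_should_match="75%", depth_scale=3):
    """Build an Elasticsearch query body (a plain dict) for a list of words."""
    text_query = {
        "multi_match": {
            "query": " ".join(words),
            # cross_fields lets each word match in any of the listed fields.
            "type": "cross_fields",
            "fields": BOOSTED_FIELDS,
            # "100%" behaves like AND; a lower value means "most of the words".
            "minimum_should_match": minimum_should_match,
        }
    }
    return {
        "query": {
            "function_score": {
                "query": text_query,
                # Gaussian decay on Depth: the multiplier is 1.0 at depth 0
                # and falls to 0.5 at depth == depth_scale.
                "functions": [
                    {
                        "gauss": {
                            "Depth": {
                                "origin": 0,
                                "scale": depth_scale,
                                "decay": 0.5,
                            }
                        }
                    }
                ],
                # Multiply the text relevance score by the depth factor.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_query(["foo", "bar", "baz"]), indent=2))
```

Passing minimum_should_match="100%" would require every word to match, while the default here tolerates a missing word on longer inputs. Whether cross_fields or another multi_match type gives better ranking likely depends on the analyzers and the data, so this is a starting point to experiment with rather than a recommended configuration.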
I know this question is very general and does not have a single correct answer, but I just want to learn which options and features I can use.
Thanks a lot.