Data Types:
Title: string
Keywords: string
Description: string
Search Requirements:
1. ALL search terms MUST be found somewhere in the title, keywords, or description
a) Do not search the whole DOCUMENT for terms; search only these SEVERAL FIELDS for matches.
i] No single field needs to contain all the search terms, but every search term must be found in at least one field
b) E.g., searching for “bake cake” should match when:
i] title contains “cake” and description contains “bake” so that BOTH terms are found
c) Because the Atlas Search grammar did not allow us to do that, we concatenated title + keywords into a NEW field
d) Our biggest problem seems to be that we cannot get a proper AND clause across our search terms
i] “bake cake” should match “bake” AND “cake”. Instead, using Atlas Search without our hacks, we get
ii] “cake cake cake cake” ranking higher than “cake bake”!
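The matching rule in requirement 1 can be sketched in plain Python, independent of Atlas Search. This is only a reference for the intended semantics, not the actual implementation; the names SEARCH_FIELDS, tokenize, and matches_all_terms are hypothetical.

```python
import re

# Hypothetical field list, taken from the data types named above.
SEARCH_FIELDS = ("title", "keywords", "description")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(doc, query, fields=SEARCH_FIELDS):
    """Return True when every query term occurs in at least one field."""
    # Pool the tokens of all searchable fields into one set, so a term
    # found in the title and another found in the description both count.
    pooled = set()
    for field in fields:
        pooled.update(tokenize(doc.get(field, "")))
    return all(term in pooled for term in tokenize(query))
```

With this rule, a record whose title contains “cake” and whose description contains “bake” matches “bake cake”, while a record containing only “cake cake cake cake” does not.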
2. Field boosting is important
a) Matching in the “title” should rank higher than matching in the “description”
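As a rough illustration of requirement 2, the sketch below builds a $search stage (as a Python dict) with one text clause per field inside compound.should, each carrying a different score boost. The index name "default", the boost values, and the helper name boosted_search_stage are assumptions, and the stage has not been run against a real cluster. It illustrates boosting only and does not by itself enforce the AND requirement above.

```python
# Illustrative boost values (assumptions): title outranks keywords,
# which outrank description.
FIELD_BOOSTS = {"title": 3.0, "keywords": 2.0, "description": 1.0}


def boosted_search_stage(query, boosts=FIELD_BOOSTS, index="default"):
    """Build a $search stage that scores matches per field with a boost."""
    should = [
        {
            "text": {
                "query": query,
                "path": field,
                # A larger value makes a match in this field count more.
                "score": {"boost": {"value": weight}},
            }
        }
        for field, weight in boosts.items()
    ]
    return {
        "$search": {
            "index": index,
            "compound": {"should": should, "minimumShouldMatch": 1},
        }
    }
```

The resulting dict could be passed as the first stage of an aggregation pipeline, e.g. collection.aggregate([boosted_search_stage("bake cake")]) with a driver such as PyMongo.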
3. Word order matters
a) “bake cake” should boost results where “bake” precedes “cake”
b) A perfect match (an exact match of the whole query) is preferred
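One way to picture requirement 3 is a small bonus computed from term positions. The sketch below is plain Python under assumed rules: the function name order_bonus and the bonus levels 2, 1, and 0 are hypothetical, chosen only to separate an exact phrase, an in-order match, and everything else.

```python
def order_bonus(query_terms, doc_terms):
    """Return 2 for an exact adjacent phrase, 1 for in-order terms, else 0."""
    n = len(query_terms)
    if n == 0:
        return 0
    # Exact phrase: the query terms appear contiguously in the document.
    for i in range(len(doc_terms) - n + 1):
        if doc_terms[i:i + n] == query_terms:
            return 2
    # In order: the query terms form a subsequence of the document terms.
    # Testing membership on an iterator consumes it, which preserves order.
    remaining = iter(doc_terms)
    if all(term in remaining for term in query_terms):
        return 1
    return 0
```

For the query “bake cake”, the text “bake cake” earns the top bonus, “bake a cake” earns the in-order bonus, and “cake bake” earns none.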
4. Keyword stuffing should be penalised
a) After computing a score, the score should be reduced (un-boosted) based on how often the search terms are repeated in the matched text
b) “bake cake” should rank “bake a cake” HIGHER than “bake a cake bake a cake cake cake cake”
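The un-boosting in requirement 4 could take many forms. The sketch below assumes one simple form: a multiplier that shrinks with every repeated occurrence of a query term. The function name stuffing_penalty and the 1 / (1 + extra) formula are assumptions, not a description of how Atlas Search scores.

```python
from collections import Counter


def stuffing_penalty(query_terms, doc_terms):
    """Return a factor in (0, 1]; 1.0 means no query term is repeated."""
    counts = Counter(doc_terms)
    # Count every occurrence of a query term beyond its first.
    extra = sum(max(counts[term] - 1, 0) for term in set(query_terms))
    return 1.0 / (1.0 + extra)
```

Multiplying a base score by this factor leaves “bake a cake” untouched (factor 1.0) and shrinks “bake a cake bake a cake cake cake cake”, which repeats the query terms five extra times.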
5. Shorter data should rank higher
a) “cake” as a search term should rank
i] “cake fun” higher than
ii] “cake and fun and baking and frosting”
iii] Because, all else being equal, the first string is SHORTER than the second string
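Requirement 5 amounts to length normalisation. A minimal sketch, assuming a 1 / sqrt(number of terms) factor (a common choice in text ranking, used here only as an illustration); the name length_norm is hypothetical.

```python
import math


def length_norm(doc_terms):
    """Return a factor that shrinks as the field gets longer."""
    if not doc_terms:
        return 0.0
    return 1.0 / math.sqrt(len(doc_terms))
```

Under this factor, “cake fun” (2 terms) scores higher than “cake and fun and baking and frosting” (7 terms) for the query “cake”, all else being equal.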
6. Stemming matters (plurals in particular)
a) This does NOT work in our current implementation because we had to hack around Atlas to get proper AND matching for search terms
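To make the plural case in requirement 6 concrete, here is a deliberately crude plural stripper in Python. It is a stand-in for a real stemmer (a real deployment would more likely rely on a stemming analyzer); the function name naive_singular and its rules are assumptions, and irregular plurals are not handled.

```python
def naive_singular(word):
    """Strip common English plural endings from a single word."""
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"   # "berries" -> "berry"
    if len(w) > 3 and w.endswith(("ches", "shes", "xes", "sses")):
        return w[:-2]         # "dishes" -> "dish"
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]         # "cakes" -> "cake"
    return w
```

Applying the same function to both the query terms and the indexed terms lets a search for “cakes” match a record that contains “cake”.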