For a search-as-you-type scenario with 10 million very simple documents, what is the recommended approach?

jimisdrpc · April 9, 2020, 10:11pm

I have this requirement: build a search-as-you-type solution with Spring and Elasticsearch. There will be 10 million documents in Elasticsearch. Each document is a really simple JSON object like {"id": 9999, "FullName": "..."}. After the user types the third letter I have to start filtering. The user can keep typing, and the result list should get shorter.

According to https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-edgengram-tokenizer.html: "… When you need search-as-you-type for text which has a widely known order, such as movie or song titles, the completion suggester is a much more efficient choice than edge N-grams. Edge N-grams have the advantage when trying to autocomplete words that can appear in any order…" As far as I can see, my requirement fits the case of "… trying to autocomplete words that can appear in any order", so I would conclude that edge N-grams are my best bet.

Nevertheless, I also read in https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-as-you-type.html: "… The search_as_you_type field type is a text-like field that is optimized to provide out-of-the-box support for queries that serve an as-you-type completion use case." That said, I am confused. Based on my business requirement, which approach would you favor? Is the search_as_you_type field datatype complementary to the edge N-gram tokenizer? Can I use both approaches, or should I pick one of them? Would you even favor a third approach? Any tip will be highly appreciated.
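For concreteness, here is roughly what the two options from the documentation could look like as request bodies, written as Python dicts so they can be serialized and sent with any client. This is only a sketch: `FullName` comes from my document, but the analyzer and tokenizer names (`autocomplete`, `autocomplete_search`), the gram sizes, and the sample query text `joh` are assumptions for illustration.

```python
import json

# Option A: a text field analyzed with a custom edge_ngram tokenizer.
# "autocomplete" / "autocomplete_search" and the gram sizes are assumed names/values.
edge_ngram_index = {
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": ["lowercase"],
                },
                # Plain lowercase tokenizer at search time, so the typed
                # text is not split into n-grams again.
                "autocomplete_search": {"tokenizer": "lowercase"},
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 3,   # start matching at the third letter
                    "max_gram": 10,
                    "token_chars": ["letter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "FullName": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "autocomplete_search",
            }
        }
    },
}

# Option B: the search_as_you_type field type, queried with a
# bool_prefix multi_match over the field and its generated subfields.
search_as_you_type_index = {
    "mappings": {"properties": {"FullName": {"type": "search_as_you_type"}}}
}
search_as_you_type_query = {
    "query": {
        "multi_match": {
            "query": "joh",  # hypothetical user input
            "type": "bool_prefix",
            "fields": ["FullName", "FullName._2gram", "FullName._3gram"],
        }
    }
}

if __name__ == "__main__":
    # Print the bodies as JSON, ready to paste into a REST client.
    for body in (edge_ngram_index, search_as_you_type_index, search_as_you_type_query):
        print(json.dumps(body, indent=2))
```

Option A needs custom analysis settings, while option B works with only a mapping entry plus the matching query type.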
To summarize, what is the recommended rule of thumb for this case:

The user types three letters. I have to return all names that contain those three letters, no matter whether they are at the beginning, middle, or end of the name. As the user keeps typing, the result list will shrink.
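To make the "beginning, middle, or end" part concrete, here is a small plain-Python sketch (no Elasticsearch involved) of the difference between prefix-only edge n-grams and full n-grams. The function names, the gram sizes, and the sample name are made up for illustration; real tokenizers behave according to their own configuration.

```python
def edge_ngrams(word, min_gram=3, max_gram=10):
    """Return only the prefixes of `word`, the way an edge n-gram tokenizer slices a token."""
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def ngrams(word, min_gram=3, max_gram=10):
    """Return every substring of `word` with length between min_gram and max_gram."""
    word = word.lower()
    return [
        word[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(word) - n + 1)
    ]


def matches(typed, full_name, tokenizer):
    """True if the typed text equals one of the grams of any word in the name."""
    typed = typed.lower()
    return any(typed in tokenizer(word) for word in full_name.split())


if __name__ == "__main__":
    name = "Jonathan Smith"  # hypothetical FullName value
    print(matches("jon", name, edge_ngrams))   # True: prefix of "jonathan"
    print(matches("ath", name, edge_ngrams))   # False: middle of the word
    print(matches("ath", name, ngrams))        # True: middle of "jonathan"
    print(matches("ith", name, ngrams))        # True: end of "smith"
    print(matches("atha", name, ngrams))       # True: still matches as typing continues
```

So edge n-grams alone would cover "words in any order" but only from the start of each word, whereas matching in the middle or at the end of a name needs something closer to full n-grams, at the cost of many more indexed terms.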
