Hi everyone. I'm a bit confused about the general flow of constructing custom analyzers and mappings, both at document indexing time and at query time.
The reason I'm asking all of this is that I'm trying to construct my own analyzer using n-grams/shingles as token filters, along with character filters, so that I can run full-text queries even if the user's input is full of typos or local slang. The queries will therefore be at the phrase/sentence level, where word order can be very important!
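To make the questions concrete, here is roughly the kind of index settings I have in mind. It's only a sketch: the names (my_index, slang_mapping, my_shingles, my_custom_analyzer) and the slang mapping rules are placeholders I made up.

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "slang_mapping": {
          "type": "mapping",
          "mappings": ["u => you", "gr8 => great"]
        }
      },
      "filter": {
        "my_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        }
      },
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["slang_mapping"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_shingles"]
        }
      }
    }
  }
}
```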
-
How does the "analyzer" work? I often see tokenizers and filters being specified at the same level as the "analyzer" level, yet within the "analyzer, it already has a nested tokenizer and filter specified.
-
Mappings come after the settings block. I'm confused as to why articles often say one can specify an analyzer within the mapping. Is this still necessary after defining my custom analyzer in the settings?
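For instance, I've seen examples along these lines, where a field in the mapping refers to the analyzer by name (again just a sketch; it assumes the index from the first snippet already exists and that description is a new text field):

```
PUT /my_index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}
```

Is that per-field "analyzer" line what those articles are referring to?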
-
Is it correct to say that the custom analyzer above will then automatically be used when I add documents to Elasticsearch, i.e. that the analyzer is applied at index time? If not, do I have to specify something when adding the documents?
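For example, would something like the following just work, and is the field-level _analyze call below the right way to verify which analyzer the field ends up using? (My assumption is that it shows the index-time analysis for that field, but I'm not sure I'm reading it correctly.)

```
PUT /my_index/_doc/1
{
  "description": "gr8 food but the service was a bit slow"
}

GET /my_index/_analyze
{
  "field": "description",
  "text": "gr8 food but the service was a bit slow"
}
```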
-
Does this custom_analyzer automatically apply to my search queries as well?
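For example, would a plain match query already analyze my search text with the custom analyzer, or do I need to set the query-level analyzer parameter explicitly, as in this sketch?

```
GET /my_index/_search
{
  "query": {
    "match": {
      "description": {
        "query": "gr8 service",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
```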
-
I read in the documentation that if I use the shingle token filter, I shouldn't also apply a stop words token filter. Could someone advise me on this?
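As far as I understand, the concern is that when the stop filter removes a word before shingling, the shingle filter fills the resulting position gap with a filler token (the default seems to be "_"), along the lines of this quick _analyze experiment:

```
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    "stop",
    { "type": "shingle", "min_shingle_size": 2, "max_shingle_size": 2 }
  ],
  "text": "jumped over the lazy dog"
}
```

If I've read the docs correctly, this produces shingles such as "over _" and "_ lazy" where "the" was dropped, which I assume is what would hurt phrase-level matching.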
Thank you in advance!