While working on a project of mine (using the NEST client for .NET), I noticed that the total number of documents indexed when the index uses an analyzer differs from the number indexed without it.
My code for creating an index WITHOUT an analyzer looks like this:
    var createIndexResponse = _elasticJsonClient.Indices.Create("jsonindex", c => c
        .Map<CustomControl>(m => m.AutoMap()));
My code for creating an index WITH an analyzer looks like this:
    var createIndexResponse = _elasticJsonClient.Indices.Create("jsonindex", c => c
        .Settings(st => st
            .Setting(UpdatableIndexSettings.MaxNGramDiff, 18)
            .Analysis(an => an
                .Analyzers(anz => anz
                    .Custom("ngram_analyzer", na => na
                        .Tokenizer("ngram_tokenizer")
                        .Filters("lowercase")))
                .Tokenizers(tz => tz
                    .NGram("ngram_tokenizer", td => td
                        .MinGram(4)
                        .MaxGram(5)
                        .TokenChars(
                            TokenChar.Letter,
                            TokenChar.Digit,
                            TokenChar.Punctuation,
                            TokenChar.Symbol)))))
        .Map<CustomControl>(m => m.AutoMap())
    );
Around 100 fewer documents get indexed when I use the analyzer. When I changed MinGram to 2 and MaxGram to 20, even fewer documents were indexed.
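To illustrate how much the number of tokens per document grows between the two settings mentioned above, here is a simplified C# sketch that counts the n-grams a tokenizer like `ngram_tokenizer` would emit. It is not Elasticsearch's implementation: it assumes the text is split only on whitespace (an approximation of the `TokenChars` setting above for plain text) and ignores the `lowercase` filter, which does not change the count. `NGramSketch` and `CountNGrams` are hypothetical names.

```csharp
using System;

// Simplified model of an n-gram tokenizer (assumption: words are separated by
// whitespace only). For each word, every substring whose length is between
// minGram and maxGram is counted as one token.
static class NGramSketch
{
    public static int CountNGrams(string text, int minGram, int maxGram)
    {
        int total = 0;
        // A null separator array makes Split use whitespace as the delimiter.
        foreach (var word in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            // Words shorter than minGram produce no tokens (the loop body never runs).
            for (int len = minGram; len <= Math.Min(maxGram, word.Length); len++)
            {
                // A gram of length len can start at word.Length - len + 1 positions.
                total += word.Length - len + 1;
            }
        }
        return total;
    }

    static void Main()
    {
        string sample = "Elasticsearch";
        Console.WriteLine(CountNGrams(sample, 4, 5));
        Console.WriteLine(CountNGrams(sample, 2, 20));
    }
}
```

For the single 13-character word "Elasticsearch", this counts 19 tokens with MinGram 4 / MaxGram 5 and 78 tokens with MinGram 2 / MaxGram 20, so the wider range produces several times more terms for the same text.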
Elasticsearch version used: 7.6.2
The code for indexing documents is identical for both index configurations, so I'm confident the difference in behavior comes from adding the analyzer.
Thank you for any suggestions.