I'm trying to build a phrase suggester query for a corpus that mixes German and English words (thanks, marketing guys). The people using the search will be German, so they will expect primarily German suggestions.
I'm currently stumped as to how to configure the phrase suggester to be more "natural" in its suggestions. For example, we have a bunch of documents containing the word "Stift" (writing utensil). Using the examples from the phrase suggester documentation, a search for "Stif" (missing the t at the end) returns the suggestion "Star" (Levenshtein distance 2) instead of the more natural "Stift" (Levenshtein distance 1).
I've tried adding combinations of the German normalization filter and the German stemmer to the trigram analyzer, but then I get the suggestion "seit" (Levenshtein distance 3).
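For reference, the distances quoted above can be checked with a small sketch of the standard dynamic-programming Levenshtein distance. This is plain Python for illustration only, not what the suggester runs internally; the function name `levenshtein` is made up for this example, and the comparison is case-sensitive.

```python
def levenshtein(a: str, b: str) -> int:
    """Case-sensitive edit distance (insert, delete, substitute; each costs 1)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


for candidate in ("Stift", "Star", "seit"):
    print(candidate, levenshtein("Stif", candidate))
```

Note that the value for "seit" depends on case handling: compared case-sensitively it is 3, but if both strings are lowercased first it drops to 2.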
I've also tried changing the smoothing, gram size, and confidence parameters, but the results did not change.
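To make clear which knobs are meant, here is a rough sketch of the request body, written as a Python dict and serialized to JSON. The field name `title.trigram`, the suggestion name `simple_phrase`, and all numeric values are placeholders assumed for illustration, not my exact configuration; the helper `build_phrase_suggest_body` is hypothetical.

```python
import json


def build_phrase_suggest_body(text: str, field: str = "title.trigram") -> dict:
    """Build a phrase suggester request body (placeholder field and values)."""
    return {
        "suggest": {
            "text": text,
            "simple_phrase": {  # arbitrary name for this suggestion
                "phrase": {
                    "field": field,     # assumed trigram (shingle) subfield
                    "size": 1,
                    "gram_size": 3,     # one of the parameters varied
                    "confidence": 0.0,  # one of the parameters varied
                    # Smoothing model that was varied; alpha is a placeholder.
                    "smoothing": {"laplace": {"alpha": 0.7}},
                    "direct_generator": [
                        {"field": field, "suggest_mode": "always"}
                    ],
                }
            },
        }
    }


print(json.dumps(build_phrase_suggest_body("Stif"), indent=2))
```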
Any pointers on how to get the suggester to prefer candidates with a shorter edit distance?