More Like This and shingles / phrases

Is it possible to combine shingles with a More Like This query? I added a custom shingles analyzer to my index, but MLT queries do not seem to use the shingles field. Combining the two is essential to our use case (comparing large numbers of large text documents): the default bag-of-words approach produces too many false positives from words matched out of context.

In particular, setting the per_field_analyzer to my_shingles_analyzer or custom does not seem to work.
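
For context, a shingles setup of the kind described might look roughly like the sketch below. Only the analyzer name my_shingles_analyzer comes from this thread. The field name body, the body.shingles sub-field, the filter name my_shingle_filter, and the bigram-only sizes are assumptions made for illustration. The dict is the JSON body you would send when creating the index.

```python
def build_index_body(min_size=2, max_size=2):
    """Return an index-creation body with a shingle analyzer (illustrative sketch)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; emits word n-grams (shingles).
                    "my_shingle_filter": {
                        "type": "shingle",
                        "min_shingle_size": min_size,
                        "max_shingle_size": max_size,
                        # Keep single words out of the shingles field.
                        "output_unigrams": False,
                    }
                },
                "analyzer": {
                    "my_shingles_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_shingle_filter"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Hypothetical field: unigrams in "body", shingles in "body.shingles".
                "body": {
                    "type": "text",
                    "fields": {
                        "shingles": {
                            "type": "text",
                            "analyzer": "my_shingles_analyzer",
                        }
                    },
                }
            }
        },
    }
```

With a multi-field like this, the same source text is indexed twice, once as single words and once as bigrams, so each can be queried separately.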

I got it to work by combining multiple More Like This queries, each with its own analyzer, instead of trying to use per_field_analyzer. That worked out better anyway, since it lets me use separate settings (e.g. stop_words, min_word_length) for unigrams vs bigrams.
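
A minimal sketch of that approach, assuming the two More Like This clauses are combined in a bool query's should list and that the input is free text. The field names body and body.shingles, the stop word list, and all numeric values are placeholders rather than the settings actually used. Only my_shingles_analyzer is a name from this thread.

```python
def build_mlt_query(text, unigram_field="body", shingle_field="body.shingles"):
    """Combine a unigram and a bigram More Like This clause (illustrative sketch)."""
    unigram_clause = {
        "more_like_this": {
            "fields": [unigram_field],
            "like": text,
            "analyzer": "standard",
            # Settings that only make sense for single words.
            "stop_words": ["the", "a", "an", "and", "of"],
            "min_word_length": 3,
            "min_term_freq": 1,
            "max_query_terms": 25,
        }
    }
    shingle_clause = {
        "more_like_this": {
            "fields": [shingle_field],
            "like": text,
            # Analyze the input text into shingles for this clause only.
            "analyzer": "my_shingles_analyzer",
            "min_term_freq": 1,
            "max_query_terms": 50,
            # Assumed weighting: phrase matches count more than word matches.
            "boost": 2.0,
        }
    }
    return {
        "query": {
            "bool": {
                "should": [unigram_clause, shingle_clause],
                "minimum_should_match": 1,
            }
        }
    }
```

The returned dict can be sent as the body of a search request. Because each clause carries its own analyzer and term-selection settings, the unigram and bigram sides can be tuned independently, and the boost on the shingle clause is one way to favor in-context matches.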
