Choosing the similarity measure for more_like_this queries

kyrre · August 28, 2015, 7:07am

I want to specify the similarity measure used when querying for similar documents. My initial attempt was to specify this in the index's mappings section:

'mappings': {
    'document': {
        'properties': {
            'data_set_identifier': {
                'type': 'string',
                'index': 'not_analyzed'
            },
            'label': {
                'type': 'string',
                'index': 'not_analyzed'
            },
            'text': {
                'type': 'string',
                'index': 'analyzed',
                'analyzer': 'english',
                'similarity': 'BM25'
            },
            'y': {
                'type': 'long',
                'index': 'not_analyzed'
            }
        }
    }
}

...

but this seems to have no effect (i.e., the scores are identical if I change the similarity measure).
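
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two request bodies could be built. The helper names (build_index_body, build_mlt_query) are hypothetical, the mapping is trimmed to the 'text' field, and I am assuming that 'default' is the name of the TF/IDF similarity and that the more_like_this query accepts a 'like' parameter in the ES version in use. It only constructs the bodies; sending them to a cluster is left out.

```python
def build_index_body(similarity):
    """Return an index body like the one above, with the given
    similarity set on the 'text' field (other fields omitted)."""
    return {
        'mappings': {
            'document': {
                'properties': {
                    'text': {
                        'type': 'string',
                        'index': 'analyzed',
                        'analyzer': 'english',
                        'similarity': similarity,
                    }
                }
            }
        }
    }


def build_mlt_query(like_text):
    """Return a more_like_this query body against the 'text' field.
    The 'like' parameter name is an assumption about the ES version."""
    return {
        'query': {
            'more_like_this': {
                'fields': ['text'],
                'like': like_text,
                'min_term_freq': 1,
                'min_doc_freq': 1,
            }
        }
    }


# One index per similarity; run the same query against both and compare scores.
bm25_body = build_index_body('BM25')
default_body = build_index_body('default')  # assumed name of the TF/IDF similarity
mlt_body = build_mlt_query('some example text')
```

Creating one index from each body, indexing the same documents into both, and running mlt_body against each should show whether the scores differ.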

From the Lucene documentation it seems that it should be possible to specify which similarity measure to use, but there is no mention of this feature in the ES docs.

Is it possible?

kyrre · August 28, 2015, 9:27am

So I went through the Lucene source, and it turns out that the only similarity you can use is TFIDFSimilarity.