Choosing the similarity measure for more_like_this queries


(Kyrre) #1

I want to specify the similarity measure used when querying for similar documents. My initial attempt was to specify it in the index's mappings section:

'mappings': {
    'document': {
        'properties': {
            'data_set_identifier': {
                'type': 'string',
                'index': 'not_analyzed'
            },
            'label': {
                'type': 'string',
                'index': 'not_analyzed'
            },
            'text': {
                'type': 'string',
                'index': 'analyzed',
                'analyzer': 'english',
                'similarity': 'BM25'
            },
            'y': {
                'type': 'long',
                'index': 'not_analyzed'
            }
        }
    }
}

...

but this seems to have no effect (i.e., the scores are identical when I change the similarity measure).
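For anyone who wants to reproduce the comparison, here is a minimal sketch of the request bodies involved. It only builds plain Python dicts and does not talk to a cluster. The helper names index_body and mlt_body are made up for illustration, the 'default' similarity name and the 'like_text' parameter are assumptions that match older ES releases, and the other mapped fields are left out for brevity.

```python
import copy
import json

# 'text' field settings from the mapping above, minus the similarity.
BASE_TEXT_FIELD = {
    'type': 'string',
    'index': 'analyzed',
    'analyzer': 'english',
}


def index_body(similarity):
    """Build a create-index body whose 'text' field uses the given similarity."""
    text_field = copy.deepcopy(BASE_TEXT_FIELD)
    text_field['similarity'] = similarity
    return {
        'mappings': {
            'document': {
                'properties': {'text': text_field}
            }
        }
    }


def mlt_body(like_text):
    """Build a more_like_this query body against the 'text' field."""
    return {
        'query': {
            'more_like_this': {
                'fields': ['text'],
                # Assumption: older-style parameter name; newer releases use 'like'.
                'like_text': like_text,
                'min_term_freq': 1,
                'min_doc_freq': 1,
            }
        }
    }


if __name__ == '__main__':
    # Assumption: 'default' names the built-in TF-IDF similarity.
    for similarity in ('default', 'BM25'):
        print(json.dumps(index_body(similarity), indent=2))
    print(json.dumps(mlt_body('some example text'), indent=2))
```

Creating one index per body, indexing the same documents into both, and running the same more_like_this query against each should show whether the scores differ.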

From the Lucene documentation it seems that it should be possible to specify which similarity measure to use, but there is no mention of this feature in the ES docs.

Is it possible?


(Kyrre) #2

So I went through the Lucene source, and it turns out that the only similarity you can use is TFIDFSimilarity.
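To illustrate what that means in practice, here is a rough sketch of TF-IDF term weighting of the kind MoreLikeThis appears to use when picking the "interesting" terms of a document. This is a simplified illustration, not the Lucene code: the function names are hypothetical, the idf formula is the classic 1 + ln(numDocs / (docFreq + 1)) form and may differ between Lucene versions, and the counts are made up.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF in the style of Lucene's classic TF-IDF similarity (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


def rank_terms(term_freqs, doc_freqs, num_docs, max_terms=25):
    """Rank the source document's terms by tf * idf, highest first."""
    scored = [
        (term, tf * classic_idf(doc_freqs[term], num_docs))
        for term, tf in term_freqs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_terms]


if __name__ == '__main__':
    # Made-up counts, for illustration only.
    term_freqs = {'similarity': 3, 'the': 5, 'lucene': 2}
    doc_freqs = {'similarity': 50, 'the': 990, 'lucene': 20}
    for term, score in rank_terms(term_freqs, doc_freqs, num_docs=1000):
        print(term, round(score, 3))
```

With these counts the very common term ends up last even though it occurs most often in the source document, which is the point of weighting by idf.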

