Is there a way to use non-text fields for document similarity in ElasticSearch, such as dates or integers?


(Soufian) #1

Using the more_like_this feature of ElasticSearch, I understand how to use tf-idf or other metrics to find similar documents based on text fields. However, what if my use case looks more like this:

#   Name    Description    Price    Date
--------------------------------------------
1   A B C   Ba Bi Bou      100.0    12-01-18
--------------------------------------------
2   A B Z   Ba Bi Xon      250.0    01-11-11
3   X Y Z   Xa Xu Xon      100.0    12-02-18

Based on text fields only, document #2 should score higher than document #3 in terms of similarity with document #1, since document #3 shares no terms with it and would score 0. However, taking "price" and "date" into account, document #3's score should rise.
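To make the intended behaviour concrete, here is a minimal sketch in plain Python. It is not how ElasticSearch scores anything: the Jaccard overlap, the exponential closeness function, the scale values, the equal weighting, and the reading of the dates as DD-MM-YY are all assumptions chosen only for illustration.

```python
from datetime import date
import math

# The three example documents. Dates are ASSUMED to be DD-MM-YY.
docs = {
    1: {"name": "A B C", "description": "Ba Bi Bou", "price": 100.0, "date": date(2018, 1, 12)},
    2: {"name": "A B Z", "description": "Ba Bi Xon", "price": 250.0, "date": date(2011, 11, 1)},
    3: {"name": "X Y Z", "description": "Xa Xu Xon", "price": 100.0, "date": date(2018, 2, 12)},
}


def jaccard(a, b):
    """Token overlap between two whitespace-separated strings (0.0 to 1.0)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def closeness(distance, scale):
    """Map a numeric distance to (0, 1]; 1.0 means identical values."""
    return math.exp(-abs(distance) / scale)


def text_score(ref, other):
    """Similarity based on the text fields only."""
    return (jaccard(ref["name"], other["name"])
            + jaccard(ref["description"], other["description"])) / 2


def combined_score(ref, other, price_scale=100.0, date_scale_days=365.0):
    """Average of the text similarity and the price/date closeness.

    The scales and the equal weighting are arbitrary illustrative choices.
    """
    numeric = (closeness(ref["price"] - other["price"], price_scale)
               + closeness((ref["date"] - other["date"]).days, date_scale_days)) / 2
    return (text_score(ref, other) + numeric) / 2


if __name__ == "__main__":
    ref = docs[1]
    for doc_id in (2, 3):
        print(doc_id, text_score(ref, docs[doc_id]), combined_score(ref, docs[doc_id]))
```

With text only, document #3 scores 0 against document #1; once price and date closeness are mixed in, its score goes up, which is the effect I am after.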

Can I do something like that with ElasticSearch? The only documentation I can find deals with text fields. Adding non-text fields to the more_like_this query doesn't cause any exceptions, but the scores are completely unaffected.


(Byron Voorbach) #2

As of this moment I don't think this is possible.
From the documentation:

Important: The fields on which to perform MLT must be indexed and of type text or keyword
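
That restriction applies to more_like_this itself. One possible workaround, not something confirmed in this thread, is to keep more_like_this on the text fields and wrap it in a function_score query whose decay functions (gauss here) act on the numeric and date fields. The sketch below only builds the request body as a Python dict; the field names (name, description, price, date) are assumed from the table in the question, and the function name, index name, origins, and scales are hypothetical values to adapt.

```python
def build_similarity_query(index, doc_id, price_origin, date_origin,
                           price_scale=50.0, date_scale="30d"):
    """Build a function_score request body (hypothetical helper).

    more_like_this handles the text fields; gauss decay functions
    adjust the score by closeness of price and date to the reference
    document's values, which the caller passes in as origins.
    """
    return {
        "query": {
            "function_score": {
                # Text similarity on the fields MLT supports.
                "query": {
                    "more_like_this": {
                        "fields": ["name", "description"],
                        "like": [{"_index": index, "_id": doc_id}],
                        "min_term_freq": 1,
                        "min_doc_freq": 1,
                    }
                },
                # Closeness on the non-text fields.
                "functions": [
                    {"gauss": {"price": {"origin": price_origin, "scale": price_scale}}},
                    {"gauss": {"date": {"origin": date_origin, "scale": date_scale}}},
                ],
                "score_mode": "multiply",  # how the two decay values are combined
                "boost_mode": "multiply",  # how they are combined with the MLT score
            }
        }
    }


if __name__ == "__main__":
    import json
    # The date origin must match the format of the date field's mapping (assumed ISO here).
    body = build_similarity_query("products", "1", 100.0, "2018-01-12")
    print(json.dumps(body, indent=2))
```

Note that the decay functions only rescore documents the inner query already matches, so a document with no textual overlap at all (like #3 in the question) would still not be returned by this query as written.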