We recently upgraded Elasticsearch from 2.4 to 5.4.
After the upgrade we found an issue with the more_like_this query in version 5.x.
The following query is used to find similar documents by text.
Input query
POST /test/_search
{
  "size": 10000,
  "stored_fields": [
    "docid"
  ],
  "_source": false,
  "query": {
    "more_like_this": {
      "fields": [
        "textcontent"
      ],
      "like": [
        {
          "_index": "test",
          "_type": "object",
          "_id": "AV0c9jvZXF-b5U5aNAWB"
        }
      ],
      "max_query_terms": 5000,
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
Output of Elasticsearch 2.4
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.5381224,
    "hits": [
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal6Z9",
        "_score": 1.5381224,
        "fields": {
          "docid": [
            "2"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal63Z",
        "_score": .5381224,
        "fields": {
          "docid": [
            "3"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal6Z",
        "_score": .381224,
        "fields": {
          "docid": [
            "4"
          ]
        }
      }
    ]
  }
}
Output of Elasticsearch 5.4
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.5381224,
    "hits": [
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal6Z9",
        "_score": 168.5381224,
        "fields": {
          "docid": [
            "2"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal63Z",
        "_score": 164.5381224,
        "fields": {
          "docid": [
            "3"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal6Z",
        "_score": 132.381224,
        "fields": {
          "docid": [
            "4"
          ]
        }
      }
    ]
  }
}
The output is the same in both versions except for the document scores: version 5.4 returns much higher scores than 2.4. Our work depends on these scores, so the change is a problem for us. Is there a way to get the 2.4 scores in 5.4?
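
One possible lead, stated as an assumption rather than a confirmed diagnosis: Elasticsearch 5.0 changed the default similarity from classic TF/IDF to BM25, and BM25 does not apply the query normalization that TF/IDF did, so a query with many terms can sum to much larger scores. A minimal sketch of an index-creation body that asks for the classic similarity on textcontent might look like the following. The field type "text" and the idea of reindexing into a newly created index are assumptions, not part of the setup described above.

```json
{
  "mappings": {
    "object": {
      "properties": {
        "textcontent": {
          "type": "text",
          "similarity": "classic"
        }
      }
    }
  }
}
```

This body would be sent when creating the index, and the documents would then need to be reindexed into it. Even with the classic similarity, scores may not match 2.4 exactly, so comparing the two versions on a few known documents would be a sensible check.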