Scoring longest continuous string in documents

Alexis_Berger · May 11, 2015, 3:10pm

Hi,

I have a dataset containing about 15M documents.
Each document has one field containing a short text of at most 50 words.

I ran a lot of search query tests until I found the query I needed, one that handles stop words and missing words.
It is a function score query that wraps a common terms query and adds a weight to the results that also match a phrase match query:
(for the given example, let's say that the cutoff_frequency value is right and skips stop words correctly)

GET xxxx/_search
{
  "fields": ["myField"],
  "from": 0,
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "common": {
              "myField": {
                "query": "word1 word2 word3 word4",
                "cutoff_frequency": 0.149,
                "low_freq_operator": "or",
                "high_freq_operator": "or",
                "minimum_should_match": 3
              }
            }
          }
        }
      },
      "functions": [
        {
          "filter": {
            "query": {
              "match": {
                "oneField.normal": {
                  "query": "word1 word2 word3 word4",
                  "type": "phrase",
                  "slop": 0
                }
              }
            }
          },
          "weight": 1.0
        }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}

Everything works well, except the document ranking.
The default similarity does not seem to match my needs. I need the best scores to go to the documents that contain the longest continuous string (the most adjacent words).
What is the best way to achieve this ranking? Using another similarity? Maybe I missed a special query in the documentation...
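
To make the ranking criterion concrete, here is a minimal sketch in plain Python of the score described above. It is not an Elasticsearch feature or query, only an illustration. It assumes lowercase whitespace tokenization with no stop word handling, and the function name `longest_adjacent_run` and the sample documents are made up for the example.

```python
def longest_adjacent_run(query: str, text: str) -> int:
    """Return the length, in words, of the longest run of consecutive
    query words that also appear as consecutive words in the text,
    in the same order."""
    # Assumption: simple lowercase whitespace tokenization, no stop words.
    q = query.lower().split()
    d = text.lower().split()
    best = 0
    # prev[j] holds the length of the common run ending at q[i-2] and d[j-1].
    prev = [0] * (len(d) + 1)
    for i in range(1, len(q) + 1):
        curr = [0] * (len(d) + 1)
        for j in range(1, len(d) + 1):
            if q[i - 1] == d[j - 1]:
                # Extend the run that ended on the previous word of both.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical documents, all containing the four query words.
query = "word1 word2 word3 word4"
docs = [
    "word1 other word2 word3 word4",
    "word1 word2 word3 word4 other",
    "word1 word2 other word3 word4",
]

# Desired order: the document with the longest adjacent run comes first.
ranked = sorted(docs, key=lambda t: longest_adjacent_run(query, t), reverse=True)
```

With these sample documents, the second one (all four words adjacent) should rank first, then the first one (three adjacent words), then the third (two adjacent words). The phrase match function in the query above only separates the four-word case from everything else.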

Any help/advice would be much appreciated!
Thanks

BTW I am using ES 1.5
