Hello there,
I have the following use case:
Say I have a text field "OCR_TEXT". I want to search for documents containing the phrase "Apple Tree" and then, for each matching document, use as the score the number of occurrences of the word "Apple" that are not part of "Apple Tree".
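To make the desired score concrete, here is a minimal client-side sketch in Python. It is not Elasticsearch code: the function name `standalone_term_count` is made up for illustration, and the lowercase/regex tokenization is only an assumption standing in for whatever analyzer "OCR_TEXT" actually uses. It also assumes the term appears exactly once in the phrase.

```python
import re


def standalone_term_count(text: str, term: str = "apple", phrase: str = "apple tree") -> int:
    """Count occurrences of `term` that are not part of an exact `phrase` match.

    Hypothetical helper: tokenization here is a simple lowercase word split,
    which only approximates an Elasticsearch analyzer.
    """
    tokens = re.findall(r"\w+", text.lower())
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)

    # Frequency of the single term anywhere in the text.
    term_freq = tokens.count(term.lower())

    # Frequency of the exact phrase (adjacent tokens, equivalent to slop 0).
    phrase_freq = sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens
    )

    # Each phrase match contains the term once, so subtract those occurrences.
    return term_freq - phrase_freq


if __name__ == "__main__":
    sample = "Apple Tree and an apple. Another Apple tree, apple pie."
    print(standalone_term_count(sample))
```

In other words, the score I am after is the term frequency of "Apple" minus the frequency of the phrase "Apple Tree" in the same document.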
I tested "_termStats.termFreq()", introduced in 8.15, and it seems to return only the sum of the frequencies of both "Apple" and "Tree". In earlier versions (<5.0) we could access the term frequency of each individual term, but that no longer seems possible. The query I tested:
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "OCR_TEXT": {
                  "query": "Apple Tree",
                  "slop": 0
                }
              }
            }
          ]
        }
      },
      "script": {
        "source": "_termStats.termFreq().getSum()"
      }
    }
  }
}
The example above uses "script_score" as a top-level query. I also tested "script_score" inside "function_score", hoping to supply a different query for scoring purposes, and ran into two issues. If no "query" field is specified under "script_score", the returned score does not make sense. If a "query" clause is specified under "script_score", an error is returned.
Could you have a look and let us know if there is a solution? Would the Elasticsearch development team consider extending _termStats to expose a dictionary of all matched terms, so that users can freely use the frequency of each individual term? For reference, the "function_score" variant I tested:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "OCR_TEXT": {
                  "query": "Apple Tree",
                  "slop": 0
                }
              }
            }
          ]
        }
      },
      "script_score": {
        "script": {
          "source": "_termStats.termFreq().getSum()"
        }
      }
    }
  }
}
Thank you,
SD