Partial match with ngram: how to avoid or minimize excessive search results

Hi,
I need to implement search over records by several different fields.
One of them is "Title", and for this field there are two requirements:

  • internal match (e.g. "esti" in the search request should match the value "Testing");
  • non-strict match (e.g. "Tosting" in the search request should match the value "Testing").

Another requirement for the search overall is that, by default, results must be displayed in reverse chronological order (the more recent the document, the better; each document has a "CreationTime" field).

I implemented the first part (both internal and non-strict matches) using an ngram token filter.
I implemented the second part (chronological order) using the "sort" feature of the search query.
My problem is that searching on the ngrammed field returns too many non-relevant results.
For example, if the request is "Computer", the output contains documents with these Titles:

  • "Computer"
  • "Compressor"
  • "Company"

Those non-relevant results do, of course, have a lower _score than documents with "Computer" in the Title. But sorting by CreationTime neutralizes that, so users often see non-relevant results at the top (because they were created later).
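To illustrate why these titles match at all, here is a rough Python sketch of 4-gram overlap. It only approximates the analyzer from the mapping (lowercasing plus 4-character grams; the word_delimiter filter is ignored), and the `ngrams` helper is a hypothetical name, not part of Elasticsearch. A match query uses the OR operator by default, so sharing a single gram is enough for a document to be returned.

```python
def ngrams(text, n=4):
    """Return the set of lowercase character n-grams of `text`."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

# Grams produced for the search request.
query = ngrams("Computer")

for title in ["Computer", "Compressor", "Company"]:
    shared = query & ngrams(title)
    # Any non-empty overlap is enough for an OR-style match.
    print(title, sorted(shared), f"{len(shared)}/{len(query)} query grams")
```

"Compressor" and "Company" share only the gram "comp" with the request, while "Computer" shares all five. Requiring a larger share of the query grams to match (for example through the match query's minimum_should_match option) might cut such results, at the cost of less tolerance for typos; I have not verified this against this index.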

My overall question: how can I work around this situation?

I can think of two approaches right away:

  • Reimplement the search technique to make it more "specific" (somehow reduce the number of non-relevant results in the first place), so that I can sort what remains;
  • Use some kind of weighting (a function_score compound query with a field value factor derived from the creation date, e.g. the number of days from 2012-01-01 to CreationTime).
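For the second approach, the factor itself could be computed along these lines. This is only a sketch: the `days_factor` helper and the `EPOCH` constant are hypothetical names, and it assumes the value is precomputed at index time and stored in a numeric field such as the "Factor" field in the mapping.

```python
from datetime import datetime, timezone

# Reference date taken from the example in the post.
EPOCH = datetime(2012, 1, 1, tzinfo=timezone.utc)

def days_factor(creation_time):
    """Whole days elapsed between EPOCH and `creation_time`."""
    return (creation_time - EPOCH).days

# A document created ten days after the reference date.
print(days_factor(datetime(2012, 1, 11, tzinfo=timezone.utc)))
```

Note that such a factor grows linearly, so two documents created years after the reference date but a few weeks apart end up with almost the same value, which may explain why it has little effect when combined with _score.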

I have failed completely with both approaches, so I'm asking for your help.

Now, to be more specific:
Elasticsearch version: 6.2.2
Index settings and mapping:

{
   "tendersearch":{
      "aliases":{

      },
      "mappings":{
         "_doc":{
            "properties":{
               "@timestamp":{
                  "type":"keyword"
               },
               "@version":{
                  "type":"keyword"
               },
               "DateModified":{
                  "type":"date"
               },
               "Id":{
                  "type":"keyword"
               },
               "tender":{
                  "properties":{
                     
                     "CreationTime":{
                        "type":"date"
                     },
                    
                     "Factor":{
                        "type":"long"
                     },
                     "Title":{
                        "type":"text",
                        "fields":{
                           "ngram":{
                              "type":"text",
                              "analyzer":"trigrams"
                           },
                           "raw":{
                              "type":"keyword"
                           }
                        },
                        "analyzer":"word_delim_analyzer"
                     }                     
                  }
               }
            }
         }
      },
      "settings":{
         "index":{
            "number_of_shards":"1",
            "provided_name":"tendersearch",
            "max_result_window":"2147483647",
            "creation_date":"1522832508985",
            "analysis":{
               "filter":{
                  "trigrams_filter":{
                     "type":"ngram",
                     "min_gram":"4",
                     "max_gram":"4"
                  },
                  "word_delim_catenate":{
                     "catenate_all":"true",
                     "type":"word_delimiter"
                  }
               },
               "analyzer":{
                  "trigrams":{
                     "filter":[
                        "lowercase",
                        "word_delim_catenate",
                        "trigrams_filter"
                     ],
                     "type":"custom",
                     "tokenizer":"whitespace"
                  },
                  "word_delim_analyzer":{
                     "filter":[
                        "lowercase",
                        "word_delim_catenate"
                     ],
                     "type":"custom",
                     "tokenizer":"whitespace"
                  }
               }
            },
            "number_of_replicas":"1",
            "uuid":"kY7S9_NhSyCgWi047v0rSA",
            "version":{
               "created":"6020199"
            }
         }
      }
   }
}

Sample search request:

{
  "size": 10,
  "from": 0,
  "sort": [
    {
      "tender.CreationTime": {
        "order": "desc"
      }
    },
    "_score"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tender.Title.ngram": {
              "query": "Computer"
            }
          }
        }
      ]
    }
  }
}

Any help is highly appreciated!

I'd go with a function_score query. Something like this would work:

{
  "size": 10,
  "from": 0,
  "query": {
    "function_score": {
      "query": {
        "match": {
          "tender.Title.ngram": {
            "query": "Computer"
          }
        }
      },
      "functions": [
        {
          "gauss": {
            "tender.CreationTime": {
              "origin": "now",
              "scale": "7d",
              "offset": "5d",
              "decay": 0.5
            }
          }
        }
      ]
    }
  }
}

This query computes a factor between 0 and 1 from the age of CreationTime, using a Gaussian decay function, and multiplies that factor with the score of the match on the ngrammed Title field. Because results are then ordered by the combined score, the explicit sort on CreationTime is dropped.
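To get a feel for the numbers, the decay factor can be reproduced outside Elasticsearch. The sketch below follows the gauss formula as I understand it from the function_score documentation, with the parameters from the query above expressed in days; the `gauss_decay` helper is a hypothetical name used only for illustration.

```python
import math

def gauss_decay(age_days, scale=7.0, offset=5.0, decay=0.5):
    """Approximate the gauss decay factor for a document `age_days` old."""
    # sigma^2 is chosen so that the factor equals `decay` at offset + scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    # Documents within `offset` of the origin are not penalized at all.
    distance = max(0.0, abs(age_days) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))

for age in (0, 5, 12, 19, 30):
    print(age, round(gauss_decay(age), 4))
```

With scale 7d and offset 5d the factor stays at 1 for the first 5 days, drops to 0.5 at 12 days, and is close to 0 after about a month. For an index that spans several years, a much larger scale is probably needed so that older documents still keep a usable score.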

You can tune parameters such as scale and offset to make the gauss function fall toward 0 faster or slower.
