Using More Like This query with language analysed fields

zloban · February 22, 2018, 8:39am

Hi,

I have an index with millions of documents from all kinds of media in all kinds of languages.
I set up my mapping with multiple language-analyzed fields for both the title and the body of an article; the default field uses the standard analyzer.
The problem is that when I use the Bulgarian-analyzed field for body in an MLT query, I get zero hits.
Here is my query:

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "more_like_this": {
            "fields": [
              "title.bg",
              "body.bg"
            ],
            "like": [
              {
                "_id": 167594917 // article in Bulgarian about sport event
              }
            ],
            "min_term_freq": 10,
            "max_query_terms": 50
          }
        }
      ]
    }
  }
}

When I use the default standard-analyzed field for body, I get millions of results.
I also tested the MLT query for an article in Romanian using title.ro and body.ro, and there I do get results.
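
One possible way to narrow this down is to look at the terms Elasticsearch produces for the liked document in the Bulgarian sub-fields, and to compare their frequencies with the min_term_freq of 10 used in the query (MLT ignores terms that occur fewer times than that in the input document). The request below is only a sketch: it assumes the typeless term vectors endpoint of recent Elasticsearch versions (older versions put the mapping type in the path), and it has not been run against this index.

```
GET my_index/_termvectors/167594917
{
  "fields": ["title.bg", "body.bg"],
  "term_statistics": true,
  "field_statistics": false,
  "positions": false,
  "offsets": false
}
```

If the response lists no terms for these fields, or none with a term_freq of at least 10, the MLT query would have few or no terms to build its search from.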

My custom bg analyzer is defined like this:

"bulgarian": {
  "filter": [
    "lowercase",
    "stop_bg",
    "bulgarian_stemmer"
  ],
  "char_filter": [
    "html_strip"
  ],
  "type": "custom",
  "tokenizer": "standard"
}
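
To check what this analyzer emits for Bulgarian text, the _analyze API can be run against the index. This is a minimal sketch, assuming the analyzer is registered under the name bulgarian in the settings of my_index; the sample sentence is only a placeholder, not text from the actual article.

```
GET my_index/_analyze
{
  "analyzer": "bulgarian",
  "text": "Футболният отбор спечели мача"
}
```

An empty or unexpected token list here would suggest looking more closely at the stop_bg and bulgarian_stemmer filter definitions, which are not shown above.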
