Elasticsearch + Analyzer + English dataset + French query

Hey guys!

I created a field of type "annotated_text" so that I can recognize some entities:

"headline": {"type": "annotated_text", "analyzer":"analyzer_shingle"}

Now I can, for example, search for "Turkey" and it returns [Turkey](Turkey&GPE).

But if, for example, the user types "Turquie" (the French translation of "Turkey"), Elasticsearch returns nothing.

I would like to know if there is an option that lets us search for "Turquie" and get back the same entity, [Turkey](Turkey&GPE).

Here is what I get when I search for "Turkey":

GET newsfeeds/_search
{
  "query": {
    "term": {
      "headline": "Turkey"
    }
  }
}

I get:

{
      "took" : 11,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 8,
          "relation" : "eq"
        },
        "max_score" : 4.584161,
        "hits" : [
          {
            "_index" : "newsfeeds",
            "_type" : "_doc",
            "_id" : "SB12019373229680514873104586239560012766084",
            "_score" : 4.584161,
            "_source" : {
              "headline" : """Refugees Stream Across [Turkey](Turkey&GPE), Trying to Enter the [EU](EU&ORG)

    """,
              "url" : "https://www.wsj.com/articles/refugees-stream-across-turkey-in-bid-to-enter-eu-a-baby-in-an-isotherm-bag-11583319603",
              "published" : "2020-03-04T12:36:00Z",
              "feedLink" : "http://online.wsj.com/page/2_0006.html",
              "tags" : [
                "Turkey",
                "EU"
              ]
            }
          },

When I search for "Turquie", I get:

 {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      }
    }

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

Thank you for the reply.

I updated my post

Thanks.

Don't use the citation icon for code; use only the </> icon. I updated your post.

I guess you'd need to use synonyms to tell Elasticsearch that "Turquie" and "Turkey" are the same term.

You can look at this: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html

Maybe that would help.
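As a rough sketch of the synonym idea, the Python snippet below builds an index-creation body with a `synonym` token filter and prints it as JSON so it can be pasted into a create-index request. The names `country_synonyms` and `analyzer_synonym` are made up for illustration, and since the definition of `analyzer_shingle` is not shown in this thread, its shingle filter is not reproduced here; treat this as a starting point, not the actual setup.

```python
import json

# Hypothetical rule: terms on one line are treated as equivalent.
SYNONYMS = ["turquie, turkey"]


def build_index_body(synonyms):
    """Return an index-creation body with a synonym-aware analyzer.

    Filter and analyzer names are assumptions made for this sketch;
    the thread does not show how "analyzer_shingle" is defined.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "country_synonyms": {
                        "type": "synonym",
                        "synonyms": list(synonyms),
                    }
                },
                "analyzer": {
                    "analyzer_synonym": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # lowercase first, so rules are written in lowercase
                        "filter": ["lowercase", "country_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "headline": {
                    "type": "annotated_text",
                    "analyzer": "analyzer_synonym",
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(SYNONYMS), indent=2))
```

Note that a `term` query does not analyze its input, so the synonym filter is only applied to the search text when an analyzed query such as `match` is used.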

Yeah, I did that before and it works, but I would have to do this for every country in the world, which is a bit complicated.

Yes, you might have to do some work to get that working.
The number of countries is not that large, though.
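One way to keep that work manageable is to generate the synonym rules from a translation table instead of writing them by hand. The sketch below assumes a small, hand-written French-to-English mapping (`FR_TO_EN` is a made-up sample, not real project data); a complete table would have to come from whatever country-name source you trust.

```python
# Hypothetical sample mapping; a real table would cover every country.
FR_TO_EN = {
    "Turquie": "Turkey",
    "Allemagne": "Germany",
    "Espagne": "Spain",
    "France": "France",
}


def to_synonym_rules(fr_to_en):
    """Turn a French -> English name mapping into synonym rule lines."""
    rules = []
    for french, english in sorted(fr_to_en.items()):
        fr, en = french.lower(), english.lower()
        if fr != en:  # identical names need no rule
            rules.append(f"{fr}, {en}")
    return rules


if __name__ == "__main__":
    for rule in to_synonym_rules(FR_TO_EN):
        print(rule)
```

The resulting lines can go into the `synonyms` list of the filter, or into a file referenced with `synonyms_path`. Multi-word names may need extra care depending on the tokenizer and on where the synonym filter is applied.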