Elasticsearch + Analyzer + English dataset + French query

Hey guys !

I created a field of type "annotated_text" so that I can recognize some entities:

"headline": {"type": "annotated_text", "analyzer":"analyzer_shingle"}

Now I can, for example, search for "Turkey" and it returns [Turkey](Turkey & GPE).

But if, for example, the user types "Turquie" (the French translation of "Turkey"), Elasticsearch returns nothing.

I would like to know if there is an option that lets us search for "Turquie" and get back the same entity, [Turkey](Turkey & GPE).

Here is the query I run for "Turkey":

GET newsfeeds/_search
{
  "query": {
    "term": {
      "headline": "Turkey"
    }
  }
}

I get:

{
      "took" : 11,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 8,
          "relation" : "eq"
        },
        "max_score" : 4.584161,
        "hits" : [
          {
            "_index" : "newsfeeds",
            "_type" : "_doc",
            "_id" : "SB12019373229680514873104586239560012766084",
            "_score" : 4.584161,
            "_source" : {
              "headline" : """Refugees Stream Across [Turkey](Turkey&GPE), Trying to Enter the [EU](EU&ORG)

    """,
              "url" : "https://www.wsj.com/articles/refugees-stream-across-turkey-in-bid-to-enter-eu-a-baby-in-an-isotherm-bag-11583319603",
              "published" : "2020-03-04T12:36:00Z",
              "feedLink" : "http://online.wsj.com/page/2_0006.html",
              "tags" : [
                "Turkey",
                "EU"
              ]
            }
          },

When I search for "Turquie", I get:

 {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      }
    }

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with the </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

Thank you for the reply :slight_smile:

I updated my post

Thanks.

Don't use the citation icon for code; use only the </> icon. I updated your post.

I guess you'd need to use synonyms to tell Elasticsearch that "Turquie" and "Turkey" are the same term.

You can look at this: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html

Maybe that would help.
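As a rough sketch of what that could look like: the Python below builds an index-settings body with a synonym token filter. The names `country_synonyms` and `analyzer_with_synonyms` are made up for illustration, and I'm assuming a standard tokenizer plus a lowercase filter, since the definition of `analyzer_shingle` isn't shown in this thread.

```python
import json


def build_synonym_settings(synonym_rules):
    """Return an index-settings body that wires a synonym filter into a custom analyzer.

    The filter and analyzer names are hypothetical; adapt them to your own index.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "country_synonyms": {
                        "type": "synonym",
                        # Solr-style rule: comma-separated terms are treated as equivalent
                        "synonyms": list(synonym_rules),
                    }
                },
                "analyzer": {
                    "analyzer_with_synonyms": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # lowercase first, so the rules can be written in lowercase
                        "filter": ["lowercase", "country_synonyms"],
                    }
                },
            }
        }
    }


settings_body = build_synonym_settings(["turquie, turkey"])
# Print the JSON you would send as the body when creating the index
print(json.dumps(settings_body, indent=2))
```

Keep in mind that a `term` query is not analyzed, so the search term is neither lowercased nor expanded at query time. A `match` query would run the user's input through the analyzer instead.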

Yeah, I did that before and it works, but I would have to do this for every country in the world, which is a bit complicated.

Yes. You might have to do some work to get that working.
The number of countries shouldn't be a problem, though.
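For instance, the synonym rules don't have to be typed by hand. Here is a minimal sketch that generates them from a translation table. The three-entry `sample` mapping is only a placeholder (a real table could come from a localized country-name dataset such as CLDR), and the lowercasing assumes the synonym filter runs after a lowercase filter.

```python
def build_synonym_rules(translations):
    """Turn {english_name: [translated names]} into Solr-style synonym rules.

    Each rule is a comma-separated line of equivalent terms, e.g. "turkey, turquie".
    """
    rules = []
    for english, translated_names in sorted(translations.items()):
        unique_terms = []
        for term in [english] + list(translated_names):
            normalized = term.strip().lower()
            # skip empty entries and duplicates (some names are identical across languages)
            if normalized and normalized not in unique_terms:
                unique_terms.append(normalized)
        # a rule with a single term would be pointless, so only keep real pairs
        if len(unique_terms) > 1:
            rules.append(", ".join(unique_terms))
    return rules


# Placeholder data, single-word names only
sample = {
    "Turkey": ["Turquie"],
    "Germany": ["Allemagne"],
    "Spain": ["Espagne"],
}

rules = build_synonym_rules(sample)
for rule in rules:
    print(rule)
```

The resulting lines can go into the `synonyms` list of the filter or into a file referenced by `synonyms_path`. Multi-word country names need more care with tokenization, so test those separately.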
