How to apply the right stopword language depending on a field of the document?

minikali · August 29, 2022, 11:41am

Let's say I have an index of news with many documents in different languages. Users have a single search bar to look for the title they want across all languages. I don't know which language the user is typing in, so I would like to remove stopwords based on each document's language.

This is how I tried to define my index:

{
    name: "news",
    mappings: {
      dynamic: false,
      properties: {
        id: { type: "keyword" },
        title: { type: "text" },
        content: { type: "text" },
        lang: { type: "keyword" },
        created_at: { type: "date" },
        updated_at: { type: "date" },
      },
    },
    settings: {
      analysis: {
        analyzer: {
          default: {
            type: "custom",
            tokenizer: "standard",
            filter: ["my_custom_stop_words_filter"],
          },
        },
        filter: {
          my_custom_stop_words_filter: {
            type: "stop",
            stopwords: "_english_",
            script: {
              source: "lang.getText() === 'english'",
            },
          },
        },
      },
    }
}

I want to filter out the right stopwords for every document depending on its "lang" field. How can I achieve this?
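For context: analysis in Elasticsearch is configured per field, not per document. The `stop` token filter accepts options such as `stopwords`, `stopwords_path`, `ignore_case` and `remove_trailing`, but no `script`, so it cannot read the document's `lang` field. (The `condition` token filter does take a script, but that script only sees the current token.) Also, `stop` is case-sensitive by default, so without a `lowercase` filter in front of it a capitalised "The" at the start of a title would not be removed. One common workaround, sketched here assuming the application controls indexing, is to copy the title into a per-language field chosen from `lang` and map each such field with the matching built-in language analyzer. The `title_<lang>` field names, the `LANG_ANALYZERS` lookup and both helpers are hypothetical.

```javascript
// Hypothetical lookup from the values stored in `lang` to built-in
// Elasticsearch language analyzers; adjust it to the values you store.
const LANG_ANALYZERS = {
  english: "english",
  french: "french",
  brazilian: "brazilian",
};

// Build mapping properties with one text field per language. Because the
// index uses `dynamic: false`, each per-language field must be mapped
// explicitly or it will not be indexed and searchable.
function buildPerLanguageProperties(langAnalyzers) {
  const properties = { lang: { type: "keyword" } };
  for (const [lang, analyzer] of Object.entries(langAnalyzers)) {
    properties[`title_${lang}`] = { type: "text", analyzer };
  }
  return properties;
}

// Before indexing, move `title` into the field that matches the document's
// `lang`, so it is analyzed with that language's stopword list.
function routeTitleByLang(doc, langAnalyzers) {
  if (!Object.prototype.hasOwnProperty.call(langAnalyzers, doc.lang)) {
    throw new Error(`Unsupported lang: ${doc.lang}`);
  }
  const { title, ...rest } = doc;
  return { ...rest, [`title_${doc.lang}`]: title };
}

// Usage
const properties = buildPerLanguageProperties(LANG_ANALYZERS);
const indexed = routeTitleByLang(
  { id: "1", title: "The news of the day", lang: "english" },
  LANG_ANALYZERS
);
console.log(properties);
console.log(indexed);
```

At search time, a `multi_match` query with `"fields": ["title_*"]` would search all of these fields, each analyzing the user's text with its own analyzer, so the user's language does not need to be known in advance.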

RabBit_BR · August 29, 2022, 4:31pm

Hi @minikali

If the user searches from a browser, you can get their language from the browser itself. Is that not possible in your case?
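To make the browser-language idea concrete: browsers send an `Accept-Language` HTTP header listing preferred languages with optional q-weights (e.g. `fr-CH, fr;q=0.9, en;q=0.8`). The sketch below is a hypothetical server-side helper, not part of any Elasticsearch client. It assumes the sub-field suffixes `pt-br`, `fr` and `en` used in the mapping below, ignores the `*` wildcard for simplicity, and returns a fallback when nothing matches.

```javascript
// Pick the first supported language from an Accept-Language header,
// ordered by q-value (header order breaks ties).
// `supported` maps lowercase tags or primary subtags to a sub-field suffix.
function pickLanguage(acceptLanguage, supported, fallback) {
  const ranges = (acceptLanguage || "")
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.find((p) => p.trim().startsWith("q="));
      const q = qParam ? parseFloat(qParam.trim().slice(2)) : 1;
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((r) => r.tag && r.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index);

  for (const { tag } of ranges) {
    // Try the full tag first ("pt-br"), then its primary subtag ("pt").
    for (const candidate of [tag, tag.split("-")[0]]) {
      if (Object.prototype.hasOwnProperty.call(supported, candidate)) {
        return supported[candidate];
      }
    }
  }
  return fallback;
}

// Hypothetical table: "pt" alone is mapped to the Brazilian sub-field.
const SUPPORTED = { "pt-br": "pt-br", pt: "pt-br", fr: "fr", en: "en" };

console.log(pickLanguage("pt-BR,pt;q=0.9,en-US;q=0.8", SUPPORTED, null)); // → pt-br
console.log(pickLanguage("de-DE,fr;q=0.7,en;q=0.9", SUPPORTED, null)); // → en
```

Returning a fallback instead of throwing leaves room for searching every language sub-field when the browser's languages are not supported.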

What I do today is map a sub-field for each language, each using that language's analyzer. It looks like this:

   "title": {
     "type": "text",
     "fields": {
       "pt-br": {
         "type": "text",
         "analyzer": "brazilian"
       },
       "fr": {
         "type": "text",
         "analyzer": "french"
       },
       "en": {
         "type": "text",
         "analyzer": "english"
       }
     }
   }
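Since every field that needs language-aware search repeats the same sub-fields, the mapping can be generated instead of written by hand. A minimal sketch, assuming `title` and `description` (the two fields searched in the query below) get the same three languages; the helper and constant names are hypothetical. Note that with multi-fields every title is indexed once per language analyzer, whatever the document's actual language.

```javascript
// Sub-field suffix -> built-in Elasticsearch language analyzer,
// mirroring the mapping above.
const LANGUAGE_ANALYZERS = { "pt-br": "brazilian", fr: "french", en: "english" };

// Build a text field whose multi-fields each use one language analyzer.
function multiLanguageTextField(languageAnalyzers) {
  const fields = {};
  for (const [suffix, analyzer] of Object.entries(languageAnalyzers)) {
    fields[suffix] = { type: "text", analyzer };
  }
  return { type: "text", fields };
}

const mappings = {
  properties: {
    title: multiLanguageTextField(LANGUAGE_ANALYZERS),
    description: multiLanguageTextField(LANGUAGE_ANALYZERS),
  },
};

console.log(JSON.stringify(mappings, null, 2));
```

Adding a language then means adding one entry to the lookup table (and reindexing), rather than editing every field by hand.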

When I run a search, I already know the language from the browser, so I build the query to search only the sub-fields for that language:

   {
     "query": {
       "multi_match": {
         "query": "test",
         "fields": ["description.pt-br", "title.pt-br"]
       }
     }
   }

This works fine for me.
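The query-building step can be sketched as a small function. This is an illustration under assumptions: the suffix list is assumed to match the mapping, `buildSearchBody` is a hypothetical name, and the fallback (search every language sub-field when the language is unknown) is an addition that also covers the original question's case. `multi_match` analyzes the query text separately with each listed sub-field's analyzer, so each language variant gets its own stopword handling.

```javascript
// Sub-field suffixes defined in the mapping (assumed to match it).
const LANGUAGE_SUFFIXES = ["pt-br", "fr", "en"];

// Build the search body: target only the user's language sub-fields when it
// is known, otherwise search every language sub-field.
function buildSearchBody(text, language, fields = ["title", "description"]) {
  const suffixes = LANGUAGE_SUFFIXES.includes(language) ? [language] : LANGUAGE_SUFFIXES;
  return {
    query: {
      multi_match: {
        query: text,
        fields: fields.flatMap((field) => suffixes.map((s) => `${field}.${s}`)),
      },
    },
  };
}

// Usage
console.log(JSON.stringify(buildSearchBody("test", "pt-br")));
console.log(JSON.stringify(buildSearchBody("test", null)));
```

Keeping the body as a plain object means it can be passed to whichever client version is in use; the exact search call differs between client releases, so it is left out here.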

system · September 26, 2022, 4:31pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic | Category | Replies | Views | Activity
Analizer with stop words removal by language | Elasticsearch | 5 | 464 | July 6, 2017
Using differents analysers based on the document language | Elasticsearch | 2 | 327 | July 6, 2017
Problem with stopword filter, SimpleQueryStringQuery and default operator AND | Elasticsearch | 1 | 769 | April 2, 2019
Stopwords in analyzer doesn't seem to work | Elasticsearch | 3 | 384 | June 26, 2020
MultiLanguage Index - StopWords | Elasticsearch | 1 | 566 | July 6, 2017