How do I stem and remove stop words from my indexed data?

I want to apply stemming and remove stop words from my indexed data (_doc/1).
My analyzer and mapping are:

PUT my_index
{
  "settings": {
    "index":{
      "analysis":{
        "filter":{
          "synonym":{
            "type":"synonym",
            "synonyms_path":"C:/Users/ALMUG/Downloads/elasticsearch-7.1.0/config/synonym.txt"
          },
          "english_stop":{
            "type":"stop",
            "stopwords":"_english_"
          },
          "my_stemmer":{
            "type":"stemmer",
            "name":"english"
          }
        },
        "analyzer":{
          "my_custom_analyzer":{
            "type":"custom",
            "tokenizer":"standard",
            "filter":[
              "lowercase",
              "english_stop",
              "synonym",
              "my_stemmer"
              ]
          }
        }
      }
    }
  }, 
  "mappings": {
    "properties": {
      "container": {
        "type": "nested" ,
        "properties": {
            "heading":    { "type": "text","boost": 2  },
            "para": { 
              "type": "text",
              "analyzer":"my_custom_analyzer"
            }
          }
      }
    }
  }
}

This is how I index the document:

PUT my_index/_doc/1
{
  "container" : [
    {
      "heading" : "Limitation on incurrence of Indebtedness and issuance of Disqualified Stock and Preferred Stock",
      "para" :  "The will Issuer prince not, and will not permit any of its Restricted Subsidiaries to, Incur however, that the and any Restricted Subsidiary may the Consolidated Fixed Charge Coverage Ratio for the most recently ended four fiscal quart"
    },
    {
      "heading" : "Selection and notice",
      "para" :  "If the  debt covenant is redeeming or purchasing less than all of the Secured Notes issued under th"
    }
  ]
}

Now I want to run _analyze on _doc/1.

Welcome! :wink:

Please format your code, logs, or configuration files using the </> icon, as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

> Now I want to run _analyze on _doc/1

You can't really: the _analyze API works on text passed in the request, not on a stored document. But why do you want to do this?
You can, however, analyze the text you sent in this document to understand what happened behind the scenes:

GET my_index/_analyze
{
  "field" : "container.para",
  "text" : [ 
     "Issuer not, and will not permit any of its Restricted Subsidiaries to, the Consolidated Fixed Charge Coverage Ratio for the most recently end Indebtedness is Incurred is at least 2.0 to 1.0", 
     "If the debt covenant is redeeming or purchasing less than all of the Secured Notes issued under the Secured Indenture at any time." 
  ]
}
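
To see which filter in the chain changes which token, the same API can also be called with the analyzer name and explain enabled. A minimal sketch, assuming the my_custom_analyzer definition from the question; the sample text is just one sentence taken from the document:

```
GET my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "explain": true,
  "text": "If the debt covenant is redeeming or purchasing less than all of the Secured Notes"
}
```

The response lists the tokens after the tokenizer and after each token filter, so you can check that english_stop dropped the stop words and that my_stemmer reduced the remaining terms.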

My objective is to remove stop words and apply stemming to container.para, and then write a query that finds any match in para, like the one below:

image

The container holds a lot of heading and para entries. I want to remove stop words and apply stemming to all the text in container.

Thanks in advance.

Please don't post images of text, as they are hard to read and not searchable.

Instead, paste the text and format it with the </> icon. Check the preview window.

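You don't need to call the _analyze API for this. The analyzer configured on container.para is applied automatically when the document is indexed, and a match query on that field runs the search text through the same analyzer.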
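
Since container is mapped as nested, the search has to go through a nested query. A minimal sketch, assuming the mapping from the question; the search text is only an illustration:

```
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "container",
      "query": {
        "match": {
          "container.para": "purchase of secured notes"
        }
      },
      "inner_hits": {}
    }
  }
}
```

The match query analyzes the search text with the analyzer of container.para, so stop words such as "of" are dropped and "purchase" should reduce to the same stem as "purchasing" (depending on what synonym.txt contains). inner_hits is optional; it reports which container entries matched.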
