I have added a custom analyzer to my index settings, but I am not getting the expected results.

// These are the settings of my index

        {
          "hw_transcriptionlogindex" : {
            "settings" : {
              "index" : {
                "number_of_shards" : "1",
                "provided_name" : "hw_transcriptionlogindex",
                "creation_date" : "1596547726213",
                "analysis" : {
                  "analyzer" : {
                    "my_stop_analyzer" : {
                      "type" : "stop",
                      "stopwords" : [
                        "the",
                        "over",
                        "gotta"
                      ]
                    }
                  }
                },
                "number_of_replicas" : "1",
                "uuid" : "eIRUaSflQW-i2ajJ-pQw6A",
                "version" : {
                  "created" : "7060299"
                }
              }
            }
          }
        }
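According to the Elasticsearch reference, the `stop` analyzer behaves like the `simple` analyzer (it splits text on non-letter characters and lowercases the tokens) and then removes the configured stop words. The Python sketch below roughly approximates what `my_stop_analyzer` produces. The function name `stop_analyze` is hypothetical, and the regex only approximates Lucene's letter tokenizer.

```python
import re

# Stop words copied from my_stop_analyzer in the settings above.
MY_STOPWORDS = {"the", "over", "gotta"}


def stop_analyze(text, stopwords=MY_STOPWORDS):
    # Hypothetical helper, a rough approximation of the `stop` analyzer:
    # [^\W\d_]+ matches runs of letters (letter tokenizer), tokens are
    # lowercased, and stop words are dropped.
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    return [t for t in tokens if t not in stopwords]


print(stop_analyze("George gotta go "))  # ['george', 'go']
```

These terms are only what the analyzer would produce. Whether any field actually uses the analyzer depends on the mapping.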


// This is the search request
GET hw_transcriptionlogindex/_search?size=10000
{
  "query": {
    "bool": {
      "must": [
        { "match": { "word": "george" } },
        {
          "range": {
            "call_start_time": {
              "gte": "2020-08-09 00:00:00",
              "lte": "2020-08-10 23:59:00"
            }
          }
        }
      ],
      "filter": {
        "terms": {
          "site_id": ["18", "90", "113", "121", "148", "154", "222", "223", "227"]
        }
      }
    }
  }
}


I added the settings to the already-created index by closing it, updating the settings, and reopening it.
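Analysis settings are static in Elasticsearch. On an existing index they can only be changed while the index is closed, which means calling the close index API, then the update settings API, then the open index API. The Python sketch below builds that request sequence as plain data and does not call any client library. The helper name `settings_update_steps` is hypothetical.

```python
import json

INDEX = "hw_transcriptionlogindex"

# Analysis settings are static: they can only be updated while the index is closed.
ANALYSIS_SETTINGS = {
    "analysis": {
        "analyzer": {
            "my_stop_analyzer": {
                "type": "stop",
                "stopwords": ["the", "over", "gotta", "is"],
            }
        }
    }
}


def settings_update_steps(index, settings):
    # Hypothetical helper: returns (method, path, body) in the order they must be sent.
    return [
        ("POST", f"/{index}/_close", None),                     # 1. close the index
        ("PUT", f"/{index}/_settings", json.dumps(settings)),  # 2. update settings
        ("POST", f"/{index}/_open", None),                     # 3. reopen the index
    ]


for method, path, body in settings_update_steps(INDEX, ANALYSIS_SETTINGS):
    print(method, path, body or "")
```

This only defines the analyzer. Documents that are already indexed are not re-analyzed, and no field uses `my_stop_analyzer` unless the mapping references it.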

I am still not getting the expected results and cannot identify the issue.

Please take the time to write a proper problem statement first.

  • What are you trying to achieve?
  • What is not working as expected?
  • Where do the results differ?
  • Provide a complete reproduction case, so others can follow your issue. That includes full requests for index creation, index mapping, document indexing and the queries you are executing.

See https://www.elastic.co/help

All of these will result in way more people reading your post and thus increase the likelihood of getting help.

Thank you!

@spinscale, really sorry for the inconvenience.

1. What I am trying to achieve

I am trying to fetch records with the stop words defined in my stop analyzer removed.

The following is the mapping of my index:

PUT /hw_transcriptionlogindex
{
  "mappings": {
      "properties": {
        "id": { "type" : "long" },  
        "channel_id": { "type" : "integer" },
        "score": { "type" : "double" }, 
        "confidence": { "type" : "keyword" },
        "start": { "type" : "keyword" },
        "end": { "type" : "keyword" },
        "word": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword" }
          }
        },
        "status": { "type" : "integer" },
        "result": { "type" : "keyword", "null_value": "null" },
        "ex_id": { "type" : "keyword" },
        "start_time": { 
          "type" : "date", 
          "format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "duration" : { "type" : "keyword" },
        "site_id": { "type": "keyword"},
        "updated_at": { 
          "type" : "date", 
          "format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" 
        },
        "created_on": { 
          "type" : "date", 
          "format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" 
        }
      }
  }
}

The following are the settings of my index (my stop analyzer):

{
  "hw_transcriptionlogindex" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "provided_name" : "hw_transcriptionlogindex",
        "creation_date" : "1596547726213",
        "analysis" : {
          "analyzer" : {
            "my_stop_analyzer" : {
              "type" : "stop",
              "stopwords" : [
                "the",
                "over",
                "gotta",
                "is"
              ]
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "eIRUaSflQW-i2ajJ-pQw6A",
        "version" : {
          "created" : "7060299"
        }
      }
    }
  }
}

My focus is the `word` field in the above index.

The analyzer is not working as expected: I am still getting documents whose `word` field contains the stop words I am trying to filter out. For example:

[
  {
    "_index" : "hw_transcriptionlogindex",
    "_type" : "_doc",
    "_id" : "8j8s1XMBRHi1McaTreTu",
    "_score" : 12.727324,
    "_source" : {
      "id" : "1204047",
      "channel_id" : 0,
      "score" : 0,
      "confidence" : 1,
      "start" : 7100050000,
      "end" : 7114750000,
      "word" : "george gotta go ",
      "ex_id" : "124713740",
      "start_time" : "2020-08-09 08:39:51",
      "duration" : "805",
      "site_id" : "80060",
      "created_on" : "2020-08-09 21:42:21"
    }
  },
  {
    "_index" : "hw_transcriptionlogindex",
    "_type" : "_doc",
    "_id" : "BVJo33MBRHi1McaTdDFc",
    "_score" : 11.414179,
    "_source" : {
      "id" : "1260123",
      "channel_id" : 1,
      "score" : 0,
      "confidence" : 1,
      "start" : 387550000,
      "end" : 398950000,
      "word" : "george is what to take ",
      "ex_id" : "154900766",
      "start_time" : "2020-08-10 19:02:01",
      "duration" : "480",
      "site_id" : "181",
      "created_on" : "2020-08-11 21:23:51"
    }
  }
]

As the two records above show, the `word` field still contains the stop words "gotta" and "is".

Regards

I do not see any field in your mapping configured to use your custom analyser. Analysers do not alter the source document, only the terms that are indexed.
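A toy Python model can illustrate this distinction. `ToyIndex` and `stop_analyze` are hypothetical and do not reflect how Elasticsearch works internally. The model only mimics two facts: `_source` is stored exactly as sent, and the analyzer decides which terms end up in the inverted index. Even when stop words are removed at index time, the document that comes back still contains them.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "over", "gotta", "is"}


def stop_analyze(text):
    # Rough approximation of the `stop` analyzer (letter tokenizer + lowercase + stop filter).
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    return [t for t in tokens if t not in STOPWORDS]


class ToyIndex:
    # Hypothetical toy model: keeps _source verbatim, indexes only analyzed terms.

    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.sources = {}
        self.postings = defaultdict(set)  # term -> set of doc ids

    def index(self, doc_id, source):
        self.sources[doc_id] = dict(source)  # stored unchanged
        for term in self.analyzer(source["word"]):
            self.postings[term].add(doc_id)  # only analyzed terms are searchable

    def search(self, term):
        return [self.sources[i] for i in sorted(self.postings.get(term, ()))]


idx = ToyIndex(stop_analyze)
idx.index("1", {"word": "george gotta go "})
print(idx.search("gotta"))   # [] -> the stop word was never indexed
print(idx.search("george"))  # [{'word': 'george gotta go '}] -> source still has "gotta"
```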

This is my search query:

GET hw_transcriptionlogindex/_search?size=1000
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "word": {
              "query": "george",
              "analyzer": "my_stop_analyzer"
            }
          }
        },
        {
          "range": {
            "call_start_time": {
              "gte": "2020-08-09 00:00:00",
              "lte": "2020-08-10 23:59:00"
            }
          }
        }
      ],
      "filter": {
        "terms": {
          "site_id": ["181", "334543", "80060", "6586"]
        }
      }
    }
  }
}

The `analyzer` in your `match` query is only applied to the query string before searching, and your query string contains no stop words. The full matching document is then returned as expected. Because that document contains stop words, they are returned too.
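Here is a toy Python model of that behaviour. The helper names are hypothetical and both analyzers are rough approximations. The `analyzer` set on a `match` query only changes how the query string is tokenized. Documents are compared against the terms produced by the field's own index-time analyzer. In this mapping that is the default `standard` analyzer, which does not remove stop words by default in 7.x. The matching `_source` then comes back whole.

```python
import re

STOPWORDS = {"the", "over", "gotta", "is"}


def letters_lowercase(text):
    # Rough stand-in for index-time analysis of `word` (no analyzer in the mapping,
    # so the default `standard` analyzer applies and stop words are kept).
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop_analyze(text):
    # Rough stand-in for my_stop_analyzer, used here only at query time.
    return [t for t in letters_lowercase(text) if t not in STOPWORDS]


def match(query, docs, query_analyzer):
    # Toy `match` query with the default OR operator: the analyzer applies to the query only.
    query_terms = set(query_analyzer(query))
    if not query_terms:
        return []  # mirrors the default zero_terms_query: none
    return [d for d in docs if query_terms & set(letters_lowercase(d["word"]))]


docs = [{"word": "george is what to take "}, {"word": "hello there"}]
print(match("george", docs, stop_analyze))  # [{'word': 'george is what to take '}]
print(match("is", docs, stop_analyze))      # [] -> every query term was a stop word
```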

Can you please suggest an approach I can apply to my search query? I am stuck on this issue.

It seems to be working correctly. You will always get the source document you indexed returned in unaltered form. Can you explain what it is you are expecting or looking to achieve?

I am expecting the altered form of the results. As explained above, I defined stop words in my analyzer, but these words are not being removed at all. How can I get the data from

"word" : "george gotta go ",

to

"word" : ["george", "go "],

Analysers do not alter the content of the documents, and the documents are what gets returned. What you are looking for is therefore not possible unless you remove the stop words before indexing the document.
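A minimal Python sketch of removing the stop words on the client side before indexing. It assumes documents are prepared in Python before being sent to Elasticsearch. The helper `strip_stopwords` is hypothetical, and its whitespace tokenization is an assumption that does not reproduce any Elasticsearch analyzer exactly. The output uses the array form asked for above.

```python
# Same stop words as my_stop_analyzer; keeping the two lists in sync is up to you.
STOPWORDS = {"the", "over", "gotta", "is"}


def strip_stopwords(doc, field="word"):
    # Hypothetical helper: returns a copy of `doc` whose `field` is a list of the
    # whitespace-separated words that are not stop words (case-insensitive check).
    cleaned = dict(doc)
    cleaned[field] = [w for w in doc[field].split() if w.lower() not in STOPWORDS]
    return cleaned


raw = {"id": "1204047", "word": "george gotta go ", "site_id": "80060"}
print(strip_stopwords(raw))  # {'id': '1204047', 'word': ['george', 'go'], 'site_id': '80060'}
```

The cleaned document would then be sent with the usual index or bulk request. Storing `word` as a single string instead (`" ".join(...)`) also works. Which shape fits better depends on how the field is queried.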
