Best way to disable only one stopword in Elasticsearch


(Ram Sunka) #1

Hi Team, as we know, 'AN' is a stopword in Elasticsearch. I executed the query below.

{
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "receiver": {
              "query": "AN",
              "type": "boolean"
            }
          }
        }
      ]
    }
  }
}

The query returns empty results even though the document is available in ES.

ES has the below document
{
  _index: messages
  _type: ABCType
  _id: 4571b1a9-d0eb-4e98-bc3f-562d2ee8e206
  _version: 1
  _score: 1
  _source: {
    sender_name: Unknown
    receiver: AN
  }
}

Here, what would be the best way to disable only the stopword 'AN'? Which analyzer should I use, and where do I configure it?


(tri-man) #2

Try this link
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html


(Ram Sunka) #3

Thank you so much. I only need to disable one stopword, so I will follow the link above and create the analyzer. But after adding the analyzer to the index, does it affect ES performance?


(Adrien Grand) #4

There is no way to remove a single stopword. You need to redefine a standard analyzer and provide the whole list of stop words that you need: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html#_configuration

Here is the current list: https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java#L45


(tri-man) #5

The idea here is to use your own "stop words" list, which leaves out the words you don't want treated as stop words, instead of the standard "stop words" list. Once you have defined your own list, you need to define your own "analyzer" that uses it and apply it where appropriate.
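
As a rough illustration of that idea, here is a minimal Python sketch that starts from the English stop word list in the Lucene source linked above, drops the unwanted word, and prints index settings using that list. The helper names (stopwords_without, build_index_settings) are made up for this example, and the settings layout is an assumption modelled on a "default" analyzer of type "standard"; check it against the documentation for your ES version before using it.

```python
import json

# English stop word list as it appears in the Lucene StandardAnalyzer
# source linked earlier in this thread (33 words, including "an").
DEFAULT_ENGLISH_STOPWORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with",
]


def stopwords_without(excluded, base=DEFAULT_ENGLISH_STOPWORDS):
    """Return the base list minus the excluded words (case-insensitive)."""
    excluded_lower = {word.lower() for word in excluded}
    return [word for word in base if word not in excluded_lower]


def build_index_settings(stopwords):
    """Hypothetical helper: wrap a stop word list in analyzer settings.

    The nesting (index > analysis > analyzer > default) is an assumption
    for illustration, not a verified configuration for any ES version.
    """
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "standard",
                        "stopwords": stopwords,
                    }
                }
            }
        }
    }


# Keep every stop word except "an", so a search for "AN" is not filtered out.
custom_stopwords = stopwords_without(["AN"])
settings = build_index_settings(custom_stopwords)
print(json.dumps(settings, indent=2))
```

Documents indexed before the analyzer change were analyzed with the old stop word list, so they would most likely need to be reindexed for the new list to take effect.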


(Ram Sunka) #6

Thank you so much, guys.
Is the snippet below, which I am going to use, fine? Also, what about performance issues?

index :
  analysis :
    analyzer :
      default :
        type : standard
        stopwords : [ "a", "and", "are", "as", "at", "be", "but", "by",
                      "for", "if", "in", "into", "is", "it",
                      "no", "not", "of", "on", "or", "such",
                      "that", "the", "their", "then", "there", "these",
                      "they", "this", "to", "was", "will", "with" ]

Note: I am using ES version 1.3.4 only.


(tri-man) #7

The performance should not be worse than with a longer stop words list (if I have to guess).


(Ram Sunka) #8

Thank you so much, @thn .

Also, can anyone from the Elastic team confirm that the above is correct?

