Stop words, Keep filter not working


(Bijuv V) #1

I have country codes in an index, and some of them (AT, BE, NO) are also
English stop words. Because of this, my searches for these countries do not
work.

I need the following query to work (search for records with AT mentioned in
any of the fields; AT = Austria):

POST myindexes/_search
{
  "query": {
    "match": {
      "_all": "AT"
    }
  }
}

By default, this does not return any results, so I turned to analyzers. In
the settings below I have three analyzers, each with a different filter
chain. None of the three options works.

Note that I added three different analyzers for testing purposes only.

The mappings and settings are given below, followed by the sample records I
created.

PUT myindexes
{
  "settings": {
    "analysis": {
      "analyzer": {
        "myanalyzerwithstop": {
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "stop"]
        },
        "myanalyzernostop": {
          "tokenizer": "standard",
          "filter": ["standard", "lowercase"]
        },
        "myanalyzerstopwithkeep": {
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "mywords"]
        }
      },
      "filter": {
        "mywords": {
          "type": "keep",
          "keep_words": ["AT", "NO", "BE", "GB"]
        }
      }
    }
  },
  "mappings": {
    "myindex": {
      "_all": {
        "enabled": true,
        "analyzer": "myanalyzerstopwithkeep"
      },
      "properties": {
        "countrywithstop": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "myanalyzerwithstop"
        },
        "countrywithnostop": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "myanalyzernostop"
        },
        "countrywithkeep": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "myanalyzerstopwithkeep"
        }
      }
    }
  }
}

The sample records were created with:

PUT myindexes/myindex/1
{
  "countrywithstop": "AT",
  "countrywithnostop": "AT",
  "countrywithkeep": "AT"
}

PUT myindexes/myindex/2
{
  "countrywithstop": "GB",
  "countrywithnostop": "GB",
  "countrywithkeep": "GB"
}

PUT myindexes/myindex/3
{
  "countrywithstop": "NO",
  "countrywithnostop": "NO",
  "countrywithkeep": "NO"
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fabd4576-09db-493d-b64a-dff8eba2214b%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Binh Ly) #2

Bijuv,

If you are still searching on the _all field, try to remove "lowercase"
from the keep analyzer:

        "myanalyzerstopwithkeep":{
            "tokenizer" : "standard",
            "filter" : ["standard", "mywords"]
        }

What's happening is that the lowercase filter turns "AT" into "at", so your
mywords keep filter no longer matches it and drops the token, since "at" is
not the same as "AT".

Or you can set the _all analyzer to "myanalyzernostop" and it will work
correctly also.
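
To make the ordering issue concrete, here is a small Python sketch that imitates the two filter chains. It is not Elasticsearch code: `lowercase_filter`, `keep_filter` and `analyze` are hypothetical stand-ins, and splitting on whitespace is only a rough approximation of the standard tokenizer.

```python
# Hypothetical stand-ins for the "lowercase" and "keep" token filters.
# This only imitates the behaviour discussed above; it is not Elasticsearch.
KEEP_WORDS = {"AT", "NO", "BE", "GB"}  # same list as the "mywords" filter


def lowercase_filter(tokens):
    """Lowercase every token, like the lowercase token filter."""
    return [t.lower() for t in tokens]


def keep_filter(tokens, keep_words=KEEP_WORDS):
    """Drop every token that is not in the keep list (case-sensitive)."""
    return [t for t in tokens if t in keep_words]


def analyze(text, filters):
    """Split on whitespace (assumed tokenizer), then apply filters in order."""
    tokens = text.split()
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


# Chain as in "myanalyzerstopwithkeep": lowercase first, then keep.
with_lowercase = analyze("AT", [lowercase_filter, keep_filter])

# Chain with "lowercase" removed: the token still matches the keep list.
without_lowercase = analyze("AT", [keep_filter])

print(with_lowercase)     # []
print(without_lowercase)  # ['AT']
```

With lowercase in front, the token reaches the keep filter as "at" and is discarded, so nothing is indexed or searched for. Listing the keep words in lowercase instead of removing the lowercase filter should have the same effect in this sketch.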



(Bijuv V) #3

Hi Binh

If the analyzer only has "filter" : ["standard", "mywords"], then AT or at
will always appear in the search results, because the stop filter has been
removed.

What I'm trying to achieve is:

Remove all the default English stop words (use the stop filter).

Retain the words that I list under keep words. For example, if I list AT or
BE in the keep filter, it should remove all stop words except AT and BE.

How can this be achieved?

Regards
Viju
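
One possible approach, sketched here under assumptions rather than taken from this thread, is to skip the keep filter and give the stop filter a custom word list: the default English stop words minus the country codes. The Python below only imitates that behaviour. `ENGLISH_STOP_WORDS` is the Lucene default English list as I understand it and should be checked against your version; `COUNTRY_CODES`, `analyze` and `stop_filter` are hypothetical names.

```python
import json

# Assumed contents of the default English stop word list (verify for your version).
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
    "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
    "such", "that", "the", "their", "then", "there", "these", "they",
    "this", "to", "was", "will", "with",
}

# Country codes that must survive analysis, in lowercase because the
# lowercase filter runs before the stop filter.
COUNTRY_CODES = {"at", "be", "no"}

# Custom stop list: every default stop word except the country codes.
CUSTOM_STOP_WORDS = ENGLISH_STOP_WORDS - COUNTRY_CODES


def analyze(text):
    """Whitespace split (assumed tokenizer), lowercase, then custom stop filter."""
    tokens = [t.lower() for t in text.split()]
    return [t for t in tokens if t not in CUSTOM_STOP_WORDS]


# A filter definition of this shape could go under "analysis.filter" in the
# index settings; "stop" with a "stopwords" array is the stop token filter.
stop_filter = {"type": "stop", "stopwords": sorted(CUSTOM_STOP_WORDS)}

print(analyze("The office is in AT"))  # ['office', 'at']
print(json.dumps(stop_filter, indent=2))
```

The trade-off is that the ordinary English words "at", "be" and "no" are no longer removed either, because after lowercasing they are indistinguishable from the country codes.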

On Mon, Jan 27, 2014 at 5:13 PM, Binh Ly binh@hibalo.com wrote:

Bijuv,

If you are still searching on the _all field, try to remove "lowercase"
from the keep analyzer:

        "myanalyzerstopwithkeep":{
            "tokenizer" : "standard",
            "filter" : ["standard", "mywords"]
        }

What's happening is the lowercase filter makes "AT" into "at" and
therefore your mywords filter will not have any effect since "at" is not
the same as "AT".

Or you can set the _all analyzer to "myanalyzernostop" and it will work
correctly also.


