My stopwords filter is not working


(lyes zaiko) #1

Hi everybody,

can someone tell me whether the following custom analyzer definition is
correct? The stopword filter does not seem to be working.

    "analysis" : {
        "analyzer" : {
            "default-analyzer" : {
                "type" : "custom",
                "tokenizer" : "standard",
                "filter" : ["mystop", "mystemmer", "lowercase"]
            }
        },

        "filter" : {
            "mystemmer" : {
                "type" : "stemmer",
                "language" : "english"
            },

            "mystop" : {
                "type" : "stop",
                "stopwords" : ["english"]
            }
        }
    }

When I search for the word "the", for example, I still get results.

Thank you


(Igor Motov) #2

The stopwords parameter specifies the actual list of stop words. In your
case, only one word is treated as a stop word: the literal word "english".
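
For comparison, a minimal sketch of a stop filter that lists its words explicitly. The three words are illustrative only, and `ignore_case` is included on the assumption that your Elasticsearch version supports it; without it, matching is case-sensitive, so a capitalized "The" would only be removed if `lowercase` runs before this filter.

```json
"mystop" : {
    "type" : "stop",
    "stopwords" : ["the", "a", "an"],
    "ignore_case" : true
}
```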

On Friday, April 27, 2012 9:31:29 AM UTC-4, lyes zaiko wrote:



(lyes zaiko) #3

Hi!
OK, I understand now. But what if I want to use several stopword lists, for
example the English and the Spanish ones? How should I proceed?

On Sat, Apr 28, 2012 at 5:37 AM, Igor Motov imotov@gmail.com wrote:



(Igor Motov) #4

The stopwords parameter also accepts a language notation, in which the
language name is wrapped in underscores:

    "stopwords" : ["_english_", "_spanish_"]
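
A sketch of the original settings rewritten along these lines, assuming the stop filter's predefined lists `_english_` and `_spanish_` (the underscores are part of the name and are easily lost when markdown renders them as italics). It also moves `lowercase` to the front of the chain: the stop filter matches case-sensitively by default, so with `lowercase` last, a sentence-initial "The" would slip through.

```json
"analysis" : {
    "analyzer" : {
        "default-analyzer" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : ["lowercase", "mystop", "mystemmer"]
        }
    },
    "filter" : {
        "mystemmer" : {
            "type" : "stemmer",
            "language" : "english"
        },
        "mystop" : {
            "type" : "stop",
            "stopwords" : ["_english_", "_spanish_"]
        }
    }
}
```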

On Saturday, April 28, 2012 3:56:52 AM UTC-4, lyes zaiko wrote:



(lyes zaiko) #5

Thank you for the pointer!

On Sat, Apr 28, 2012 at 3:48 PM, Igor Motov imotov@gmail.com wrote:


