Stop words not indexed (custom edgeNGram analyzer with no stop filter)


(Robin Boutros) #1

Hey,

This is how I define my mapping with Tire:

settings :analysis => {
  :filter => {
    :title_ngram => {
      type: "edgeNGram",
      side: "front",
      max_gram: 15,
      min_gram: 1
    }
  },
  tokenizer: {
    title_tokenizer: {
      # single quotes keep the backslashes in \p{L} and \d intact
      pattern: '[^\p{L}\d]+',
      type: "pattern"
    }
  },
  :analyzer => {
    :ngram_analyzer => {
      tokenizer: "title_tokenizer",
      filter: ["lowercase", "title_ngram"],
      type: "custom"
    }
  }
} do
  mapping do
    indexes :id, type: 'integer'
    indexes :title, type: 'string', analyzer: 'snowball'
    indexes :description, type: 'string', analyzer: 'snowball'
    indexes :small_photo, index: :not_analyzed
    indexes :ngram_title, :type => 'string', :index_analyzer => "ngram_analyzer", search_analyzer: "standard"
  end
end

I have 2 documents: "The Tree" and "Isnogood".

When I search for "th", "The Tree" is found. With "The", it's not.
When I search for "i", "Isnogood" is found. With "is", it's not.

Why aren't stop words indexed? There is no "stop" filter in the
ngram_analyzer...

Thanks!


(Shay Banon) #2

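Your search analyzer is the standard analyzer, so it runs on the text
you provide as the query and removes stop words from it. The stop words
are indexed; it is the query terms "The" and "is" that get dropped before
the search runs.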
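
To make the mismatch concrete, here is a rough plain-Ruby sketch of the two analysis paths. It is not Elasticsearch or Tire code: the tokenizer is a simplified stand-in, the stop word list is assumed to approximate the default English set, and every method name below is made up for illustration. `ngram_search_analyzer` is likewise a hypothetical name for a search-side analyzer that reuses `title_tokenizer` and `lowercase` but has no stop filter.

```ruby
# Simplified stand-ins for the analyzers in this thread; not Elasticsearch code.

# Assumption: approximates the default English stop word list.
STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or
  such that the their then there these they this to was will with
].freeze

# Split on runs of characters that are neither letters nor digits,
# mirroring the title_tokenizer pattern.
def tokenize(text)
  text.split(/[^\p{L}\d]+/).reject(&:empty?)
end

# Index side (ngram_analyzer): lowercase, then front edge n-grams of length 1..15.
def index_terms(text)
  tokenize(text).flat_map do |token|
    word = token.downcase
    (1..[word.length, 15].min).map { |n| word[0, n] }
  end.uniq
end

# Search side as configured in the mapping: lowercase, then drop stop words.
def standard_search_terms(query)
  tokenize(query).map(&:downcase).reject { |term| STOP_WORDS.include?(term) }
end

# Search side without a stop filter (the hypothetical alternative).
def plain_search_terms(query)
  tokenize(query).map(&:downcase)
end

# A title matches when the query yields at least one term
# and every query term is present in the indexed terms.
def match?(title, query_terms)
  indexed = index_terms(title)
  !query_terms.empty? && query_terms.all? { |term| indexed.include?(term) }
end

# Hypothetical analyzer definition that could sit next to ngram_analyzer
# in the settings hash and be used as the search_analyzer for ngram_title.
SEARCH_ANALYZER_SKETCH = {
  ngram_search_analyzer: {
    type: "custom",
    tokenizer: "title_tokenizer",
    filter: ["lowercase"]
  }
}.freeze

puts index_terms("The Tree").inspect
puts match?("The Tree", standard_search_terms("The"))
puts match?("The Tree", plain_search_terms("The"))
```

With the stop-word-removing search path, the query "The" produces no terms at all, so nothing can match even though "the" is in the index. With the plain search path the same query matches. In the mapping, that would presumably mean pointing `search_analyzer` at an analyzer like the one sketched in `SEARCH_ANALYZER_SKETCH` instead of "standard".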

(In reply to Robin Boutros, Mon, May 7, 2012 at 11:04 PM.)

