Hi Niranjan
> Sorry about that. I am sending gist of some of the curl operations.
> Hope this will shed more light on the issue I am facing.
Much better!
> - Firstly, here is the gist of the mapping I am using:
> gist:3373031 · GitHub
> As you can see, my index type "interests" has ngram as my
> index_analyzer and "standard" as search_analyzer. I am also adding
> porter stem filter to "my_ngram" analyzer to be able to search over
> stem words
OK - there are several things wrong with this mapping:
- You are using a blunderbuss approach: trying to make EVERYTHING
  autocompleted and stemmed and so on. Rather enable these things
  selectively, where you really need them.
- Rather use edge-ngrams instead of ngrams. People expect
  'tre' to match 'trekking', not 'contretemps'.
- There is no need to combine porter-stem with ngrams.
  Porter-stem may convert (eg) 'camping' to 'camp'.
  Edge-ngrams would give you:
    c, ca, cam, camp, campi, campin, camping
  Combined with stemming, you'd just get:
    c, ca, cam, camp
  I think that's where your entrepreneurs search is going wrong.
- Use the same analyzer at search and index time, otherwise there
  is a good chance that you'll be searching for stuff which just
  isn't there! Also, different analyzers order tokens differently,
  which can affect results.
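To make the ngram points above concrete, here is a rough Python sketch. It is not Elasticsearch code: ngrams() and edge_ngrams() only imitate what the token filters emit, and toy_stem() is a made-up stand-in for the Porter stemmer that only handles the 'camping' example.

```python
def ngrams(word, min_gram=1, max_gram=10):
    """Every substring of length min_gram..max_gram (imitates an ngram filter)."""
    return {word[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(word) - n + 1)}


def edge_ngrams(word, min_gram=1, max_gram=10):
    """Prefixes only (imitates a front edge-ngram filter)."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def toy_stem(word):
    """Hypothetical stand-in for a Porter stemmer: strips a trailing 'ing' only."""
    return word[:-3] if word.endswith("ing") else word


# Plain ngrams let 'tre' match in the middle of a word; edge-ngrams do not.
print("tre" in ngrams("contretemps"))       # True
print("tre" in edge_ngrams("contretemps"))  # False
print("tre" in edge_ngrams("trekking"))     # True

# Stemming first throws away the longer prefixes.
print(edge_ngrams("camping"))
print(edge_ngrams(toy_stem("camping")))
```

The last two lines show why stemming before ngramming hurts: once 'camping' has become 'camp', a search for 'campi' or 'campin' has nothing left to match.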
A better mapping is gist:992e0e704e035e8a1770 · GitHub
Note: I'm using a multi-field for "name" with two versions: one analyzed
with the 'english' analyzer, and one with edge_ngrams.
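For readers who cannot open the gist, here is a guess at the general shape of such a mapping, built as a Python dict and printed as JSON. This is NOT the contents of the gist: the names "edge_ngram_analyzer" and "edge_ngram_filter" and the min_gram value are assumptions for illustration, and the multi_field / edgeNGram syntax is the one used by the Elasticsearch versions of that era. Only the type "interests", the field names "name" and "name.ngrams", the 'english' analyzer and max_gram of 10 come from this thread.

```python
import json

# Index settings: a custom analyzer that lowercases and emits edge-ngrams.
# The analyzer and filter names are hypothetical.
settings = {
    "analysis": {
        "filter": {
            "edge_ngram_filter": {
                "type": "edgeNGram",
                "side": "front",
                "min_gram": 1,   # assumed
                "max_gram": 10,  # matches the max_gram mentioned in this thread
            }
        },
        "analyzer": {
            "edge_ngram_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "edge_ngram_filter"],
            }
        },
    }
}

# Mapping: "name" is a multi-field with a stemmed version ("name")
# and an edge-ngram version ("name.ngrams").
mappings = {
    "interests": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {"type": "string", "analyzer": "english"},
                    "ngrams": {"type": "string", "analyzer": "edge_ngram_analyzer"},
                },
            }
        }
    }
}

body = {"settings": settings, "mappings": mappings}
print(json.dumps(body, indent=2))
```

Because "ngrams" sets a single analyzer, the same edge-ngram analysis is applied at index and at search time, in line with the last point in the list above.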
The basic query looks like this:
curl -XGET 'http://127.0.0.1:9200/test/interests/_search?pretty=1' -d '
{
   "query" : {
      "bool" : {
         "should" : [
            {
               "text" : {
                  "name.ngrams" : {
                     "operator" : "and",
                     "query" : "SEARCH TERMS"
                  }
               }
            },
            {
               "text" : {
                  "name" : "SEARCH TERMS"
               }
            }
         ]
      }
   },
   "highlight" : {
      "fields" : {
         "name.ngrams" : {},
         "name" : {}
      }
   }
}
'
I'm querying the 'name.ngrams' field (which will match the partial words)
and I'm also querying the 'name' (or 'name.name') field, which will match
any full words and increase their relevance.
Note: I'm using the 'and' operator for the ngrams because the search
terms are ngrammed too; otherwise 'tre' will match anything that
contains a word starting with just 't'.
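A small Python sketch of why the operator matters. It is only a simulation, assuming the query goes through the same edge-ngram analysis as the documents; analyze() and matches() are made-up helpers, not Elasticsearch functions, and the 'play tennis' document is invented for the example.

```python
def analyze(text, max_gram=10):
    """Imitates an edge-ngram analyzer: lowercase, split on spaces, emit prefixes."""
    tokens = set()
    for word in text.lower().split():
        for n in range(1, min(max_gram, len(word)) + 1):
            tokens.add(word[:n])
    return tokens


def matches(query, doc, operator):
    """'and': every query token must be in the doc; 'or': any one token is enough."""
    query_tokens = analyze(query)  # 'tre' becomes {'t', 'tr', 'tre'}
    doc_tokens = analyze(doc)
    if operator == "and":
        return query_tokens <= doc_tokens
    return bool(query_tokens & doc_tokens)


docs = ["trekking", "play tennis", "play basketball"]
or_hits = [d for d in docs if matches("tre", d, "or")]
and_hits = [d for d in docs if matches("tre", d, "and")]
print(or_hits)   # ['trekking', 'play tennis']
print(and_hits)  # ['trekking']
```

With 'or', the single token 't' is enough to pull in 'play tennis'; with 'and', all of 't', 'tr' and 'tre' have to be present.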
I'm highlighting both fields.
Here are some results:
For 'tre':
"name.ngrams" : "trekking"
For 'trek':
"name.ngrams" : "<em>trek</em>king"
"name" : "<em>trekking</em>"
# note 'trekking' has been highlighted because the stemmed term
# is 'trek'
For 'bas':
"name.ngrams" : "play basketball"
For 'basketbal':
"name.ngrams" : "play <em>basketbal</em>l"
"name" : "play <em>basketball</em>"
For 'entrepreneurs':
"name.ngrams" : "meet <em>entreprene</em>urs"
"name" : "meet <em>entrepreneurs</em>"
# note how the ngram match stops at 10 letters - that's because
# your max_gram was set to 10.
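The 10-letter cutoff can be reproduced with a couple of lines of Python. This is just an illustration of the arithmetic, not how the Elasticsearch highlighter works internally; highlight_prefix() is a hypothetical helper.

```python
def highlight_prefix(word, max_gram=10):
    """Wrap the longest edge-ngram of `word` (at most max_gram letters) in <em> tags."""
    n = min(len(word), max_gram)
    return "<em>" + word[:n] + "</em>" + word[n:]


print(highlight_prefix("entrepreneurs"))  # <em>entreprene</em>urs
print(highlight_prefix("trekking"))       # <em>trekking</em>
```

'entrepreneurs' has 13 letters, so the longest indexed prefix is the 10-letter 'entreprene'; raising max_gram would let the ngram field match the whole word.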
hth
clint