Ngram indexing and search results quality


(Nadav Samet) #1

Hi,

I am indexing using an nGram filter and it seems to be working: I am able
to find substrings of words. However, I have noticed that documents that
contain the search term only as a substring of a longer word are ranked
above documents that have an exact match. For instance, if I search for
"rain", I can get a document that contains the word "brainstorm" above a
document that contains the exact term "rain".

Is there a way to make exact matches score higher? Similarly, is there a way
to boost the score of ngrams that contain the first letter of a word?

Thanks,
Nadav

For reference, here are the settings I put on the index:

{
  "settings": {
    "analysis": {
      "filter": {
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "person": {
      "_all": {
        "index_analyzer": "nGram_analyzer",
        "search_analyzer": "whitespace_analyzer"
      },
      "properties": { .. }
    }
  }
}
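
To illustrate why both documents match, here is a rough Python sketch of what an nGram filter with min_gram 2 and max_gram 20 would emit for a single lowercased token. It is only an approximation for illustration: the `ngrams` helper is hypothetical, it skips asciifolding, and it does not model how Lucene scores the results.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """Return every substring of token whose length is in [min_gram, max_gram].

    Hypothetical helper that approximates the nGram filter settings above.
    """
    token = token.lower()
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer grams from this start would run past the token
            grams.append(token[start:end])
    return grams


# Terms that would be indexed for each word (index-time analysis).
exact_doc_terms = set(ngrams("rain"))
partial_doc_terms = set(ngrams("brainstorm"))

# The search analyzer does not apply the nGram filter, so the query is one term.
query_term = "rain"

print(query_term in exact_doc_terms)    # True
print(query_term in partial_doc_terms)  # True
```

Under these assumptions the term "rain" is indexed for both documents, so at the term level the exact match and the substring match look the same, and the ordering is left to other scoring factors such as term frequency and field length.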



(system) #2