Ngram analyzer and term frequency


(Torben) #1

Hello,

I'm using an ngram tokenizer for full-text search and have some questions about term frequency.

Full example: http://paste.ubuntu.com/14478646/

If I use the explain API on my search queries (the last two curl commands) for document 1 ("abcd - foooooooabcd"), I get a term frequency of 2.0 for the string "abc", which is what I expect. But when I search for "abcd", I get a term frequency of 17.0. Shouldn't this also be 2.0?
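
To make the expectation concrete, here is a minimal Python sketch that counts plain character n-grams in the document text. It is only an approximation of what an ngram tokenizer emits: the function name `ngram_term_frequencies` and the gram sizes (3 to 4) are assumptions for illustration, not the settings from the paste, and no token filters are applied.

```python
from collections import Counter


def ngram_term_frequencies(text, min_gram=3, max_gram=4):
    """Count how often each character n-gram occurs in text.

    min_gram and max_gram are assumed values for illustration;
    the real analyzer settings are in the linked paste.
    """
    counts = Counter()
    for n in range(min_gram, max_gram + 1):
        # Slide a window of length n over the whole string.
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return counts


tf = ngram_term_frequencies("abcd - foooooooabcd")
print(tf["abc"], tf["abcd"])  # prints: 2 2
```

With plain n-grams, both "abc" and "abcd" occur twice in the document, which is why I expected 2.0 in both cases.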

The index analyzer seems to produce the expected tokens when I check it with the call below, so where does this term frequency come from?
curl -XGET 'localhost:9200/test20160107/_analyze?analyzer=index_ngram_wd_analyzer&pretty' -d "abcd"

Thanks in advance for any help!

Best regards,
Torben

