Hello!
I have a list of already extracted terms that I don't want to analyze again, so I mark the field as "not_analyzed".
Next, I'm trying to store them as an array of strings. Here is an example:
# create index
curl -XPUT 'http://localhost:9200/test/' -d '{
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
},
"mappings": {
"doc": {
"properties": {
"terms": {
"type": "string",
"index" : "not_analyzed",
"term_vector" : "yes"
}
}
}
}
}'
# some data
curl -XPUT 'http://localhost:9200/test/doc/1' -d '{
"terms" : ["quick", "brown", "fox"]
}'
curl -XPUT 'http://localhost:9200/test/doc/2' -d '{
"terms" : ["the", "red", "fox", "fox"]
}'
1) Term_vectors
All terms have correct term_freq and doc_freq values, but ttf (total term frequency) is -1 for every term.
curl -XGET 'http://localhost:9200/test/doc/1/_termvectors?pretty=true' -d'{
"term_statistics" : true,
"field_statistics" : true
}'
2) Finding similar documents for the document with _id = 1.
The explain output shows that the term "fox" has tf == 1.0, but I expected 2.0, since "fox" occurs twice in document 2.
curl -XPOST 'http://localhost:9200/test/doc/_search?explain' -d '{
"query": {
"more_like_this": {
"fields": ["terms"],
"like": [{"_id" : "1"}],
"min_term_freq": 1,
"min_doc_freq": 1
}
}
}'
I know that I can concatenate the terms into a single string, store it, and use an Elasticsearch analyzer (e.g. the Whitespace Analyzer) to rebuild the terms.
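For reference, here is a rough sketch of the workaround I mean. The index name "test_ws" and the helper functions join_terms, create_index and index_doc are made up for this illustration, and it assumes that no term contains whitespace or characters that would need JSON escaping. I have not verified the resulting statistics with this exact setup.

```shell
# Sketch only: join the terms into one string and let the whitespace
# analyzer split them again. "test_ws" is a hypothetical index name.
ES='http://localhost:9200'

# join_terms quick brown fox  ->  prints "quick brown fox"
# Assumes terms contain no whitespace and need no JSON escaping.
join_terms() {
  local IFS=' '
  printf '%s\n' "$*"
}

# Same mapping as above, but analyzed with the whitespace analyzer
# instead of "not_analyzed".
create_index() {
  curl -XPUT "$ES/test_ws/" -d '{
    "settings" : { "number_of_shards" : 1, "number_of_replicas" : 0 },
    "mappings": {
      "doc": {
        "properties": {
          "terms": {
            "type": "string",
            "analyzer" : "whitespace",
            "term_vector" : "yes"
          }
        }
      }
    }
  }'
}

# index_doc <id> <term>...  stores the terms as one space-separated string
index_doc() {
  local id="$1"; shift
  curl -XPUT "$ES/test_ws/doc/$id" -d "{ \"terms\" : \"$(join_terms "$@")\" }"
}
```

Usage would be: create_index, then index_doc 1 quick brown fox, then index_doc 2 the red fox fox.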
But I wonder whether it is possible to store the terms as an array of strings and still get a correct "tf" in the explain output and a correct "ttf" value for each term in the term vector?
*If I mark the field "terms" as "analyzed", everything works fine: ttf and tf are calculated correctly.