Way to store an extracted list of terms in Elasticsearch (array, text, ...)


#1

Hello!

I have a list of already extracted terms that I don't want to analyze again, so I mark the field as "not_analyzed".
I'm trying to store them as an array of strings. Here is an example:

// create index

curl -XPUT 'http://localhost:9200/test/' -d '{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "doc": {
            "properties": {
                "terms": {
                    "type": "string",
                    "index": "not_analyzed",
                    "term_vector": "yes"
                }
            }
        }
    }
}'

// some data

curl -XPUT 'http://localhost:9200/test/doc/1' -d '{
    "terms" : ["quick", "brown", "fox"]
}'

curl -XPUT 'http://localhost:9200/test/doc/2' -d '{
    "terms" : ["the", "red", "fox", "fox"]
}'

1) Term vectors
All terms have the correct term_freq and doc_freq values, but ttf == -1.

curl -XGET 'http://localhost:9200/test/doc/1/_termvectors?pretty=true' -d '{
    "term_statistics" : true,
    "field_statistics" : true
}'

2) Finding documents similar to the document with _id = 1.
The explain output shows that the term "fox" has tf == 1.0, but I expected 2.0.

curl -XPOST 'http://localhost:9200/test/doc/_search?explain' -d '{
    "query": {
        "more_like_this": {
            "fields": ["terms"],
            "like": [{"_id" : "1"}],
            "min_term_freq": 1,
            "min_doc_freq": 1
        }
    }
}'

I know that I can concatenate the terms into a single string, store it, and let one of Elasticsearch's analyzers (e.g. the
Whitespace Analyzer) build the terms.
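
A minimal sketch of that concatenation workaround, for reference. The index name test_ws and the helper functions join_terms, doc_body, create_index and index_doc are made up for illustration. It assumes that no term contains whitespace or characters that need JSON escaping, and that the mapping syntax matches the "string" field type used above.

```shell
# Join already-extracted terms into one whitespace-separated string.
# Assumption: no term contains whitespace itself.
join_terms() {
    printf '%s' "$*"
}

# Build the JSON body for one document.
# Assumption: terms contain no quotes or backslashes (no JSON escaping is done).
doc_body() {
    printf '{"terms": "%s"}' "$(join_terms "$@")"
}

# Create a hypothetical index "test_ws" whose "terms" field is analyzed
# with the built-in whitespace analyzer instead of being not_analyzed.
create_index() {
    curl -XPUT 'http://localhost:9200/test_ws/' -d '{
        "mappings": {
            "doc": {
                "properties": {
                    "terms": {
                        "type": "string",
                        "analyzer": "whitespace",
                        "term_vector": "yes"
                    }
                }
            }
        }
    }'
}

# Index one document; usage: index_doc 2 the red fox fox
index_doc() {
    id="$1"
    shift
    curl -XPUT "http://localhost:9200/test_ws/doc/$id" -d "$(doc_body "$@")"
}
```

With a node running on localhost:9200, calling create_index and then index_doc 2 the red fox fox would store "the red fox fox" as a single string, so the repeated "fox" is counted by the analyzer rather than passed in as separate array values.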

But I wonder whether it is possible to store the terms as an array of strings and still get the correct "tf" in the explain output and the correct "ttf" value for each term in the term vector.

*If I mark the field "terms" as "analyzed", everything works fine: ttf and tf are calculated correctly.

