I have run into a strange problem:
Index name: surfikisemterms
Index mapping:
{"surfikisemterms":{"jdbc_search":{"properties":{"facets":{"dynamic":"true","properties":{"termscount":{"dynamic":"true","properties":{"terms":{"dynamic":"true","properties":{"field":{"type":"string"},"size":{"type":"long"}}}}}}},"from":{"type":"long"},"query":{"dynamic":"true","properties":{"bool":{"dynamic":"true","properties":{"must":{"dynamic":"true","properties":{"query_string":{"dynamic":"true","properties":{"query":{"type":"string"}}}}}}}}},"size":{"type":"long"}}},"jdbc":{"properties":{"keywords":{"type":"string","analyzer":"comma"},"time":{"type":"date","format":"dateOptionalTime"}}}}}
Index setting:
{"surfikisemterms":{"settings":{"index.analysis.tokenizer.commatokenizer.type":"pattern","index.analysis.analyzer.comma.type":"custom","index.number_of_replicas":"1","index.version.created":"200599","index.analysis.tokenizer.commatokenizer.pattern":",","index.analysis.analyzer.comma.tokenizer":"commatokenizer","index.number_of_shards":"5"}}}
So the index really has only 2 fields: keywords, which is a comma-separated word list, and time, which is the timestamp.
I have 3 nodes in the cluster, all with these environment variables:
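To illustrate, this is roughly how I understand the comma analyzer to split the keywords field. It is a plain Python sketch, not Elasticsearch code: the function name and the sample string are made up, and I am assuming that empty tokens are dropped and that nothing is lowercased or trimmed, since the analyzer defines no filters.

```python
import re

def comma_tokenize(text):
    """Split on the "," pattern, roughly as the commatokenizer setting would."""
    # Assumption: zero-length tokens are skipped and no token filters are applied.
    return [token for token in re.split(",", text) if token]

# Hypothetical keywords value, only for illustration.
print(comma_tokenize("search engine,facets,,out of memory"))
```

So a multi-word keyword such as "search engine" stays a single term in the index.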
ES_HEAP_SIZE=3000M
ES_MIN_MEM=3000M
ES_MAX_MEM=3000M
Then, if I run a query like:
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2013-03-07T11:29:19.000Z",
              "to": "2013-03-07T11:30:19.000Z"
            }
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "facets": {}
}
I get 21 hits in total, and the query returns quickly. None of these hits contains more than 40 keywords.
Then, if I add a terms facet on the keywords field:
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2013-03-07T11:29:19.000Z",
              "to": "2013-03-07T11:30:19.000Z"
            }
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 22,
  "sort": [],
  "facets": {
    "termscount": {
      "terms": {
        "field": "keywords",
        "size": 25
      }
    }
  }
}
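My mental model of what this facet has to compute is sketched below in Python. This is only how I picture it, not how Elasticsearch implements it: the function name and the sample documents are hypothetical, and I am assuming each term is counted once per matching document.

```python
from collections import Counter

def terms_facet(hit_docs, field="keywords", size=25):
    """Return the `size` most frequent terms across the matching documents."""
    counts = Counter()
    for doc in hit_docs:
        # Assumption: split on commas, drop empty tokens, count each term once per doc.
        terms = {term for term in doc[field].split(",") if term}
        counts.update(terms)
    return counts.most_common(size)

# Hypothetical hits standing in for the 21 documents matched by the range query.
hits = [
    {"keywords": "search,facets"},
    {"keywords": "search,memory"},
    {"keywords": "search,facets"},
]
print(terms_facet(hits, size=2))
```

Under this model the work depends only on the handful of matching documents, which is why the memory usage surprises me.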
I see the following log entries repeated several times:
[2013-03-08 03:31:05,571][WARN ][index.cache.field.data.resident] [Surfiki Master: Bobby Hutcherson] [surfikisemterms] loading field [keywords] caused out of memory failure
java.lang.OutOfMemoryError: Java heap space..
What confuses me is this: the documents matched by the above query contain no more than 500 keywords, so why does this terms facet cause an out-of-memory error?
Thanks.