Out of memory during terms facets

Actually, it's very strange to me that I'm running into this problem:

Index name: surfikisemterms

Index mapping:
{
  "surfikisemterms": {
    "jdbc_search": {
      "properties": {
        "facets": {
          "dynamic": "true",
          "properties": {
            "termscount": {
              "dynamic": "true",
              "properties": {
                "terms": {
                  "dynamic": "true",
                  "properties": {
                    "field": {"type": "string"},
                    "size": {"type": "long"}
                  }
                }
              }
            }
          }
        },
        "from": {"type": "long"},
        "query": {
          "dynamic": "true",
          "properties": {
            "bool": {
              "dynamic": "true",
              "properties": {
                "must": {
                  "dynamic": "true",
                  "properties": {
                    "query_string": {
                      "dynamic": "true",
                      "properties": {"query": {"type": "string"}}
                    }
                  }
                }
              }
            }
          }
        },
        "size": {"type": "long"}
      }
    },
    "jdbc": {
      "properties": {
        "keywords": {"type": "string", "analyzer": "comma"},
        "time": {"type": "date", "format": "dateOptionalTime"}
      }
    }
  }
}

Index setting:
{
  "surfikisemterms": {
    "settings": {
      "index.analysis.tokenizer.commatokenizer.type": "pattern",
      "index.analysis.analyzer.comma.type": "custom",
      "index.number_of_replicas": "1",
      "index.version.created": "200599",
      "index.analysis.tokenizer.commatokenizer.pattern": ",",
      "index.analysis.analyzer.comma.tokenizer": "commatokenizer",
      "index.number_of_shards": "5"
    }
  }
}
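For reference, the `pattern` tokenizer configured above splits the `keywords` field on commas, so each comma-separated keyword becomes a separate indexed term. A minimal Python sketch of that behavior (the function name is mine, not an Elasticsearch API):

```python
import re

def comma_tokenize(text):
    """Mimic the 'commatokenizer' above: a pattern tokenizer that splits
    on ',' and drops empty tokens, as Lucene's pattern tokenizer does."""
    return [tok for tok in re.split(r",", text) if tok]

# Each comma-separated keyword becomes one term in the index.
print(comma_tokenize("search,lucene,elasticsearch"))
# → ['search', 'lucene', 'elasticsearch']
```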

So we really only have 2 fields in the index: keywords, which is a comma-separated word list, and time, which is the timestamp.

I have 3 nodes in the cluster, all with these environment variables:
ES_HEAP_SIZE=3000M
ES_MIN_MEM=3000M
ES_MAX_MEM=3000M

Then, if I do a query like:
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2013-03-07T11:29:19.000Z",
              "to": "2013-03-07T11:30:19.000Z"
            }
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "facets": {}
}

In total I get 21 hits, and the query returns fast. Among these hits, no single document contains more than 40 keywords.

Then if I do:
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2013-03-07T11:29:19.000Z",
              "to": "2013-03-07T11:30:19.000Z"
            }
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 22,
  "sort": [],
  "facets": {
    "termscount": {
      "terms": {
        "field": "keywords",
        "size": 25
      }
    }
  }
}
then I see the following log message repeating several times:

[2013-03-08 03:31:05,571][WARN ][index.cache.field.data.resident] [Surfiki Master: Bobby Hutcherson] [surfikisemterms] loading field [keywords] caused out of memory failure
java.lang.OutOfMemoryError: Java heap space..

What confuses me is that for the above query the hit documents contain no more than 500 keywords in total, so why does the terms facet cause an out-of-memory error?

Thanks.

Hi,

Could you create a full curl recreation and GIST it?
On which ES version are you working?
Which java version?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi,

The following thread might help you.
https://groups.google.com/forum/#!topic/elasticsearch/4Uxbmy-e1ao

-- Sujoy.


Hi, updated gist here:
https://gist.github.com/BlueStalker/5115578

Please tell me if I have missed anything. Many thanks for the help.

Hello, guys.

Is there any update on this issue?
Thanks.

Hi,

You need more memory to let Elasticsearch load all the distinct values into the cache.
You should try running 0.90.0.Beta1, as the memory footprint has been reduced there.

You can also redesign your indices, for example by date: that will reduce the number of terms to load into memory.
It depends on your use case. It sounds like your filter applies to a single day, so perhaps an index per day?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs
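David's time-based-index suggestion amounts to routing each document to an index named after its date, so a query over a one-minute window only needs one day's field data in memory. A hedged Python sketch of the naming scheme (the index name pattern and function name are my own choices, not anything Elasticsearch prescribes):

```python
from datetime import datetime

def daily_index(base, timestamp):
    """Route a document to a per-day index, e.g. surfikisemterms-2013.03.07.
    Only that day's 'keywords' field data then has to fit in the heap
    when a query filters on that day."""
    day = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return "%s-%s" % (base, day.strftime("%Y.%m.%d"))

print(daily_index("surfikisemterms", "2013-03-07T11:29:19.000Z"))
# → surfikisemterms-2013.03.07
```

The range query from the original post, which spans a single day, would then be sent to just that one daily index instead of the whole data set.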


Hi,

Actually, how does Elasticsearch implement terms facets? You mentioned loading all the distinct values into the cache; what is the mechanism?

In my real example, the query filters the index down to a window of several minutes, which matches only about 20 docs in total, and those hits contain no more than 500 terms altogether. That should be a small enough number for any kind of sort.

Thanks.



If you need the field values for docs 1,2 and 3 for this query, you're
probably going to need the values for doc 4,5 and 6 at some stage in the
future.

So the most efficient thing to do is to load the field values for all
docs into memory in one go. Then they're available for future requests
without having to reload.

So unfortunately, even if your query matches just a few docs, you need
space on your heap for the values from all docs.

clint
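The mechanism clint describes can be illustrated with a toy Python sketch (not Elasticsearch code): the field-data cache is built from every document's terms the first time the field is faceted, so the heap cost tracks the whole index, even though the facet counts only the matching docs.

```python
from collections import Counter

# Toy "index": doc id -> comma-separated keywords. In the real cluster
# there are far more docs than the 21 the query matches.
index = {
    1: "foo,bar",
    2: "bar,baz",
    3: "qux,quux",   # not matched by the query below, but still loaded
}

def load_field_data(index):
    """The first facet on a field loads the values of ALL docs into
    memory, mimicking the resident field-data cache. This is the step
    that OOMs when the whole field doesn't fit in the heap."""
    return {doc_id: kw.split(",") for doc_id, kw in index.items()}

def terms_facet(field_data, matching_ids, size):
    """Faceting itself then only counts the terms of the matching docs."""
    counts = Counter()
    for doc_id in matching_ids:
        counts.update(field_data[doc_id])
    return counts.most_common(size)

field_data = load_field_data(index)          # memory cost: every doc
print(terms_facet(field_data, [1, 2], 25))   # result cost: only the hits
```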


--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Out-Of-Memory-during-the-terms-facets-tp4031285p4031542.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.
