ES locks up and eats the heap

Hey,

So we had this cluster of 3 nodes, 2x32GB high-mem/high-CPU on EC2 plus a
smaller instance. No matter what we did, it would eventually run out of
memory and lock up. So after a week of not sleeping, keeping that cluster up
with our bare hands, dealing with failed shards, etc., we decided to rebuild it
from scratch, add routing, limit time ranges to the closest hour, and
reindex everything.

It's running now on a single node, 64GB, basically the biggest EC2 node. It
does the exact same thing: slowly builds up heap until it reaches the
maximum allocated, and then it locks up. It doesn't respond to shutdown; I
always have to kill -9 it, fix the indexes with the Lucene CheckIndex tool,
and restart it. The config is roughly:

index:
  store:
    type: mmapfs
    fs:
      mmapfs:
        enabled: true
  cache:
    field:
      type: soft
      expire: 30s
      max_size: 1000
  refresh_interval: 60s

bootstrap:
  mlockall: true

I've also attached a stacktrace of when it was fully locked up (JSTACK) and
now (JSTACK2). It looks like it spends most of its time indexing. But we
are not indexing that many documents. Maybe 20 a sec. On a machine that
size, it's nothing.

The cluster is 3 indexes, 2x32 + 1x64 shards. Index sizes right now are 100m,
600m, and 900m (since we were reindexing). It doesn't take long to run out of
memory (though it shows no error, it just locks up), which means I basically
have to stare at bigdesk and reboot it when it gets close.
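
(For reference: the heap and cache numbers bigdesk graphs come from the node
stats API, so the same thing can be watched with plain curl. Endpoint names
below are from the 0.19-era API and may differ on other versions.)

# node stats: the jvm heap numbers and the indices cache section show what is
# actually filling the heap
curl 'http://localhost:9200/_cluster/nodes/stats?pretty=true'

# cluster health, to spot failed/relocating shards while the heap grows
curl 'http://localhost:9200/_cluster/health?pretty=true'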

Any help would be greatly appreciated. It's been more than a week of
absolute hell.

--

Right now, it eats up 40G in like 10min, which means I have to restart the
node every 10min...

--

I have the same exact problem -- we're using 4x16 gig worker nodes and 2x4
gig master nodes with 200 million documents. The documents are split into
a different index per month, 6 indexes total. I can get a single query in,
maybe 2, before one or more nodes become completely unresponsive and the
following is spammed to the logs:

[2012-10-11 15:16:26,843][INFO ][monitor.jvm ] [Masters, Alicia] [gc][ConcurrentMarkSweep][572][38] duration [12.4s], collections [2]/[36.6s], total [12.4s]/[1.6m], memory [12.9gb]->[12.9gb]/[12.9gb], all_pools {[Code Cache] [3.5mb]->[3.5mb]/[48mb]}{[Par Eden Space] [133.1mb]->[133.1mb]/[133.1mb]}{[Par Survivor Space] [8.1mb]->[10.9mb]/[16.6mb]}{[CMS Old Gen] [12.8gb]->[12.8gb]/[12.8gb]}{[CMS Perm Gen] [36.7mb]->[36.7mb]/[166mb]}

Over and over again, non-stop.

We also have a similar setup using 12x16 gig data nodes and 3x8 gig master
nodes, and it does not experience this issue at all. Can Elasticsearch
simply not handle 200 million documents split between 6 indexes on 4
servers? Do we need to throw more hardware at it?

--

I got SPM working, and I'm gonna try to switch to one index per tenant
instead of using routing on one big index.

Nonetheless, if someone else has any clue, that'd be deeply appreciated.

--

Also, following up from Julien - our index is tiny. Teensy tiny. Like, it's
dying at just a few GB and less than 1M records.

--

hey folks,

can you provide me with some more information about what you are doing with ES?
I am particularly looking for:

  1. do you facet / sort on string fields, if so how are they indexed?
  2. are you bulk indexing and searching at the same time?
  3. if you restart a node, is it eating the same amount of memory right away
    after restart?
  4. what JVM are you using?
  5. what are the startup parameters, i.e. Xmx and Xms?
  6. can you trigger the behavior with a certain search query?
  7. did you try to use fewer shards (50M docs per shard is doable), just for
    debugging?
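
(For 3 and 4, the nodes info / nodes stats APIs should give you this directly.
Rough sketch below; endpoint and flag names are from the 0.19-era API, so
please double-check against your version.)

# which JVM each node is running (question 4)
curl 'http://localhost:9200/_cluster/nodes?jvm=true&pretty=true'

# per-node heap usage; run it right after a restart and again a few minutes
# later to answer question 3
curl 'http://localhost:9200/_cluster/nodes/stats?jvm=true&pretty=true'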

simon

On Friday, October 12, 2012 4:39:45 AM UTC+2, courtenay wrote:

Also, following up from Julien - our index is tiny. Teensy tiny. Like,
it's dying at just a few GB and less than 1M records.

--

  1. do you facet / sort on string fields, if so how are they indexed

we mostly search by relevance, but some users search by date. in the old
cluster (that we switched back to for now), dates are indexed to the
second. in the reindex, I switched to 1h resolution. the old cluster has 3M
docs over 32 shards.

  2. are you bulk indexing and searching at the same time?

we were when this happened. I stopped bulk indexing and had just live
indexing + search and the memory eating continued. right now I'm
reindexing in bulk on the new machine, with no search. we'll switch when
it's fully reindexed.

  3. if you restart a node, is it eating the same amount of memory right away
    after restart?

basically yes. behavior was that right after restart, heap would go from
0 to 40gb over 10min, then gc would take some down, but less and less,
and in 20min ES would be locked.

  4. what JVM are you using?

this was on Java 1.6.0_30, it was the only thing available at the moment. we
manually installed 7 and are bulk indexing with 7 on the new one. the old one
is still at 6u30.

  5. what are the startup parameters, i.e. Xmx and Xms?

OLD: -Xms10g -Xmx10g / boxes have 16g.
NEW: -Xms40g -Xmx40g / box has 64g.

  6. can you trigger the behavior with a certain search query?

we didn't really try that yet. to be honest, we have been taking shifts
sleeping to keep that thing alive while our users complain. on the old
cluster, I have the slow query log enabled though. here is one that comes back
often:

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "text": {
                "external": {
                  "operator": "and",
                  "query": "new line"
                }
              }
            },
            {
              "dis_max": {
                "queries": [
                  {
                    "text": {
                      "external": {
                        "operator": "and",
                        "query": "new line"
                      }
                    }
                  },
                  {
                    "text": {
                      "body": {
                        "boost": 1,
                        "query": "new line"
                      }
                    }
                  },
                  {
                    "text": {
                      "title": {
                        "boost": 5,
                        "query": "new line"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      },
      "filter": {
        "and": {
          "must": [
            {
              "term": {
                "TENANT": ID
              }
            },
            {
              "term": {
                "p1": false
              }
            },
            {
              "term": {
                "p2": false
              }
            },
            {
              "term": {
                "p3": 1
              }
            }
          ]
        }
      }
    }
  }
}

  7. did you try to use fewer shards (50M docs per shard is doable), just for
    debugging?

not yet. right now i'm reindexing with 1 index per tenant (all were in
the same index before), 5 shards per tenant. the behavior was happening
when I tried to keep 1 index but use routing with the tenant id.
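
For reference, the routing was nothing fancy, just the standard routing
parameter with the tenant id on both indexing and search -- roughly like this
(index/type/field names below are made up):

# index a document for a tenant, routed by tenant id
curl -XPUT 'http://localhost:9200/docs/doc/1?routing=42' -d '{
  "TENANT": 42,
  "title": "some title",
  "body": "some body"
}'

# search with the same routing value so only that tenant's shard is hit,
# plus a term filter so other tenants' docs never leak into results
curl -XGET 'http://localhost:9200/docs/_search?routing=42' -d '{
  "query": {
    "filtered": {
      "query": { "text": { "body": "some body" } },
      "filter": { "term": { "TENANT": 42 } }
    }
  }
}'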

Thanks a lot for your help. It's really appreciated :)

--

the new machine did not show ANY error message, it would just lock up.

the old cluster regularly dies too, but at least we get errors:

===

[05:29:39,923][WARN ][netty.channel.socket.nio.AbstractNioWorker] Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[05:29:40,597][WARN ][netty.channel.socket.nio.AbstractNioWorker] Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[05:29:51,917][WARN ][transport.netty ] [Wolf] exception caught on netty layer [[id: 0x7abc8746, /10.142.148.37:48155 => /10.140.66.176:9300]]
org.elasticsearch.common.compress.lzf.LZFException: Corrupt data: overrun in decompress, input offset 47358, output offset 65536
    at org.elasticsearch.common.compress.lzf.impl.UnsafeChunkDecoder.decodeChunk(UnsafeChunkDecoder.java:120)
    at org.elasticsearch.common.compress.lzf.impl.UnsafeChunkDecoder.decodeChunk(UnsafeChunkDecoder.java:64)
    at org.elasticsearch.common.compress.lzf.LZFCompressedStreamInput.uncompress(LZFCompressedStreamInput.java:57)
    at org.elasticsearch.common.compress.CompressedStreamInput.readyBuffer(CompressedStreamInput.java:160)
    at org.elasticsearch.common.compress.CompressedStreamInput.readByte(CompressedStreamInput.java:81)
    at org.elasticsearch.common.io.stream.HandlesStreamInput.readUTF(HandlesStreamInput.java:46)
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:198)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:111)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:458)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:439)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:311)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
[05:30:03,595][WARN ][transport.netty ] [Wolf] Message not fully read (request) for [6242] and action [index], resetting
[05:30:03,609][WARN ][transport.netty ] [Wolf] Message not fully read (request) for [18212] and action [index/shard/recovery/fileChunk], resetting

===

At this point, I assume some of our shards are corrupted because we've had
to kill -9 ES many times (it would not respond to anything else). So I kinda
know we'll have to switch to the rebuilt index anyway.

--

On Friday, October 12, 2012 2:28:52 PM UTC+2, Julien wrote:

  1. do you facet / sort on string fields, if so how are they indexed

we mostly search by relevance, but some users search by date. in the old
cluster (that we switched back to for now), dates are indexed to the
second. in the reindex, I switch to 1h resolution. old cluster has 3M
docs over 32 shards.

  1. are you bulk indexing and searching at the same time?

we were when this happened. I stopped bulk indexing and had just live
indexing + search and the memory eating continued. right now I'm
reindexing in bulk on the new machine, with no search. we'll switch when
it's fully reindexed.

  1. if you restart a node is it eating the same amout of memory right away
    after restart

basically yes. behavior was that right after restart, heap would go from
0 to 40gb over 10min, then gc would take some down, but less and less,
and in 20min ES would be locked.

  1. what JVM are you using

this was on 1.6.0.30. it was the only thing available at the moment. we
manually installed 7 and are bulk indexing with 7 on the new one. old is
still at 6u30.

  5. what are the startup parameters, i.e. Xmx and Xms?

OLD: -Xms10g -Xmx10g / boxes have 16g.
NEW: -Xms40g -Xmx40g / box has 64g.

this makes me curious. Do you really need that much memory in the JVM if
you don't do sorting / faceting? I mean, this makes almost no sense to me.
If you go above ~30GB you don't get compressed object pointers anymore and
use 64-bit pointers for objects. I'd be very curious what happens if you give
it less than 30GB. Any reason why you allocate so much memory to the JVM?
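
You can check what the JVM actually does for a given heap size straight from
the command line (sketch; the flag output looks slightly different across JVM
versions):

# prints the effective value of UseCompressedOops for a given -Xmx;
# with 40g it should come out false, with 30g it should come out true
java -Xmx40g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
java -Xmx30g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops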

simon

--

Agreed with Simon: if you are not using faceting/sorting, then there isn't really a reason for it to use that much memory (unless you are using a lot of indices/shards).

One more question: can you try not setting bootstrap.mlockall? Maybe it doesn't play well when you use mmapfs.

Another reason why this might happen is that you are sending big bulk requests; make sure you break them up into manageable-size bulk requests if you aren't already.
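
For reference, a bulk request is just newline-delimited action + source pairs,
so capping the number of docs per request is easy -- a rough sketch, with
made-up index/type names:

# build a small bulk file: one action line plus one source line per document;
# send a few hundred or a few thousand docs per request instead of one huge payload
cat > bulk-chunk.json <<'EOF'
{ "index" : { "_index" : "docs", "_type" : "doc", "_id" : "1" } }
{ "body" : "first doc" }
{ "index" : { "_index" : "docs", "_type" : "doc", "_id" : "2" } }
{ "body" : "second doc" }
EOF

# the body has to end with a newline, which the heredoc above takes care of
curl -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk-chunk.json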

--

We used that much memory because on the cluster with 3 nodes it would run
out of heap with every heap size we gave it. This was with no bulk
indexing, just our regular 3-node cluster with a 3M-doc index, and maybe
10 indexing requests per second, probably less. So after several days of
keeping it up, we said: let's give it the biggest machine we can, so we can
finally get some rest.

Right now, it's bulk indexing on the new machine (search and live indexing
still go to the old cluster) and memory is under 1gb all the time, like
300m, 500m, 600m.

I'll try not using mlockall; I have to wait until the bulk indexing
finishes. Would you recommend niofs or mmapfs in general?

Also, when it was bulk indexing, it would happen no matter what the batch
size was: 1000, 500, 100, 50, 20.

The old cluster is still running out of heap, but it continues to work, at
least well enough to serve most of the requests. Frankly, I have no idea
why, since it's been having fits every hour for days and we haven't touched
it.

--

One other thing I thought about that you might be doing is asking for 100,000 hits back (or something of a similar size). In that case, the whole result set needs to be represented in memory. Not saying that you do :), just throwing ideas around...

--

we already ask for 50 hits only.

--

  1. do you facet / sort on string fields, if so how are they indexed

Yes and yes. Here is the mapping for a single month index:

{
  "eventlog_2012_3" : {
    "csm" : {
      "_all" : {
        "enabled" : false
      },
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "body" : {
          "type" : "string"
        },
        "class" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "class_id" : {
          "type" : "string"
        },
        "dt" : {
          "type" : "date",
          "format" : "dateOptionalTime"
        },
        "id" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "instance" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "label" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "level" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "other_id" : {
          "type" : "string"
        },
        "otherclass" : {
          "type" : "string"
        },
        "subdomain" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "test" : {
          "type" : "string"
        },
        "user" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    }
  }
}

  2. are you bulk indexing and searching at the same time?

No. Straight up searching after all data has been indexed.

  3. if you restart a node, is it eating the same amount of memory right away
    after restart?

Undetermined -- I normally have to restart the entire cluster for recovery
to happen, I assume because other nodes are spinning out on this query.

  4. what JVM are you using?

java-1.6.0-openjdk-1.6.0.0-1.49.1.11.4.el6_3.x86_64

  5. what are the startup parameters, i.e. Xmx and Xms?

java -Xms13g -Xmx13g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Delasticsearch -Des.foreground=yes -Des.path.home=/usr/local/elasticsearch
-cp :/usr/local/elasticsearch/lib/elasticsearch-0.19.10.jar:/usr/local/elasticsearch/lib/:/usr/local/elasticsearch/lib/sigar/
org.elasticsearch.bootstrap.Elasticsearch

  6. can you trigger the behavior with a certain search query?

Yes, after it is run 2-3 times across every index (eventlog_2012_03, 04,
05... by the time it gets to 06 or so, a node locks up). The query is as follows:

query = {
  :query => {
    :term => {
      :instance => "bogus-data"
    }
  },
  :sort => {
    :dt => "desc",
    :id => "desc",
    :label => "asc",
    :class => "asc",
    :user => "asc",
    :subdomain => "asc",
    :level => "asc"
  },
  :filter => {
    :and => {
      :filters => [{
        :range => {
          :dt => {
            :from => startDate,
            :to => endDate
          }
        }
      }],
      :_cache => true
    }
  },
  :size => 50,
  :facets => {
    :label => {
      :terms => {
        :field => "label"
      }
    }
  }
}

  7. did you try to use fewer shards (50M docs per shard is doable), just for
    debugging?

No. We have 5 shards per index, each of which holds roughly 200m / 5 = 40m
documents. Here are the index settings:

{
  "eventlog_2012_3" : {
    "settings" : {
      "index.query.default_field" : "body",
      "index.version.created" : "190899",
      "index.number_of_replicas" : "2",
      "index.number_of_shards" : "5",
      "index.merge.max_merged_segment" : "50gb"
    }
  }
}

--

On Friday, October 12, 2012 6:05:15 PM UTC+2, Antonio Lobato wrote:

  1. do you facet / sort on string fields, if so how are they indexed

Yes and yes. Here is the mapping for a single month index: [mapping snipped, quoted in full above]

ok that makes things trickier!

  6. can you trigger the behavior with a certain search query?

Yes, after it is run 2-3 times across every index (eventlog_2012_03, 04, 05;
after it gets to 06 or so, a node locks up) as follows: [query snipped, quoted in full above]

ok, given this query I am not surprised you are running OOM here. You are
sorting on 7 fields, and most of them are not_analyzed, so you are
essentially keeping your entire document collection in JVM memory. My
question is: why do you need to sort by all these fields? Can you live with
dt and id only? If so, can you express your id as a number, i.e. use the _id
field? In ES, every value of a field you sort on is loaded into memory, and
that quickly brings you to the limits of your machines (the same is true for
faceting). You should try to reduce the values held in memory, i.e. remove
some of the sort fields!
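
Something along these lines, as a sketch (date placeholders kept, and ideally
with id mapped as a numeric field so its field data stays small):

# same query, but sorting only on dt and id; note the label facet still loads
# that field into memory, so keep an eye on it too
curl -XGET 'http://localhost:9200/eventlog_2012_3/_search?pretty=true' -d '{
  "query" : { "term" : { "instance" : "bogus-data" } },
  "filter" : {
    "range" : { "dt" : { "from" : "<startDate>", "to" : "<endDate>" } }
  },
  "sort" : [
    { "dt" : "desc" },
    { "id" : "desc" }
  ],
  "size" : 50,
  "facets" : {
    "label" : { "terms" : { "field" : "label" } }
  }
}'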

simon

--

Hey,

Just wanted to post a quick follow-up on my side. We deleted that index and
created a new one on a new machine, no change in config. I still have no
clue what happened. Everything works perfectly now, which is both
reassuring and scary.

--

you mean you changed nothing, not even the slightest thing? I have seen
such an issue on EC2 recently with an unrelated application and we didn't
figure out what happened. Can you provide some information about your nodes'
OS, ES instance, etc.?

simon

On Tuesday, October 16, 2012 12:33:49 AM UTC+2, Julien wrote:

Hey,

Just wanted to post a quick follow up on my side. We deleted that index
and recreated a new one on a new machine, no change in config. I still have
no clue what happened. Everything works perfectly now, which is both
reassuring and scary.

--