Problems after upgrading to 0.16

Hi,

I upgraded to 0.16 and I am having a multitude of issues. The biggest
seems to be during inserts:

org.elasticsearch.index.engine.EngineClosedException: [documents][3] CurrentState[CLOSED]
    at org.elasticsearch.index.engine.robin.RobinEngine.create(RobinEngine.java:264)
    at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:272)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:136)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:418)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.access$100(TransportShardReplicationOperationAction.java:233)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:331)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.elasticsearch.common.compress.lzf.BufferRecycler.allocEncodingBuffer(BufferRecycler.java:76)
    at org.elasticsearch.common.compress.lzf.ChunkEncoder.<init>(ChunkEncoder.java:69)
    at org.elasticsearch.common.io.stream.LZFStreamOutput.<init>(LZFStreamOutput.java:46)
    at org.elasticsearch.common.io.stream.CachedStreamOutput$1.initialValue(CachedStreamOutput.java:47)
    at org.elasticsearch.common.io.stream.CachedStreamOutput$1.initialValue(CachedStreamOutput.java:43)
    at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:160)
    at java.lang.ThreadLocal.get(ThreadLocal.java:150)
    at org.elasticsearch.common.io.stream.CachedStreamOutput.cachedBytes(CachedStreamOutput.java:56)
    at org.elasticsearch.index.translog.fs.FsTranslog.add(FsTranslog.java:159)
    at org.elasticsearch.index.engine.robin.RobinEngine.innerCreate(RobinEngine.java:361)
    at org.elasticsearch.index.engine.robin.RobinEngine.create(RobinEngine.java:266)
    ... 8 more

We are also noticing really high CPU usage, around 400% (the machine has
4 cores), on the Ubuntu machine. Is there a way to query the cluster and find
out what operations it is working on? Here is the cluster state (the
index we use is called documents):

{"cluster_name":"viralheat","master_node":"mjQhyNydTZycnLZseq8lHQ","blocks":
{},"nodes":{"mjQhyNydTZycnLZseq8lHQ":
{"name":"Alchemy","transport_address":"inet[/
192.168.8.230:9300]","attributes":{}}},"metadata":{"templates":
{},"indices":{"twitter":{"state":"open","settings":
{"index.number_of_shards":"5","index.number_of_replicas":"1"},"mappings":
{},"aliases":[]},"documents":{"state":"open","settings":
{"index.number_of_replicas":"0","index.number_of_shards":"5"},"mappings":
{"document":{"properties":{"tags":{"type":"string"},"platform":
{"type":"string"},"utimestamp":{"type":"string"},"document":
{"type":"string"},"created_at":
{"format":"dateOptionalTime","type":"date"},"record_id":
{"type":"string"}}},"documents":{"properties":{"tags":
{"type":"string"},"platform":{"type":"string"},"utimestamp":
{"type":"long"},"document":{"type":"string"},"created_at":
{"format":"dateOptionalTime","type":"date"},"record_id":
{"type":"string"}}}},"aliases":[]}}},"routing_table":{"indices":
{"twitter":{"shards":{"0":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
0,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
0,"index":"twitter"}],"1":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
1,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
1,"index":"twitter"}],"2":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
2,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
2,"index":"twitter"}],"3":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
3,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
3,"index":"twitter"}],"4":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
4,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
4,"index":"twitter"}]}},"documents":{"shards":{"0":
[{"state":"UNASSIGNED","primary":true,"node":null,"relocating_node":null,"shard":
0,"index":"documents"}],"1":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
1,"index":"documents"}],"2":
[{"state":"UNASSIGNED","primary":true,"node":null,"relocating_node":null,"shard":
2,"index":"documents"}],"3":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
3,"index":"documents"}],"4":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
4,"index":"documents"}]}}}},"routing_nodes":{"unassigned":
[{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
0,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
1,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
2,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
3,"index":"twitter"},
{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":
4,"index":"twitter"},
{"state":"UNASSIGNED","primary":true,"node":null,"relocating_node":null,"shard":
0,"index":"documents"},
{"state":"UNASSIGNED","primary":true,"node":null,"relocating_node":null,"shard":
2,"index":"documents"}],"nodes":{"mjQhyNydTZycnLZseq8lHQ":
[{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
0,"index":"twitter"},
{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
1,"index":"twitter"},
{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
2,"index":"twitter"},
{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
3,"index":"twitter"},
{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
4,"index":"twitter"},
{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
1,"index":"documents"},
{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
3,"index":"documents"},
{"state":"STARTED","primary":true,"node":"mjQhyNydTZycnLZseq8lHQ","relocating_node":null,"shard":
4,"index":"documents"}]}},"allocations":[]}

One more thing, I also noticed one of my nodes is gone. I had two
machines in the cluster. For some odd reason, it's not discovering the
231 box:

{"cluster_name":"x","nodes":{"mjQhyNydTZycnLZseq8lHQ":
{"name":"Alchemy","transport_address":"inet[/
192.168.8.230:9300]","attributes":{},"http_address":"inet[/
192.168.8.230:9200]","os":{"refresh_interval":5000,"cpu":
{"vendor":"AMD","model":"Dual Core AMD Opteron(tm) Processor
275","mhz":2209,"total_cores":4,"total_sockets":2,"cores_per_socket":
2,"cache_size":"1kb","cache_size_in_bytes":1024},"mem":
{"total":"14.9gb","total_in_bytes":16070246400},"swap":
{"total":"37.6gb","total_in_bytes":40470831104}},"process":
{"refresh_interval":5000,"id":2298},"jvm":{"pid":
2298,"version":"1.6.0_20","vm_name":"OpenJDK 64-Bit Server
VM","vm_version":"19.0-b09","vm_vendor":"Sun Microsystems
Inc.","start_time":1303806081608,"mem":
{"heap_init":"256mb","heap_init_in_bytes":
268435456,"heap_max":"1015.6mb","heap_max_in_bytes":
1065025536,"non_heap_init":"23.1mb","non_heap_init_in_bytes":
24313856,"non_heap_max":"214mb","non_heap_max_in_bytes":
224395264}},"network":{"refresh_interval":5000,"primary_interface":
{"address":"192.168.8.230","name":"eth1","mac_address":"00:E0:81:41:AF:
7F"}},"transport":{"bound_address":"inet[/
192.168.8.230:9300]","publish_address":"inet[/192.168.8.230:9300]"}}}}

It looks like one of your nodes ran out of memory:

java.lang.OutOfMemoryError: Java heap space


That's interesting, as it had only been on for about an hour after I
upgraded it. Here is the box memory usage:

Mem: 15693600k total, 6643536k used, 9050064k free, 73648k buffers

Is there any way to solve this?


What about JVM memory usage? What does the jvm section of curl -XGET
'http://localhost:9200/_cluster/nodes/stats?pretty=true' contain, and how does
it compare to what is specified in the -Xmx parameter of the elasticsearch process?
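
For example, something along these lines would show both sides (just a rough
sketch, assuming a standard Linux shell on the node):

# heap figures the node itself reports (jvm section of the stats output)
curl -s 'http://localhost:9200/_cluster/nodes/stats?pretty=true' | grep heap_

# the -Xmx flag actually passed to the running JVM
ps -ef | grep elasticsearch | tr ' ' '\n' | grep '^-Xmx'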


We are also noticing really high CPU usage, 400% (there are 4 cores) on the Ubuntu machine.

When does your CPU hit 400%? Directly after startup, or some hours into indexing?

I had serious problems (on a 0.16.0 snapshot) described here:

http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/beb060bf58a2d1df

They are now partially solved: the OS no longer crashes, the thread count is
back to normal (~70), and the node is responsive, but CPU usage is still high
(700% in our case). After a restart of ES the node takes <= 50% CPU, and it
somehow creeps up again after some hours. Now I need to find out when that
happens (my feeling is that it misses out on some optimizations and the CPU
goes up due to the missing optimization, but I am still investigating).
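
If missing optimizations are indeed the cause, one quick test is to force an
optimize on the index and see whether the CPU settles down afterwards (a rough
sketch; I am assuming the stock _optimize endpoint and the 'documents' index
from the cluster state above):

curl -XPOST 'http://localhost:9200/documents/_optimize'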

Regards,
Peter.


But check RAM usage first, and how long the GC takes, or whether you have
problems with merging.

E.g. use jvisualvm, node info, thread dumps (kill -3 <pid>), etc.
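
For example, to grab a thread dump from the running node (a sketch, assuming
the pid file your startup script writes, which also shows up further down in
the ps output as -Des-pidfile=/var/run/elasticsearch.pid):

kill -3 $(cat /var/run/elasticsearch.pid)

The JVM prints the dump to the process's standard output, so look wherever
that is redirected (or use jstack <pid> instead).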


My guess is that it's a few hours after restart. That is why I wanted to know
what it was doing.

@igor, I had to reboot the box, but I will paste that if it happens again.
Is there any way to increase the -Xmx value for the Java heap size?

Okay, just restarted it. It's at about 105 percent CPU. Here are the stats:

{
"cluster_name" : "xx",
"nodes" : {
"Mk1V3glvRxinJr_bEZ8DDQ" : {
"name" : "Shadowcat",
"indices" : {
"size" : "60gb",
"size_in_bytes" : 64434244345,
"docs" : {
"num_docs" : 29663747
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_mem_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 1,
"total" : 2,
"total_time" : "35.7s",
"total_time_in_millis" : 35783
}
},
"os" : {
"timestamp" : 1303836311658,
"uptime" : "8 hours, 40 minutes and 15 seconds",
"uptime_in_millis" : 31215000,
"load_average" : [ 13.65, 70.19, 94.85 ],
"cpu" : {
"sys" : 7,
"user" : 37,
"idle" : 47
},
"mem" : {
"free" : "6.5gb",
"free_in_bytes" : 7065710592,
"used" : "8.3gb",
"used_in_bytes" : 9004535808,
"free_percent" : 90,
"used_percent" : 9,
"actual_free" : "13.5gb",
"actual_free_in_bytes" : 14502236160,
"actual_used" : "1.4gb",
"actual_used_in_bytes" : 1568010240
},
"swap" : {
"used" : "0b",
"used_in_bytes" : 0,
"free" : "37.6gb",
"free_in_bytes" : 40470831104
}
},
"process" : {
"timestamp" : 1303836311658,
"cpu" : {
"percent" : 183,
"sys" : "31 seconds and 700 milliseconds",
"sys_in_millis" : 31700,
"user" : "3 minutes and 600 milliseconds",
"user_in_millis" : 180600,
"total" : "-1 milliseconds",
"total_in_millis" : -1
},
"mem" : {
"resident" : "980.3mb",
"resident_in_bytes" : 1027993600,
"share" : "10.7mb",
"share_in_bytes" : 11259904,
"total_virtual" : "1.5gb",
"total_virtual_in_bytes" : 1668825088
},
"fd" : {
"total" : 651
}
},
"jvm" : {
"timestamp" : 1303836311660,
"uptime" : "1 minute, 56 seconds and 642 milliseconds",
"uptime_in_millis" : 116642,
"mem" : {
"heap_used" : "718.7mb",
"heap_used_in_bytes" : 753624584,
"heap_committed" : "971.1mb",
"heap_committed_in_bytes" : 1018298368,
"non_heap_used" : "33.8mb",
"non_heap_used_in_bytes" : 35487376,
"non_heap_committed" : "54.7mb",
"non_heap_committed_in_bytes" : 57389056
},
"threads" : {
"count" : 50,
"peak_count" : 82
},
"gc" : {
"collection_count" : 763,
"collection_time" : "6 seconds and 401 milliseconds",
"collection_time_in_millis" : 6401,
"collectors" : {
"ParNew" : {
"collection_count" : 757,
"collection_time" : "6 seconds and 319 milliseconds",
"collection_time_in_millis" : 6319
},
"ConcurrentMarkSweep" : {
"collection_count" : 6,
"collection_time" : "82 milliseconds",
"collection_time_in_millis" : 82
}
}
}
},
"network" : {
"tcp" : {
"active_opens" : 37,
"passive_opens" : 2678,
"curr_estab" : 36,
"in_segs" : 574496,
"out_segs" : 186058,
"retrans_segs" : 20,
"estab_resets" : 414,
"attempt_fails" : 0,
"in_errs" : 0,
"out_rsts" : 3799
}
},
"transport" : {
"rx_count" : 2,
"rx_size" : "278b",
"rx_size_in_bytes" : 278,
"tx_count" : 2,
"tx_size" : "26b",
"tx_size_in_bytes" : 26
}
}
}
}

If it's a Linux box, you can run

ps -aef | grep elasticsearch


root 4671 1 64 09:43 pts/0 00:04:32 /usr/bin/java -Xms256m -Xmx1g -Xss128k -Djline.enabled=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.path.home=/usr/local/elasticsearch -Des-pidfile=/var/run/elasticsearch.pid -cp :/usr/local/elasticsearch/lib/elasticsearch-0.16.0.jar:/usr/local/elasticsearch/lib/:/usr/local/elasticsearch/lib/sigar/ -Des.config=/etc/elasticsearch/elasticsearch.yml -Des.path.home=/usr/local/elasticsearch -Des.path.logs=/var/log/elasticsearch -Des.path.data=/var/lib/elasticsearch -Des.path.work=/tmp/elasticsearch org.elasticsearch.bootstrap.ElasticSearch
electic 4824 4511 0 09:50 pts/0 00:00:00 grep --color=auto elasticsearch

The CPU died down; not sure why it stayed at 300 percent last time. The only
remaining issue is that it's not discovering its nodes for some odd reason.
Both boxes are fine and running; I'm not sure why I can't get them to talk to
each other after the upgrade.

There are several ways to increase the -Xmx size. You can find them in the
comments at the beginning of the bin/elasticsearch file. For example, you can
set the ES_MAX_MEM environment variable or specify it in ES_JAVA_OPTS.
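
Something along these lines before starting the node, for example (just a
sketch; 2g matches the 2gb MIN/MAX suggestion below, and I'm assuming
ES_MIN_MEM is picked up by bin/elasticsearch the same way as ES_MAX_MEM):

export ES_MIN_MEM=2g
export ES_MAX_MEM=2g
/usr/local/elasticsearch/bin/elasticsearch

or the equivalent via ES_JAVA_OPTS="-Xms2g -Xmx2g".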

One thing that might explain it is the change to allow 4 concurrent primary recoveries on a node (it used to be 2), which might strain the node too much with the current memory allocation. You can control that by setting node_initial_primaries_recoveries to 2.

If your node fails when trying to recover 4 shards concurrently, then in any case I would say it is in a bad place memory-wise, so I would suggest increasing the memory to something like 2gb MIN and MAX.
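
For reference, a sketch of what that could look like in elasticsearch.yml
(I'm assuming the full key sits under the cluster.routing.allocation prefix;
double-check the exact name against the 0.16 docs):

cluster.routing.allocation.node_initial_primaries_recoveries: 2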