I lost data following a java.lang.OutOfMemoryError
condition. I am using:
- Elasticsearch 0.17.4
- 4g min/max heap size
- mlockall
- single node, 1 shard per index, 0 replicas
- indexed 300m documents across 6 indexes
config:
    index:
        number_of_shards: 1
        number_of_replicas: 0
        refresh_interval: 20s
    bootstrap:
        mlockall: true
I had been indexing about 3000 messages per second continuously for
over 80 hours. At the 300m document mark, I ran this query, and it
hung waiting for a response:
curl -XGET http://localhost:9200/_search -d '{"query":
{"query_string":{"query":"someword"}}}'
where someword was contained in all 300m documents. Before this,
other queries returned results fine. Afterwards, queries to the health
endpoint and every other query did not return a result (they just
hung). The only other evidence of a problem was this in the
log file:
failed engine java.lang.OutOfMemoryError: Java heap space
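For anyone trying to reproduce the hang, here is a minimal sketch of probing the health endpoint with a client-side timeout, so that the check itself does not block forever when the node stops answering. ES_URL and check_health are names made up for this example, and the 10-second limit is an arbitrary assumption; only the localhost:9200 address comes from my setup.

```shell
#!/bin/sh
# Hypothetical helper: ES_URL and check_health are illustrative names.
ES_URL="${ES_URL:-http://localhost:9200}"

check_health() {
    # --max-time bounds the whole request (in seconds), so a wedged node
    # cannot hang the caller; -f makes curl fail on HTTP error codes.
    if curl -s -f --max-time "${1:-10}" "$ES_URL/_cluster/health" >/dev/null 2>&1; then
        echo "responsive"
    else
        echo "unresponsive"
    fi
}

check_health 10
```

This only reports whether the node answered in time; it says nothing about why it stopped answering.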
A normal "kill PID" command did not initiate a graceful shutdown, and I
had to perform a "kill -9".
"lsof" showed 26k files open (vs. 1300 without ES running), and ulimit
is set at 100000.
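A rough way to compare a process's open descriptors against the limit is sketched below. It assumes Linux (it reads /proc), and count_open_files is a made-up helper name. The count will generally be lower than what "lsof" reports, since lsof also lists memory-mapped files and other non-descriptor entries.

```shell
#!/bin/sh
# Hypothetical helper, Linux only: counts entries in /proc/<pid>/fd.
count_open_files() {
    # One entry per open file descriptor of the given process.
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: inspect the current shell; substitute the ES process ID.
pid="$$"
echo "open descriptors for $pid: $(count_open_files "$pid")"
echo "per-process limit (ulimit -n): $(ulimit -n)"
```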
Upon restart, with min/max heap size raised to 8g, all of the files
in the ./index and ./translog directories were deleted. The cluster
won't start up and shows a continuous stream of errors like this:
[2011-08-11 01:07:58,378][WARN ][indices.cluster ] [Unuscione, Angelo] [index0006][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [index0006][0] shard allocated for local recovery (post api), should exists, but doesn't
Is there a way to predict when we might encounter out-of-memory
errors? I can enforce better queries with the user interface if there
are some queries that we should not be executing.
Is there a better way to start up ES if I know I previously had an
unclean shutdown?
I am expecting each node to have at least 10 billion documents across
100 to 300 indexes eventually. The document sizes are small, ranging
from 200 to 800 bytes, plus an additional 200 bytes for the fields.
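To put the target in perspective, here is a back-of-the-envelope sketch of the raw data volume per node using the figures above. The variable names are illustrative, and the result is only the raw document bytes; it is not an estimate of on-disk index size or of heap requirements, which I don't know how to predict.

```shell
#!/bin/sh
# Figures from the post; variable names are illustrative only.
docs_per_node=10000000000    # 10 billion documents
min_doc_bytes=200
max_doc_bytes=800
field_overhead_bytes=200     # additional bytes for the fields

# Raw bytes = document count * (document size + field overhead).
raw_min_bytes=$(( docs_per_node * (min_doc_bytes + field_overhead_bytes) ))
raw_max_bytes=$(( docs_per_node * (max_doc_bytes + field_overhead_bytes) ))

echo "raw data per node: $raw_min_bytes to $raw_max_bytes bytes"
```

That works out to roughly 4 TB to 10 TB (decimal) of raw document data per node.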