Hi all,
I have a question about storing and indexing streaming logs. I am currently
testing this scenario to find the best approach for organizing and
configuring my setup.
I have a cluster with four nodes (all in virtual machines):
- two data nodes (data: true; master: false) with 4 GB RAM each
- one master node (data: false; master: true) with 2 GB RAM
- one web frontend node (data: false; master: false) with 2 GB RAM - used
only to access information from external applications like Kibana
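In case it helps, the node roles are set in each node's elasticsearch.yml
roughly like this (a simplified sketch; the real files of course also set
the cluster name, paths, heap size, etc.):

# elasticsearch.yml on the two data nodes
node.master: false
node.data: true

# on the dedicated master node the flags are flipped:
#   node.master: true
#   node.data: false

# and on the web frontend node both are false:
#   node.master: false
#   node.data: false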
Data is pulled from RabbitMQ via the river plugin.
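Roughly, the river is created like this (the river name, connection
details, queue name and bulk settings below are placeholders rather than my
exact values):

# creates the RabbitMQ river (all values are placeholders)
curl -XPUT 'localhost:9200/_river/logs_river/_meta' -d '{
  "type": "rabbitmq",
  "rabbitmq": {
    "host": "rabbitmq-host",
    "port": 5672,
    "user": "guest",
    "pass": "guest",
    "vhost": "/",
    "queue": "elasticsearch"
  },
  "index": {
    "bulk_size": 100,
    "bulk_timeout": "10ms"
  }
}'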
Because the indexed data is not plain text, I have created a probably more
aggressive index configuration, like this:
"settings": {
"index": {
"number_of_shards": 20,
"number_of_replicas": 1,
"refresh_interval" : "30s",
"merge.policy.merge_factor": 30,
"store.throttle.max_bytes_per_sec": "5mb",
"store.throttle.type": "merge",
"analysis": {
"filter": {
"mynGram": {
"type": "nGram",
"min_gram": 2,
"max_gram": 50
}
},
"analyzer": {
"a1": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"mynGram"
]
}
}
}
}
},
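To get a feeling for what the a1 analyzer produces for a single log token,
I check it roughly like this against the test index "delme" mentioned in
the errors below (the sample text is arbitrary):

# each whitespace token is lowercased and expanded into all 2..50 character nGrams
curl -XGET 'localhost:9200/delme/_analyze?analyzer=a1&pretty' -d 'Connection refused on port 9300'

A single 10-character token already expands into 45 terms with min_gram 2
and max_gram 50, which is part of why I worry about memory and merge
pressure.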
I am a bit concerned about the number of messages that can be indexed
without ES crashing.
If I create the index without replicas, I am able to store and index about
2000 entries per second. When I try to store more data, or when I have
replicas, I get errors like this:
[54]: index [delme], type [logentry], id [7zIkj_i8RpOBm65ShrZS_A], message
[RemoteTransportException[[Cardiac][inet[/172.31.80.82:9300]][bulk/shard]];
nested: EsRejectedExecutionException[rejected execution of
[org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1]];
]
[2013-10-07 08:07:55,212][WARN ][cluster.action.shard ] [Crimson and
the Raven] received shard failed for [delme][1],
node[RC-iJe40TCqMKtV4SMam2A], [R], s[INITIALIZING], reason [Failed to start
shard, message [RecoveryFailedException[[delme][1]: Recovery failed from
[Alibar][8UmjyySBR7mb_TTfzq2hQg][inet[/172.31.80.83:9300]]{master=false}
into
[Cardiac][RC-iJe40TCqMKtV4SMam2A][inet[/172.31.80.82:9300]]{master=false}];
nested:
RemoteTransportException[[Alibar][inet[/172.31.80.83:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[delme][1] Phase[3] Execution failed];
nested:
SendRequestTransportException[[Cardiac][inet[/172.31.80.82:9300]][index/shard/recovery/translogOps]];
nested: OutOfMemoryError[Java heap space]; ]]
I have a few questions:
- Does anyone have an idea what causes these errors and how to fix them?
- How many documents should I expect to be able to store and index with
this setup?
- Are there any articles or advice on where to look for information about
creating and configuring an ES cluster and indexes that can handle this
volume of streaming logs? Which specific configuration options and
architecture choices should I pay attention to?
I have to evaluate such a setup and see whether we can use ES for this
case, so any help will be appreciated.
Thanks in advance.
Best regards,
Nickolay Kolev