Hello

We have a 10 node Elasticsearch cluster which is receiving roughly 10k/s worth of log lines from our application.

Each Elasticsearch node has 132GB of memory - 48GB heap size. The disk subsystem is not great, but it seems to be keeping up. (This could be an issue, but I'm not sure that it is.)

We have not had this cluster stay up for more than a week, and it also seems to crash for no real reason. It seems like one node starts having issues and then takes the entire cluster down.

Does anyone from the community have any experience with this kind of setup?

Thanks in advance,
Rob
Hi,

The reason this is set is because without it we reject messages and therefore don't have all the log entries.

I'm happy to be told this isn't required, but I'm pretty sure it is. We are constantly bulk indexing large numbers of events.
On Wednesday, August 13, 2014 6:09:46 PM UTC+2, Jörg Prante wrote:
Because you set queue_size: -1 in the bulk thread pool, you explicitly
allowed the node to crash.
You should use reasonable resource limits. Default settings, which are
reasonable, are sufficient in most cases.
Jörg
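For what it's worth, before unbounding the queue it is worth checking how often the bulk queue actually overflows. A minimal sketch in Python, assuming the requests library and a node reachable on localhost:9200 (adjust the host for your cluster):

import requests

# Per-node thread pool stats; the output includes bulk.active, bulk.queue
# and bulk.rejected columns, which show whether the default-sized bulk
# queue is really overflowing.
resp = requests.get("http://localhost:9200/_cat/thread_pool", params={"v": "true"})
print(resp.text)

If bulk.rejected keeps climbing, the cluster cannot absorb the indexing rate, which is the problem to fix rather than hide.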
If Elasticsearch rejects bulk actions, this is serious and you should examine the cluster to find out why. Slow disks, cluster health, and capacity problems all come to mind. But if you ignore the underlying problem and merely disable bulk resource control instead, you open the gate wide to unpredictable node crashes, and at some point you won't be able to control the cluster at all.

To reduce the number of active bulk requests per timeframe, you could, for example, increase the number of actions per bulk request. Or simply increase the number of nodes. Or think about the shard/replica organization while indexing - it can be an advantage to bulk index into an index with replica level 0 and increase the replica level later.

Jörg
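As a rough illustration of that last point, the replica count can be changed on a live index through the index settings API. A sketch in Python, again assuming requests and a node on localhost:9200 (the index name here is made up):

import json
import requests

ES = "http://localhost:9200"   # assumption: any reachable node
INDEX = "logs-2014.08.13"      # hypothetical index name

# Drop replicas while the heavy bulk indexing runs against this index...
requests.put("%s/%s/_settings" % (ES, INDEX),
             data=json.dumps({"index": {"number_of_replicas": 0}}))

# ...and restore them once the bulk load for this index is finished.
requests.put("%s/%s/_settings" % (ES, INDEX),
             data=json.dumps({"index": {"number_of_replicas": 1}}))

With replicas at 0 each document is only indexed once during the import, at the cost of no redundancy until the replica is added back.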
I appreciate your answers. I think IO could be a contributing factor. I'm thinking of splitting the index into an hourly index with no replicas for bulk importing and then switching replicas on afterwards. I think the risk of losing data would be too high if it was any longer than that.

Also, does the async replication from the Logstash side of things cause unknown issues?
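For the hourly scheme, that could look roughly like the following; a sketch only, and the index name pattern is an assumption:

import json
import time
import requests

ES = "http://localhost:9200"             # assumption: any reachable node

# Hourly index name, e.g. logs-2014.08.13.18
index = time.strftime("logs-%Y.%m.%d.%H")

# Create the current hour's index with no replicas for the import window;
# the previous hour's index can then have its replica count raised back
# to 1 with the _settings call shown earlier.
requests.put("%s/%s" % (ES, index),
             data=json.dumps({"settings": {"index": {"number_of_replicas": 0}}}))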
Or maybe it's worth rethinking the architecture to avoid having to do
tricks like no-replicas for 1h. Kafka in front of ES comes to mind. We
use this setup for Logsene http://sematext.com/logsene/ and don't have
the problem with log loss, so it may work well for you, too.
I think you could also replace Redis + 3 Logstash servers with 1 rsyslog
server with omelasticsearch, which has built-in buffering in memory and on
disk (see links below for config examples).
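To make the Kafka idea concrete, the consumer side can be as simple as draining a topic and sending fixed-size bulk requests to ES. A very rough sketch in Python, assuming the kafka-python client, logs already arriving in the topic as JSON documents, and made-up names like app-logs:

import json
import requests
from kafka import KafkaConsumer   # assumption: kafka-python client

ES = "http://localhost:9200"      # assumption: any reachable node
consumer = KafkaConsumer("app-logs", bootstrap_servers="localhost:9092")

lines = []
for msg in consumer:
    # The bulk API takes one action line plus one document line per event.
    lines.append(json.dumps({"index": {"_index": "logs-2014.08.13", "_type": "logs"}}))
    lines.append(msg.value.decode("utf-8"))
    if len(lines) >= 10000:       # roughly 5000 documents per bulk request
        requests.post(ES + "/_bulk", data="\n".join(lines) + "\n")
        lines = []

The point is that Kafka acts as the durable buffer: if Elasticsearch slows down or rejects requests, messages simply wait in the topic instead of being dropped.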