Hi,
I have the following Elasticsearch architecture:
3 dedicated master eligible nodes (8 GB RAM + 2 CPUs)
3 data/ingest nodes (64 GB RAM + 8 CPUs)
1 coordinating node (24 GB RAM + 4 CPUs)
Which is the better approach: configure Logstash to send its output to the coordinating node, or change the data nodes to data/master nodes and send the Logstash output to those?
The coordinating node will receive Kibana requests.
Another option would be to keep the dedicated master nodes and configure Logstash to send data directly to the data/ingest nodes. If dedicated master nodes are not required and you change the layout, you could send data directly to the master/data/ingest nodes.
Thanks, Christian.
I think I can maintain the following architecture:
3 dedicated master nodes (cluster management)
3 data/ingest nodes (Logstash sends directly to these)
1 coordinating-only node (serves client requests)
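The three node types in this layout correspond to role flags in each node's elasticsearch.yml. Below is a minimal sketch, assuming the Elasticsearch 6.x setting names (node.master, node.data, node.ingest), since the cluster in this thread runs 6.1.0. The role labels and the helper function are hypothetical names used only for illustration.

```python
# Role flags per node type, using Elasticsearch 6.x setting names.
# The dictionary keys are illustrative labels, not Elasticsearch terms.
ROLE_SETTINGS = {
    "dedicated_master": {"node.master": True, "node.data": False, "node.ingest": False},
    "data_ingest": {"node.master": False, "node.data": True, "node.ingest": True},
    "coordinating_only": {"node.master": False, "node.data": False, "node.ingest": False},
}


def render_settings(role):
    """Return the elasticsearch.yml lines for one of the roles above."""
    settings = ROLE_SETTINGS[role]
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


if __name__ == "__main__":
    for role in ROLE_SETTINGS:
        print(f"# {role}")
        print(render_settings(role))
```

A coordinating-only node is simply one with all three flags set to false, which is why it suits client traffic such as Kibana while Logstash writes to the data/ingest nodes.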
With this architecture, I think I can avoid a current problem with duplicated data when new indices are created.
Do you have any tips for avoiding duplicated data on index creation?
Thanks
I am not sure I understand what you are referring to. Can you please clarify?
Sure!!
I get duplicated data whenever a new index is created. My indices are generated daily.
I need to avoid duplicated data on index creation. I think my current architecture cannot handle the current data load, and the bulk API hits several timeouts. The error I found is below.
[2018-06-17T20:00:47,039][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [cdv1prgrafapv03-node-1] failed to put mappings on indices [[[emm_001-2018.06.18/y7YonDWiTo2AeED-mYxKFA]]], type [doc]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$null$0(MasterService.java:122) ~[elasticsearch-6.1.0.jar:6.1.0]
at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_65]
at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:121) ~[elasticsearch-6.1.0.jar:6.1.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.0.jar:6.1.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]
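If the duplicates come from bulk requests being resent after timeouts like the one above (an assumption, not something confirmed in this thread), one common mitigation is to give every document a deterministic ID, so that a resend overwrites the existing document instead of creating a second copy. The sketch below shows the idea in Python. The function names and the sample event are hypothetical, and hashing the whole event is only one possible way to derive the ID. In Logstash, the same effect is usually achieved with the fingerprint filter and the elasticsearch output's document_id option.

```python
import hashlib
import json


def event_id(event):
    """Derive a stable ID from the event content (SHA-1 of canonical JSON)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def bulk_lines(index, events, doc_type="doc"):
    """Build a newline-delimited bulk body with an explicit _id per document."""
    lines = []
    for event in events:
        # 6.x bulk actions still carry a _type; "doc" matches the log above.
        action = {"index": {"_index": index, "_type": doc_type, "_id": event_id(event)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical event; sending it twice yields the same _id both times.
    sample = {"host": "example-host", "message": "example message"}
    print(bulk_lines("emm_001-2018.06.18", [sample, sample]))
```

This only removes the duplicates. It does not address the slow cluster updates that cause the timeouts in the first place.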
It seems like cluster updates are taking a really long time, which is causing problems. How many indices and shards do you have in your cluster?
I have 24,572 shards and 2,458 indices.
That is far too many indices and shards for a cluster that size. Please read this blog post on shards and sharding for some practical guidelines, and then change how you shard your data so that you reduce this number dramatically.
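To put the numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes a heap of about 30 GB per data node (the thread only states 64 GB of RAM, so the heap size is a guess) and uses the commonly cited rule of thumb of at most 20 shards per GB of heap. Both values are parameters, and the function name is hypothetical.

```python
def shard_summary(total_shards, total_indices, data_nodes, heap_gb, shards_per_gb=20):
    """Compare the current shard load per data node with a rule-of-thumb ceiling."""
    shards_per_node = total_shards / data_nodes
    suggested_max = heap_gb * shards_per_gb
    return {
        "shards_per_index": total_shards / total_indices,
        "shards_per_node": shards_per_node,
        "suggested_max_per_node": suggested_max,
        "over_by_factor": shards_per_node / suggested_max,
    }


if __name__ == "__main__":
    # Shard and index counts come from the thread; heap_gb=30 is an assumption.
    summary = shard_summary(24572, 2458, data_nodes=3, heap_gb=30)
    for name, value in summary.items():
        print(f"{name}: {value:.1f}")
```

With these inputs, each index averages about 10 shards and each data node carries more than 8,000 shards, against a suggested ceiling of 600 under the assumed heap. Fewer shards per daily index, or weekly or monthly indices, would bring the count down.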