Logstash target on Elasticsearch Architecture


#1

Hi,

I have the following Elasticsearch architecture:

3 dedicated master eligible nodes (8 GB RAM + 2 CPUs)
3 data/ingest nodes (64 GB RAM + 8 CPUs)
1 coordinating node (24 GB RAM + 4 CPUs)

Which is the better approach: configure Logstash to send its output to the coordinating node, or change the data nodes to be data/master nodes and send the Logstash output to those?

The coordinating node will receive the Kibana requests.


(Christian Dahlqvist) #2

Another option would be to keep the dedicated master nodes and configure Logstash to send data directly to the data/ingest nodes. If dedicated master nodes are not required and you change the layout, you could send data directly to the master/data/ingest nodes.


#3

Thanks Christian

I think I can maintain the following architecture:

3 dedicated master nodes (cluster management)
3 data/ingest nodes (Logstash sends directly to these)
1 coordinating-only node (serves clients)

I think this architecture will avoid a problem I currently have with duplicated data when new indices are created.

Do you have any tips for avoiding duplicated data on index creation?

Thanks


(Christian Dahlqvist) #4

I am not sure I understand what you are referring to. Can you please clarify?


#5

Sure!!

I get duplicated data every time a new index is created. My indices are generated daily.
I need to avoid this duplication on index creation. I think my current architecture cannot handle the current data load, and bulk API requests are timing out. The error I found is below.

[2018-06-17T20:00:47,039][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [cdv1prgrafapv03-node-1] failed to put mappings on indices [[[emm_001-2018.06.18/y7YonDWiTo2AeED-mYxKFA]]], type [doc]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$null$0(MasterService.java:122) ~[elasticsearch-6.1.0.jar:6.1.0]
at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_65]
at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:121) ~[elasticsearch-6.1.0.jar:6.1.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.0.jar:6.1.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]
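
If the duplicates come from bulk requests that time out and are then retried (an assumption; the thread does not confirm the cause), one common way to make retries harmless is to give every event a deterministic document ID, so a retried event overwrites its earlier copy instead of being indexed a second time. In Logstash this is usually done with the fingerprint filter together with the `document_id` option of the elasticsearch output. The Python sketch below only illustrates the idea; the function names and the sample events are hypothetical, and the index name and `doc` type are taken from the log line above.

```python
import hashlib
import json


def deterministic_id(event: dict) -> str:
    """Return a stable ID derived from the event's content (hypothetical helper)."""
    # Serialize with sorted keys so field order cannot change the hash.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def bulk_body(index: str, events: list) -> str:
    """Build a newline-delimited bulk request body with explicit _id values."""
    lines = []
    for event in events:
        # An explicit _id turns a retry into an overwrite, not a new document.
        action = {"index": {"_index": index, "_type": "doc",
                            "_id": deterministic_id(event)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


# Hypothetical sample events, for illustration only.
events = [
    {"host": "web-1", "message": "login ok", "@timestamp": "2018-06-18T00:00:01Z"},
    {"host": "web-2", "message": "login failed", "@timestamp": "2018-06-18T00:00:02Z"},
]
first_attempt = bulk_body("emm_001-2018.06.18", events)
retry = bulk_body("emm_001-2018.06.18", events)
print(first_attempt == retry)  # the retry carries exactly the same IDs
```

The trade-off is that two genuinely distinct events with identical content get the same ID and collapse into one document, so the hashed fields need to include something that distinguishes real repeats, such as a timestamp or a source offset.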


(Christian Dahlqvist) #6

It seems like cluster updates are taking a really long time, which is causing problems. How many indices and shards do you have in your cluster?


#7

I have 24,572 shards and 2,458 indices.


(Christian Dahlqvist) #8

That is far too many indices and shards for a cluster that size. Please read this blog post on shards and sharding for some practical guidelines, and then change how you shard your data so that you reduce this number dramatically.
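
To put the numbers in perspective, here is a rough back-of-the-envelope calculation in Python. The shard, index, and data node counts come from this thread. The heap size is an assumption (the thread gives only RAM, not heap), and the limit of 20 shards per GB of heap is a commonly cited rule of thumb from Elastic's sharding guidance, not a hard limit and not something stated in this thread.

```python
# Figures reported in this thread.
TOTAL_SHARDS = 24_572
TOTAL_INDICES = 2_458
DATA_NODES = 3

# Assumptions, not from the thread: heap per data node and the rule of thumb.
ASSUMED_HEAP_GB = 30
SHARDS_PER_GB_HEAP = 20

shards_per_index = TOTAL_SHARDS / TOTAL_INDICES
shards_per_node = TOTAL_SHARDS / DATA_NODES
guideline_per_node = ASSUMED_HEAP_GB * SHARDS_PER_GB_HEAP
guideline_cluster = guideline_per_node * DATA_NODES

print(f"average shards per index: {shards_per_index:.1f}")
print(f"shards per data node:     {shards_per_node:.0f}")
print(f"guideline per data node:  {guideline_per_node}")
print(f"over guideline by factor: {TOTAL_SHARDS / guideline_cluster:.1f}")
```

Under these assumptions each data node holds more than ten times the suggested shard count. The average of roughly ten shards per index also suggests that the daily indices could use far fewer primary shards, or be consolidated into weekly or monthly indices, depending on data volume.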

