I am not sure if it's possible to do this, but I am setting up a cluster and want to make sure I do everything right. I have 3 nodes, all with the same configuration (392 GB RAM, 32 cores, 3 TB SSD each).
I intend to configure it as follows:
Cluster A : Node A - node.master: true, node.data: false
Cluster A : Node B - node.master: true, node.data: true
Cluster A : Node C - node.master: false, node.data: true
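
To make the role assignment concrete, here is a rough sketch of what I think each node's elasticsearch.yml would contain. I have not tested this. The cluster name and node names are placeholders I made up, and it assumes an Elasticsearch version that still uses the node.master / node.data settings.

```yaml
# Node A (elasticsearch.yml): master-eligible, holds no data
cluster.name: ClusterA   # placeholder name
node.name: node-a        # placeholder name
node.master: true
node.data: false
---
# Node B (elasticsearch.yml): master-eligible and data
cluster.name: ClusterA
node.name: node-b
node.master: true
node.data: true
---
# Node C (elasticsearch.yml): data only
cluster.name: ClusterA
node.name: node-c
node.master: false
node.data: true
```

Each section separated by `---` would live in its own file on the corresponding host.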
My ingestion source is Logstash. Logstash will ingest the data into Cluster A, with csv as input and Elasticsearch as output.
I have a few questions about this setup:
a) Is the cluster config OK? i.e. two master-eligible nodes and two data nodes.
b) What is the best way to run Logstash?
My options are:
- Run Logstash on Node A and point its output at the network IP of Node B?
- Can I run Logstash on Node A and ingest the csv into the Elasticsearch instance running on Node A, even though Node A is not a data node? (i.e. will Node A distribute the data to the other nodes?)
- For efficiency, will it make a difference if I run Logstash on a separate host altogether, so that Elasticsearch indexing is not hindered by Logstash? In that case, though, Logstash would send data over the network rather than to localhost.
c) Which node is the best place to run Kibana?
d) Is it possible to have rollover indexing in Elasticsearch? i.e. I only want to keep the last 2 weeks of data, removing the oldest data as new data arrives, so that there is a moving window of data available. That way, I don't run out of disk space.
Is this something relevant?