Cluster setup questions and rolling over Logstash ingestion to Elasticsearch


(EatDataForBreakfast) #1

Hi,

I am not sure if it's possible to do this, but I am setting up a cluster and I wanted to make sure I do everything right. I have 3 nodes, all with the same configuration (392 GB RAM, 32 cores, 3 TB SSD each).

I intend to configure it as follows:

Cluster A : Node A - node.master: true, node.data: false
Cluster A : Node B - node.master: true, node.data: true
Cluster A : Node C - node.master: false, node.data: true
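In elasticsearch.yml terms, a sketch of what I mean (cluster/node names are just examples):

```yaml
# Node A: master-eligible, holds no data
cluster.name: clusterA
node.name: nodeA
node.master: true
node.data: false

# Node B would set node.master: true,  node.data: true
# Node C would set node.master: false, node.data: true
```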

My source of ingestion is Logstash. Logstash will ingest the data into Cluster A, with a csv input and an elasticsearch output.
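For context, the pipeline I have in mind is roughly this sketch (the file path and index name are placeholders; which host(s) to list in the output is exactly my question below):

```
input {
  file {
    path => "/data/input/*.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]  # <- which node(s) should go here?
    index => "mydata"
  }
}
```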

I had a few questions for this setup:

a) Is the cluster config OK, i.e. two master-eligible and two data nodes?
b) What is the best way to run Logstash?
My options are:
- Run Logstash on Node A and point the output to the network IP of Node B?
- Run Logstash on Node A and ingest the csv into Elasticsearch on Node A, even though Node A is not a data node? (i.e. will Node A distribute the data to the other nodes?)
- For efficiency, will it make a difference if I run Logstash on an altogether different host, so that Elasticsearch indexing can do its best without hindrance from Logstash? But in that case Logstash will point over the network rather than at localhost.

c) Which node is the best place to run Kibana on?

d) Is it possible to have rollover indexing in Elasticsearch? I.e. I only want to keep the last 2 weeks of data: remove the oldest two weeks and have a moving window of data available, so that I don't run out of disk space.

Is this relevant?
https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html


(Alexander Reelsen) #2

That's a ton of different questions.

First, you may want to have dedicated master nodes (their own processes/instances), so that the nodes doing the indexing work do not need to deal with master node tasks and vice versa. Also, do not send indexing data to dedicated master nodes; always send to the data nodes (note: clients can handle this automatically if sniffing is enabled).

You can run Logstash and Elasticsearch on the same nodes, but this also means they will potentially steal each other's resources, making performance issues hard to debug.

The Elasticsearch rollover API is used to create a new index, but it does not delete old ones. You should use something like Curator for those cleanup tasks.
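As a sketch (the alias/index names and thresholds are examples, not from your setup): you call rollover against an alias, e.g. rolling weekly,

```
POST /mydata/_rollover
{
  "conditions": {
    "max_age": "7d"
  }
}
```

and then a Curator action file deletes anything older than your two-week window:

```yaml
actions:
  1:
    action: delete_indices
    description: "Delete mydata-* indices older than 14 days"
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: mydata-
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 14
```

You would run Curator on a cron schedule; rollover itself also needs to be triggered periodically (the API only rolls over when called).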

--Alex


(EatDataForBreakfast) #3

Thank you @spinscale.

First, you may want to have dedicated master nodes (their own processes/instances), so that the nodes doing the indexing work do not need to deal with master node tasks and vice versa.

Yes, all three are running on independent hosts, each with its own process/instance.

Also, do not send indexing data to dedicated master nodes; always send to the data nodes (note: clients can handle this automatically if sniffing is enabled).

Sorry, I did not understand this (pardon me, I am a newbie to Elastic). Does this mean that I should set up Logstash to point to the data nodes only, and they will take care of the rest?

You can run Logstash and Elasticsearch on the same nodes, but this also means they will potentially steal each other's resources, making performance issues hard to debug.

Thanks for the suggestion, totally makes sense.

The Elasticsearch rollover API is used to create a new index, but it does not delete old ones. You should use something like Curator for those cleanup tasks.

How does Kibana adapt to the changing indices? Say I use the rollover API and Curator to clean up the old ones, and all my dashboards/visualizations are linked to indexA; is there a way to automatically link the Kibana dashboards to the rolled-over indexA_1?


(Alexander Reelsen) #4

You should configure your ingest nodes not to send data to the master nodes, but to the data nodes only.

Kibana allows you to configure an index pattern, so you do not need to worry about that; the pattern should cover the indices created by the rollover API.
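For example (names illustrative): rollover creates numbered indices behind your alias, and one wildcard index pattern in Kibana covers them all, so dashboards keep working as indices come and go:

```
mydata-000001   <- initial index
mydata-000002   <- created by _rollover
mydata-000003   <- created by _rollover
...
Kibana index pattern: mydata-*
```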


(EatDataForBreakfast) #5

You should configure your ingest nodes not to send data to the master nodes, but to the data nodes only.

OK, so if I have two data nodes and one Logstash ingest node, do I just need to point it to one of them?
What is the role of the master here? Only for indexing?

Sorry for too many questions :slight_smile:

Kibana allows you to configure an index pattern, so you do not need to worry about that; the pattern should cover the indices created by the rollover API.

Ah, got it! Awesome.


(EatDataForBreakfast) #6

@spinscale, @warkolm: after a lot of reading and your suggestions, I have come up with this topology and would like your opinion on it:

I have one logstash node, one dedicated master, one client node and three data nodes out of which two are also master-eligible.

When I configure the Logstash output, which data node should I point to? Should it be the one data-only node?


(EatDataForBreakfast) #7

I will try answering my own question; it would be great if anyone can confirm it.

I should point Logstash at all the data nodes (the hosts array in the elasticsearch output).

Can Curator/rollover be run dynamically after my data is ingested, or are there any limitations there?


(Alexander Reelsen) #8

You cannot control which node becomes master (the dedicated one vs. the master-eligible data nodes), so I don't think that setup makes a lot of sense.

The Logstash output should specify only data/client nodes; see https://www.elastic.co/guide/en/logstash/5.5/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-hosts
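Something like this sketch (the hostnames are placeholders for your data nodes):

```
output {
  elasticsearch {
    # data (or client) nodes only -- never the dedicated master
    hosts => ["data-node-1:9200", "data-node-2:9200", "data-node-3:9200"]
  }
}
```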


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.