Is indexing done in Logstash or Elasticsearch?

Hi All,

I am trying to understand the process of indexing. I have read this question and answer:
https://stackoverflow.com/questions/38248757/which-elasticsearch-node-is-better-configured-in-logstash-elasticsearch-output-p
When I use Logstash, I define the index for each document in the output.
But why should we point the Logstash output at the data nodes of the Elasticsearch cluster rather than at the master node, which I thought was specialized for indexing? Or is the indexing already done in Logstash, so the documents can be sent directly to the data nodes?
Which of these flows is correct?
Logstash filtering and indexing -> ES data node, or
Logstash filtering -> ES master node indexing -> ES data node for storage

Thank you in advance

Elasticsearch master nodes are not involved in the indexing flow. Indexing is all handled by the data nodes.

I see. I also read https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
Maybe I am confusing indexing with creating an index.
So what does it mean for the master node to create an index when the indexing is already done on the data nodes?
Also, if Logstash outputs directly to the data nodes, the documents are already stored in shards, so how does the master node allocate the shards to nodes?
Moreover, why is it so important to keep the master node stable when the cluster's information is already shared among the nodes in the cluster? I don't understand the importance of the master node.

When a new index is created, this is done by the master node, which distributes the shards across the nodes in the cluster. The data nodes receive bulk requests from Logstash and index this data into the shards they hold. The master node manages changes to the cluster state, e.g. moving shards in case of failure, but does not get involved in the processing of bulk requests or queries. It also monitors the state of the nodes in the cluster so that failures can be detected as quickly as possible, which is why it should not get overloaded.
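To make the division of labour concrete: the index name is just metadata that travels with each document in the bulk request, and the indexing work itself happens on the Elasticsearch side. Below is a minimal sketch of the newline-delimited (NDJSON) body of a `_bulk` request, assuming the standard Bulk API layout of one action line followed by one source line per document. The index name `logs-example` and the events are hypothetical, and this is not Logstash's own code.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each document becomes two lines: an action line naming the target
    index, followed by the document source. The body must end with a
    newline.
    """
    lines = []
    for doc in docs:
        # Action/metadata line: only says *where* the document should go.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the event as produced by the Logstash filters.
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical index name and events, for illustration only.
events = [
    {"message": "user logged in", "level": "info"},
    {"message": "disk almost full", "level": "warn"},
]
body = build_bulk_body("logs-example", events)
print(body)
```

Logstash only decides which index each event belongs to; parsing the documents and writing them into shards is done by the data nodes that receive the request.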

I partly understand your point.
So what happens if the index is not found on the data node? Is the process like this?
data node receives the request from Logstash and cannot find the index -> redirects the request to the master node to create the index -> once the index is created, the request is forwarded back to the data node for indexing
Another question: are cluster state monitoring and moving shards in case of failure very resource-intensive? If not, can I dedicate a master node with relatively few resources and give much more to the data nodes for handling bulk requests?

Thanks

If an index does not exist (or a new field needs to be mapped), the master node will be notified and will handle this. The indexing request can then continue processing once this has been done. This is a simplification, but it explains roughly when the master node gets involved.
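The simplified flow can be illustrated with a toy model. `MasterNode`, `DataNode` and their methods are hypothetical names invented for this sketch, not Elasticsearch classes. It ignores shards, replicas, mapping updates and networking, and only shows that the master is consulted once for the cluster-state change while the indexing itself stays on the data node.

```python
class MasterNode:
    """Toy elected master: the only component that changes the cluster
    state (here reduced to the set of existing index names)."""

    def __init__(self):
        self.indices = set()
        self.nodes = []          # data nodes that receive state updates
        self.state_updates = 0   # how often the master had to step in

    def create_index(self, name):
        if name not in self.indices:
            self.indices.add(name)
            self.state_updates += 1
            # Publish the new cluster state to every registered node.
            for node in self.nodes:
                node.known_indices = set(self.indices)


class DataNode:
    """Toy data node: does the indexing itself and only asks the master
    for help when a request needs a cluster-state change."""

    def __init__(self, master):
        self.master = master
        self.known_indices = set()  # local copy of the cluster state
        self.shard_docs = {}        # index name -> stored documents
        master.nodes.append(self)

    def handle_index_request(self, index_name, doc):
        if index_name not in self.known_indices:
            # Missing index: the master creates it and publishes the
            # updated cluster state; the request then resumes here.
            self.master.create_index(index_name)
        self.shard_docs.setdefault(index_name, []).append(doc)


master = MasterNode()
node = DataNode(master)
node.handle_index_request("logs-example", {"message": "first"})
node.handle_index_request("logs-example", {"message": "second"})
print(master.state_updates)
print(len(node.shard_docs["logs-example"]))
```

Running it prints `1` and then `2`: the master steps in only for the first request, which needs a new index, while both documents are stored by the data node.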

No. If you have dedicated master nodes and allow these to manage the cluster, e.g. by not serving requests, they can often be a lot smaller than data nodes. 2 CPU cores and 4GB RAM is common, but they can be even smaller.

Yes, but you should always aim to have at least 3 master-eligible nodes, as Elasticsearch uses consensus-based algorithms for master election and requires a strict majority of master-eligible nodes to be available.
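The arithmetic behind "at least 3" can be checked with a few lines of Python. This sketch assumes a plain strict-majority rule over all master-eligible nodes and ignores the details of how Elasticsearch manages its voting configuration.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible (voting) nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in range(1, 6):
    print(f"{n} master-eligible nodes: quorum {quorum(n)}, "
          f"can lose {tolerated_failures(n)}")
```

Two nodes are no better than one: losing either leaves 1 of 2, which is not a strict majority. Three is the smallest count that survives the loss of a node.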

Allow me to ask a further question about master-eligible nodes.
I have read https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#master-node

Am I correct that a voting-only master-eligible node can never actually become the master, so it is useless apart from voting? Should I enable node.data on such a node and give it more work as well?

Also, is an election performed every time an operation takes place, such as creating or deleting an index, or a CRUD operation?
