It appears that the normal path is to use either Logstash OR ingest nodes for inbound traffic, in one of these two forms:
beats clients -> ingest -> hot data nodes
or
beats clients -> logstash -> hot data nodes
But are there any major performance issues that would result from pushing:
filebeat clients -> logstash -> ingest -> hot data nodes ?
The thing is that pipeline filtering in the ingest nodes is much easier than in Logstash, and I'd get the team to use that to get things done quickly.
Logstash would be used for the really high-volume streams or the extended stuff that ingest can't do.
Neither Logstash nor dedicated ingest nodes are required. They can be used if needed, but I would say that it is far more common to send data directly to the data nodes.
Dedicated ingest nodes are normally used when you have ingest performance issues on your data nodes; if you do not have such issues, ingest nodes may not be required.
Logstash is normally used when you have specific parsing requirements or want to send the data to multiple destinations; if you do not have these needs, it may not be required either.
I don't think so, but what would Logstash be used for here? Are you doing the parsing in Logstash or using ingest pipelines? It is just another piece that can add a few milliseconds to the ingestion time.
Curious, what do you find easier to do in ingest pipelines? I think this is more of a personal choice; for me Logstash is far more flexible and easier to work with than ingest pipelines.
This is one of the uses for Logstash, mostly the extended stuff, because it is very flexible; there are some things that only Logstash can do.
But it may also not be required; only testing would be able to answer that.
The normal path today is to send data directly to the hot data nodes.
Basically, it's a big enough cluster that we run Logstash.
I don't mind Logstash at all, but other team members find ingest pipelines much easier. And to be honest, the Painless scripting in ingest pipelines is quite handy.
Logstash is obviously faster, so I would keep the high-log-rate pipelines on Logstash.
What I'm after is an either/or situation where I can apply ingest nodes or Logstash as needed.
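To make the either/or idea concrete, here is a minimal sketch of how a sender could opt a stream into an ingest pipeline or skip it. It relies on the `pipeline` query parameter of the Elasticsearch `_bulk` API (the Logstash elasticsearch output has a `pipeline` option that serves the same purpose, as far as I know). The host, the index names, the pipeline name `parse_apache` and the `STREAM_PIPELINES` table are all made up for illustration.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode

# Hypothetical routing table: stream/index name -> ingest pipeline name.
# None means "already parsed upstream (e.g. in Logstash), skip ingest".
STREAM_PIPELINES: Dict[str, Optional[str]] = {
    "logs-apache": "parse_apache",   # low volume, parsed by an ingest pipeline
    "logs-firewall": None,           # high volume, parsed in Logstash
}


def bulk_url(base: str, index: str, pipeline: Optional[str] = None) -> str:
    """Build a _bulk URL, adding ?pipeline=<name> only when one is requested."""
    url = f"{base.rstrip('/')}/{quote(index, safe='')}/_bulk"
    if pipeline:
        url += "?" + urlencode({"pipeline": pipeline})
    return url


def url_for_stream(base: str, index: str) -> str:
    """Look up the stream in the routing table and build its bulk URL."""
    return bulk_url(base, index, STREAM_PIPELINES.get(index))
```

With this shape, moving a stream between "parsed in Logstash" and "parsed in ingest" is a one-line change in the table, which is roughly the flexibility being asked for here.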
So I guess you've already answered that it would add milliseconds; I guess I need to do some testing of my own to see what kind of impact this has at high rates.
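For that kind of testing, a rough sketch of a timing harness might look like the following. It only measures how long a caller-supplied `send` function takes per batch; `send` is a placeholder for whatever actually ships a batch (with or without the extra ingest hop), and nothing here talks to a real cluster.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_batches(send: Callable[[List[str]], None],
                 batches: Iterable[List[str]]) -> Dict[str, float]:
    """Send each batch once and report event count, median latency and rate."""
    latencies: List[float] = []
    events = 0
    start = time.perf_counter()
    for batch in batches:
        t0 = time.perf_counter()
        send(batch)  # placeholder: ship the batch along the path under test
        latencies.append(time.perf_counter() - t0)
        events += len(batch)
    elapsed = time.perf_counter() - start
    if not latencies:
        raise ValueError("no batches were sent")
    return {
        "events": events,
        "median_batch_s": statistics.median(latencies),
        "events_per_s": events / elapsed if elapsed > 0 else float("inf"),
    }
```

Running it once per path with the same batches and comparing `median_batch_s` and `events_per_s` should give a first idea of the overhead, though numbers from a test box may not carry over to production rates.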