Wanted to check an understanding on multiple pipelines:
My understanding of multiple pipelines (summarized) is that they allow you to have different inputs and outputs around a specific set of filters, and that they provide better performance. I came across this because I have different inputs, filters, and outputs.
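For context, a minimal pipelines.yml sketch of that idea - two fully independent pipelines, each pointing at its own config file (the pipeline IDs and paths here are hypothetical):

```
# pipelines.yml - a minimal sketch; pipeline IDs and paths are made up
- pipeline.id: apache-logs
  path.config: "/etc/logstash/conf.d/apache.conf"
- pipeline.id: syslog-logs
  path.config: "/etc/logstash/conf.d/syslog.conf"
```

Each pipeline gets its own inputs, filters, outputs, and queue, so a blocked output in one pipeline does not stall the other.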
Using an example:
I have Filebeat sending multiple different logs from different sources to Logstash. In the Filebeat forum I had a thread where it was not recommended to use different ports with Filebeat, so that limits me to port 5044.
On my main pipeline input I use host => 0.0.0.0 and port => 5044 - all is good!
On my someother pipeline, can I still use host => 0.0.0.0 and port => 5044?
When I was doing some testing I kept getting a pipeline error that 0.0.0.0 port 5044 is already in use, which caused the input plugin in that pipeline to fail and repeatedly try to restart.
The other option I have: the server has multiple NICs, so I could bind a different IP address with port 5044 and just add the additional host to the hosts section of the Filebeat config?
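For what it's worth, the multi-NIC idea would look roughly like this - a sketch assuming two local addresses, 192.168.1.10 and 192.168.1.11 (both hypothetical):

```
# pipelines.yml - same port, each beats input bound to a different local IP
- pipeline.id: main
  config.string: |
    input { beats { host => "192.168.1.10" port => 5044 } }
    output { stdout {} }   # placeholder output
- pipeline.id: someother
  config.string: |
    input { beats { host => "192.168.1.11" port => 5044 } }
    output { stdout {} }   # placeholder output
```

```
# filebeat.yml - both endpoints listed
output.logstash:
  hosts: ["192.168.1.10:5044", "192.168.1.11:5044"]
```

One caveat: Filebeat treats multiple hosts as interchangeable endpoints (load balancing / failover), so this spreads events across the pipelines rather than routing specific logs to specific pipelines.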
Thank you for the responses, @Christian_Dahlqvist. If I look at the documentation, it says:
```
- pipeline.id: upstream
  config.string: input { stdin {} } output { pipeline { send_to => [myVirtualAddress] } }
- pipeline.id: downstream
  config.string: input { pipeline { address => myVirtualAddress } }
```
The way I understand that, if I may ask:
This uses two separate pipelines:
The first pipeline, "upstream", listens for incoming events - in my case that would be Filebeat sending all the different logs to 0.0.0.0 on port 5044.
The second pipeline, "downstream" - that's the part I don't understand.
Does the upstream pipeline receive all the logs from Filebeat on a single address (i.e. 0.0.0.0:5044) and then pass those events to the downstream pipeline, and is the downstream pipeline where you do your filtering for the output to Elasticsearch?
Is that thinking correct, or am I still smoking my socks?
Just trying to make sense of this before I mangle my pipeline if you don't mind.
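If that reading is right, the doc example adapted to my beats case would look something like this (my own sketch, not the official example - stdin swapped for beats, and a hypothetical filter added downstream):

```
- pipeline.id: upstream
  config.string: |
    # receives everything Filebeat sends to one port
    input { beats { port => 5044 } }
    # forwards the raw (unfiltered) events to the virtual address
    output { pipeline { send_to => [myVirtualAddress] } }
- pipeline.id: downstream
  config.string: |
    # picks up the events forwarded by upstream
    input { pipeline { address => myVirtualAddress } }
    # the filtering happens here, before the output to Elasticsearch
    filter { grok { match => { "message" => "%{COMBINEDAPACHELOG}" } } }  # hypothetical filter
    output { elasticsearch { hosts => ["localhost:9200"] } }              # hypothetical host
```

So upstream receives raw events, not filtered ones; the filtering lives downstream.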
So the "myVirtualAddress" is the defined name of the downstream pipeline created ?
```
- pipeline.id: beats-server
  config.string: |
```
I then see it uses config.string (as in the previous post with the virtual address) - can one still use path.config pointing to a filter.conf file?
So my thinking, if I am right:

```
- pipeline.id: beats-server
  config.string: |
    input { beats { port => 5044 } }   # --- so it has one input on port 5044
    output {                           # --- then outputs to the defined downstream pipelines
      if [type] == "log1" {
        pipeline { send_to => log1 }
      } else if [type] == "log2" {
        pipeline { send_to => log2 }
      } else {
        pipeline { send_to => unknownEvent }
      }
    }
- pipeline.id: log1-processing
  config.string: |
    input { pipeline { address => log1 } }
    # ===>>> Here is the difference: instead of having the filter here, use
    # path.config: <Path_To_Filter.conf> - i.e. a .conf file instead of having
    # the entire text in pipelines.yml
    output {
      elasticsearch { hosts => [es_cluster_a_host] }
    }
```
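One assumption buried in that sketch: the conditionals rely on every event carrying a type field, which Filebeat does not add by default. Something like this would be needed on the Filebeat side (paths hypothetical):

```
# filebeat.yml - tag each input so Logstash can route on [type]
filebeat.inputs:
  - type: log
    paths: ["/var/log/app1/*.log"]   # hypothetical path
    fields:
      type: log1
    fields_under_root: true          # put the field at the event root, not under [fields]
```

Depending on the Filebeat version, a custom field name such as log_type may be safer than reusing type.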
In the first pipeline, output to stdout to verify that the fields exist and that the conditionals work. You can always also add an else output to capture anything that does not match. I also do not think you can mix config strings and paths, so I would recommend placing all the configs in separate files instead of using the config string option.
Do what you did above, but put each pipeline's logic in a file and reference that. Add a separate output to the pipeline that holds the input for troubleshooting, so you know whether the data looks the way you expect it to or not.
You cannot have a path.config within the config.string parameter.
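Putting that advice together, the all-files layout could look like this (file names and paths are hypothetical), with a stdout output left in the distributor pipeline for troubleshooting:

```
# pipelines.yml - no config.string anywhere; every pipeline references a file
- pipeline.id: beats-server
  path.config: "/etc/logstash/pipelines/beats-server.conf"
- pipeline.id: log1-processing
  path.config: "/etc/logstash/pipelines/log1.conf"
```

```
# /etc/logstash/pipelines/beats-server.conf
input { beats { port => 5044 } }
output {
  if [type] == "log1" {
    pipeline { send_to => ["log1"] }
  } else {
    pipeline { send_to => ["unknownEvent"] }
  }
  # temporary, for troubleshooting: print events so you can verify the fields
  stdout { codec => rubydebug }
}
```

```
# /etc/logstash/pipelines/log1.conf
input { pipeline { address => "log1" } }
filter {
  # all the filter logic for log1 lives here, not in pipelines.yml
}
output { elasticsearch { hosts => ["localhost:9200"] } }  # hypothetical host
```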
Can I still use that as configured in a different pipeline file, or will it pick up that it's using path.config?
Two other questions of understanding:
If one cannot reference a config file for the filter, does that mean one is forced to put all the filter logic in the pipeline config file - is my understanding correct? I just don't want to go and move all that logic etc. if there is a better way to do it.
I am sorry for the stupid question, but when you say:
So the way I understand this: usually I would reference a config file for a pipeline using path.config; now, keeping my first block...