Problem with "cloning" (Filebeat -> Logstash -> Elasticsearch) - multiple Logstash pipelines

Hi,

I have a problem.
In simple terms, the problem looks like this:

If I use a single pipeline in Logstash:

input {
   beats {
      ...
   }
}

filter {
...
}

output {
   elasticsearch {
      ...
   }
}

Everything works very well.

But if I use a configuration where one pipeline distributes the data from Filebeat to two ports (on localhost):

input {
   beats {
      ...
   }
}

filter {
}

output {
   udp {
      host => "localhost"
      port => X
      codec => "json_lines"
   }
   udp {
      host => "localhost"
      port => Y
      codec => "json_lines"
   }
}

And then another pipeline receives/parses the data from its port and adds it to Elasticsearch:

input {
   udp {
      port => X
      codec => "json_lines"
   }
}

filter {
...
}

output {
   elasticsearch {
      ...
   }
}

Unfortunately, it's not working.
To put it briefly... I need to split one beats input into two separate pipelines in Logstash.

I've read about the pipeline-to-pipeline capabilities and the lumberjack input/output.

But is it possible to do a simple forward over tcp/udp and clone the data that way?

We run several parallel Logstash pipelines.

We have based our config on this documentation.

I find the documentation pretty clear, so I will let it speak on its own. I'm happy to try to answer questions if there are any.
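
For reference, the basic shape of that setup is roughly this (the pipeline IDs and paths below are made-up examples, not our actual config):

# config/pipelines.yml
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"
- pipeline.id: syslog
  path.config: "/etc/logstash/conf.d/syslog.conf"

Each pipeline points at its own config file with its own input/filter/output, and Logstash runs them in parallel, isolated from each other.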

Yes, but I must have two pipelines using data from the same network port (beats).
Normally I also have several different pipelines in pipelines.yml.
That is no problem and it works well, but I always have one input per pipeline.
I need one input feeding two pipelines, for a few reasons.

This is not a standard configuration.
However, I want to have two different pipelines performing completely different parsing.
It should also be assumed that I can't add another output to Filebeat, and I want a very simple and fast solution on the Logstash side.
I guess we can do this with udp/tcp forwarding?
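
For example, I imagine something like this with tcp instead of udp (ports 5045/5046 stand in for X/Y here, and I have not tested this variant):

# upstream pipeline: fan the beats events out over tcp
output {
   tcp {
      host => "localhost"
      port => 5045
      mode => "client"
      codec => "json_lines"
   }
   tcp {
      host => "localhost"
      port => 5046
      mode => "client"
      codec => "json_lines"
   }
}

# downstream pipeline: listen on one of the tcp ports
input {
   tcp {
      port => 5045
      mode => "server"
      codec => "json_lines"
   }
}

At least with tcp a broken connection would be visible, while udp just drops events silently.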

This is a completely theoretical, untested suggestion, as I have not used pipeline-to-pipeline communication.

I would expect something like this to do the job:

# config/pipelines.yml
- pipeline.id: upstream
  config.string: input { beats {} } output { pipeline { send_to => [myVirtualAddress1] } pipeline { send_to => [myVirtualAddress2] } }
- pipeline.id: downstream1
  config.string: input { pipeline { address => myVirtualAddress1 } }
- pipeline.id: downstream2
  config.string: input { pipeline { address => myVirtualAddress2 } }

I'm sure you would put the config for each pipeline in its own file, as the lines would get hard to read otherwise, but this illustrates the main config.
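
Split into separate files, I would expect it to look roughly like this; again untested, and the file names/paths are only examples:

# config/pipelines.yml
- pipeline.id: upstream
  path.config: "/etc/logstash/conf.d/upstream.conf"
- pipeline.id: downstream1
  path.config: "/etc/logstash/conf.d/downstream1.conf"
- pipeline.id: downstream2
  path.config: "/etc/logstash/conf.d/downstream2.conf"

# /etc/logstash/conf.d/upstream.conf
input {
   beats {
      ...
   }
}
output {
   pipeline { send_to => [myVirtualAddress1] }
   pipeline { send_to => [myVirtualAddress2] }
}

# /etc/logstash/conf.d/downstream1.conf
input {
   pipeline { address => myVirtualAddress1 }
}
filter {
   ...
}
output {
   elasticsearch {
      ...
   }
}

downstream2.conf would be the same apart from address => myVirtualAddress2 and its own filters.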

Yes.
I tested it.
But I have very extensive parsers described in separate files.
Each pipeline is one quite long file with input/filter/output (and there can be quite a lot of filters).

Such a configuration probably only works if everything is defined inline in the single config/pipelines.yml file.

But when I added to each pipeline an entry of the form:

input { pipeline { address => myVirtualAddress2 } }

in a separate file, it did not work...

The only thing I can think of off the top of my head is that each input can only be used once (as far as I know). So any specific port or virtual address can only be used once; otherwise I would expect resource conflicts between pipelines. I would expect to see that in the Logstash logs, though.

Could you show your file and folder structure and the config/pipelines.yml you use?
