Logstash and multiple pipelines. Why are my pipelines merging?


(Daniel McGreal) #1

Hi stashers,
I have two pipelines, A & B, each configured to process a separate CSV file format into its own index on the same Elastic Cloud cluster. Something very peculiar (to me...) happens when they are both running in the same Logstash 5.0.0 instance (as an aside, can I use later versions of Logstash with ES 5.0.0?).

A
input: /path/to/As/*.csv
filter: lots of csv columns, make some fields lowercase, convert fields to types, prepend "A-" to a field to use for the index name, match date on field
output: ES cluster, set template, index, type and id all from fields.

B
input: /path/to/Bs/*.csv
filter: a few csv columns, prepend "B-" to a field used for the index name.
output: ES cluster, set template, index, type and id all from fields.

When run simultaneously (both configurations in /etc/logstash/conf.d/) several weird behaviours are observed:

  • Sometimes I get documents in the B index with the literal id "%{fieldThatShouldHaveBeenTheId}", i.e. every document has been indexed with the same id, which hasn't been interpolated at all. When this happens, some of the structure of the document is correct, but other field values have come from the wrong column in the CSV and (most oddly) there are even columns in there from A's CSV mapping!
  • Other times my B documents are mapped correctly, but they are joined by a random smattering of documents from A's pipeline.

What is going on here?!


(Christian Dahlqvist) #2

When several config files are present, Logstash merges them into a single pipeline; if you want to keep the flows separate, you need to set tags on the inputs and use conditionals in the filters and outputs.
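For example (tag names and paths here are just illustrative, taken from your description), each input can tag its events and the filter and output blocks can branch on that tag:

input {
  file { path => "/path/to/As/*.csv" tags => ["A"] }
  file { path => "/path/to/Bs/*.csv" tags => ["B"] }
}
filter {
  if "A" in [tags] {
    # A's filters only
  } else if "B" in [tags] {
    # B's filters only
  }
}
output {
  if "A" in [tags] {
    # A's elasticsearch output
  } else if "B" in [tags] {
    # B's elasticsearch output
  }
}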


(Daniel McGreal) #3

Hi Christian,
Where is this documented?
Thanks!
Dan.


(Christian Dahlqvist) #4

The use of conditionals is documented here. The concatenation of configuration files is described here and here.


(Daniel McGreal) #5

Thanks. I ended up with conditionals in my filter and output blocks that apply a regex to the 'path' field. It seems to work, but is this what you imagined? I ask because I'm unclear on exactly how the concatenation works.
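Concretely, what I have looks something like this (regexes simplified, matching the directories from my first post), with the same conditionals repeated around each elasticsearch output:

filter {
  if [path] =~ /\/As\// {
    # A's filters
  } else if [path] =~ /\/Bs\// {
    # B's filters
  }
}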

Mostly because, when I start Logstash, the logging says that two pipelines have been started. I'm confused about what exactly a pipeline is and how it relates to concatenation.

Does the concatenation actually parse the configurations such that I end up with:

input {
  A's inputs{}
  B's inputs{}
}
filter {
  A's filters{}
  B's filters{}
}

...?

If so, I understand, otherwise any more information would be appreciated.


(Bevan Bennett) #6

Yes, that is a reasonable description.
The order within input and filter is lexically by conf filename, so if A's conf was in 003-A.conf and B's conf was in 002-B.conf, you would end up with:

input {
  B's inputs{}
  A's inputs{}
}
filter {
  B's filters{}
  A's filters{}
}

The usual pattern is to set 'type => "A"' in A's inputs and then surround A's filters (and outputs) with 'if [type] == "A" { }'.
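For example (paths taken from your first post; the filter and output bodies elided):

input {
  file {
    path => "/path/to/As/*.csv"
    type => "A"
  }
}
filter {
  if [type] == "A" {
    # A's csv/mutate/date filters
  }
}
output {
  if [type] == "A" {
    # A's elasticsearch output
  }
}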


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.