Hi stashers,
I have two pipelines, A and B, each configured to process CSV files of a different format into separate indexes on the same Elastic Cloud cluster. Something very peculiar (to me...) happens when both run in the same Logstash 5.0.0 instance. (As an aside, can I use a newer version of Logstash with ES 5.0.0?)
A
input: /path/to/As/*.csv
filter: lots of csv columns, make some fields lowercase, convert fields to types, prepend "A-" to a field to use for the index name, match date on field
output: ES cluster, set template, index, type and id all from fields.
B
input: /path/to/Bs/*.csv
filter: a few csv columns, prepend "B-" to a field used for the index name.
output: ES cluster, set template, index, type and id all from fields.
When run simultaneously (both configurations in /etc/logstash/conf.d/) several weird behaviours are observed:
Sometimes I get a document in the B index with the literal id "%{fieldThatShouldHaveBeenTheId}"; that is, every document has been indexed under the same id because the field reference was never interpolated. When this happens, some of the document's structure is correct, but other fields' values have come from the wrong column in the CSV and (most oddly) there are even columns in there from A's CSV mapping!
Other times my B documents are mapped correctly, but they are joined by a random smattering of documents from A's pipeline.
When several config files are available, Logstash will merge them into one, and if you want to separate flows you will need to set tags and use conditionals.
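As a minimal sketch of the tag-and-conditional approach: the two paths come from the question, while the tag names, column names, host, and index names are placeholders invented for illustration and would need to be replaced with the real ones.

```conf
input {
  file {
    path => "/path/to/As/*.csv"
    tags => ["A"]                 # placeholder tag marking events from A's files
  }
  file {
    path => "/path/to/Bs/*.csv"
    tags => ["B"]                 # placeholder tag marking events from B's files
  }
}

filter {
  if "A" in [tags] {
    csv { columns => ["a_id", "a_name", "a_date"] }   # hypothetical columns
    mutate { lowercase => ["a_name"] }
  } else if "B" in [tags] {
    csv { columns => ["b_id", "b_value"] }            # hypothetical columns
  }
}

output {
  if "A" in [tags] {
    elasticsearch {
      hosts       => ["https://example-cluster:9243"]  # placeholder host
      index       => "a-index"                         # placeholder index name
      document_id => "%{a_id}"
    }
  } else if "B" in [tags] {
    elasticsearch {
      hosts       => ["https://example-cluster:9243"]  # placeholder host
      index       => "b-index"                         # placeholder index name
      document_id => "%{b_id}"
    }
  }
}
```

With the conditionals in place, an event tagged "A" never reaches B's csv filter or B's output, and vice versa.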
Thanks. I ended up with conditionals in my filter and output blocks applying a regex to the 'path' field. Seems to work, but is this what you imagined? I ask as I'm unclear on exactly how the concatenation works.
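For readers following along, a path-based conditional of this kind might look roughly like the sketch below. It assumes the file input's `path` field holds the source file name; the column names are placeholders, not the real ones.

```conf
filter {
  # Match on the directory the event's source file lives in.
  if [path] =~ /\/path\/to\/As\// {
    csv { columns => ["a_id", "a_name", "a_date"] }   # hypothetical columns
  } else if [path] =~ /\/path\/to\/Bs\// {
    csv { columns => ["b_id", "b_value"] }            # hypothetical columns
  }
}
```

The same two conditions would wrap the corresponding elasticsearch blocks in the output section.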
Mostly because when I start LS, the log says that two pipelines have been started. I'm confused about what the definition of a pipeline is and how it relates to concatenation.
Does the concatenation actually end up parsing the configurations such that I'd end up with:
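That is, something along these lines (a sketch only: the paths are the ones above, while the column names, host, and index names are placeholders):

```conf
input {
  file { path => "/path/to/As/*.csv" }   # from A's conf
  file { path => "/path/to/Bs/*.csv" }   # from B's conf
}

filter {
  # With no conditionals, every event passes through both sets of filters.
  csv { columns => ["a_id", "a_name", "a_date"] }   # A's filter, hypothetical columns
  csv { columns => ["b_id", "b_value"] }            # B's filter, hypothetical columns
}

output {
  # With no conditionals, every event is sent to both outputs.
  elasticsearch {
    hosts       => ["https://example-cluster:9243"]  # placeholder host
    index       => "a-index"                         # placeholder index name
    document_id => "%{a_id}"
  }
  elasticsearch {
    hosts       => ["https://example-cluster:9243"]  # placeholder host
    index       => "b-index"                         # placeholder index name
    document_id => "%{b_id}"
  }
}
```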
Yes, that is a reasonable description.
The order within input and filter is lexically by conf filename, so if A's conf was in 003-A.conf and B's conf was in 002-B.conf, you would end up with:
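A rough illustration of that ordering, showing only the input and filter sections (the paths are from the question; the column names are placeholders):

```conf
input {
  file { path => "/path/to/Bs/*.csv" }   # from 002-B.conf, which sorts first
  file { path => "/path/to/As/*.csv" }   # from 003-A.conf
}

filter {
  csv { columns => ["b_id", "b_value"] }            # 002-B.conf filter, hypothetical columns
  csv { columns => ["a_id", "a_name", "a_date"] }   # 003-A.conf filter, hypothetical columns
}
```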