I am using Logstash to parse and index event logs into Elasticsearch. My pipeline has the following form:
input {
  # input plugins and their config
}
filter {
  # lots of parsing
  # in some cases, event is tagged as long_term
}
output {
  # index_A in elasticsearch with short-term retention policy
}
Now, I would like all events tagged with long_term to ALSO be indexed into a second index, index_B. This can be done like so:
output {
  # index_A in elasticsearch
  if "long_term" in [tags] {
    # index_B in elasticsearch, with long-term retention policy
  }
}
However, my index_B requires additional processing: namely, replacing the email field with a fingerprint and keeping only a small subset of the fields.
Can I have two output clauses with a filter clause in between, with the execution proceeding sequentially?
If you just want to ignore some fields in index_B, you could define the fields you want in the template and disable dynamic mapping.
If you need to do data transformations while ingesting into elasticsearch, you may be able to do it using a script processor in an ingest pipeline. I believe that supports replacing a field with a fingerprint or hash, but that is an elasticsearch question, not a logstash question.
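For example, something along these lines (untested, and only a sketch: the pipeline name, the email field, and the removed fields are placeholders), using the dedicated fingerprint ingest processor that newer Elasticsearch versions ship alongside the script processor:

PUT _ingest/pipeline/long_term_pipeline
{
  "description": "hash the email field and drop fields not needed long term",
  "processors": [
    {
      "fingerprint": {
        "fields": ["email"],
        "target_field": "email",
        "method": "SHA-256"
      }
    },
    {
      "remove": {
        "field": ["some_field_you_do_not_need"],
        "ignore_missing": true
      }
    }
  ]
}

You would then point the elasticsearch output for index_B at it with the pipeline option (pipeline => "long_term_pipeline").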
If you want to have logstash remove fields and replace fields with hashes, I would look at pipeline-to-pipeline communication with a forked path pattern.
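Roughly along these lines in pipelines.yml (only a sketch; the pipeline ids, the email field, and the whitelisted field names are placeholders you would adapt, and the parsing pipeline keeps your existing input and filter config):

# pipelines.yml - forked path: one parsing pipeline fans out to two elasticsearch outputs
- pipeline.id: parsing
  config.string: |
    input {
      # your existing input plugins
    }
    filter {
      # lots of parsing, tagging some events as long_term
    }
    output {
      pipeline { send_to => ["short_term"] }
      if "long_term" in [tags] {
        pipeline { send_to => ["long_term"] }
      }
    }
- pipeline.id: short_term
  config.string: |
    input { pipeline { address => "short_term" } }
    output { elasticsearch { index => "index_A" } }
- pipeline.id: long_term
  config.string: |
    input { pipeline { address => "long_term" } }
    filter {
      # replace the email with a SHA256 fingerprint
      fingerprint {
        source => "email"
        target => "email"
        method => "SHA256"
      }
      # keep only a small subset of fields
      prune {
        whitelist_names => ["^@timestamp$", "^email$", "^message$"]
      }
    }
    output { elasticsearch { index => "index_B" } }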
I decided to disable the dynamic mapping, but while fields such as email are not indexed, they are still stored in the _source field. To address this, I had to include this in my template:
"_source": {
"includes": [
# List of all fields that are to be stored
]
}
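Put together, the relevant parts of the template for index_B look roughly like this (a sketch using the composable index template API; the index pattern and field names are just placeholders for whatever you actually keep, and with legacy _template the mappings block is the same):

PUT _index_template/index_b_template
{
  "index_patterns": ["index_B*"],
  "template": {
    "mappings": {
      "dynamic": false,
      "_source": {
        "includes": ["@timestamp", "email", "message"]
      },
      "properties": {
        "@timestamp": { "type": "date" },
        "email":      { "type": "keyword" },
        "message":    { "type": "text" }
      }
    }
  }
}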