Multiple outputs with different formats

Greetings

I am using Logstash to parse and index event logs into Elasticsearch. My pipeline has the following form:

input {
  # input plugins and their config
}

filter {
  # lots of parsing
  # in some cases, event is tagged as long_term
}

output {
  # index_A in elasticsearch with short-term retention policy
}

Now, I would like all events tagged with long_term to ALSO be indexed into a second index, index_B. This can be done like so:

output {
  # index_A in elasticsearch
  if "long_term" in [tags] {
    # index_B in elasticsearch, with long-term retention policy
  }
}
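Concretely, with placeholder hosts and index names, that output section would look something like this:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "index_a"
  }
  # events tagged long_term are additionally sent to index_b
  if "long_term" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "index_b"
    }
  }
}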

However, my index_B requires additional processing: namely, replacing the email field with a fingerprint and keeping only a small subset of the fields.

Can I have two output clauses with a filter clause in between, with execution proceeding sequentially?

I was also thinking I could use a different template for index_B, but how do I make sure that only the fields specified in the template are stored?

If you just want to ignore some fields in index_B, you could define the fields you want in the template and disable dynamic mapping.
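As a sketch, a composable index template for index_B with dynamic mapping disabled could look like this (the index pattern and field names are placeholders):

PUT _index_template/index_b_template
{
  "index_patterns": ["index_b*"],
  "template": {
    "mappings": {
      "dynamic": false,
      "properties": {
        "@timestamp": { "type": "date" },
        "message":    { "type": "text" }
      }
    }
  }
}

With "dynamic": false, fields not listed under properties are not indexed (and so are not searchable), but note they are still kept in the stored _source document.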

If you need to do data transformations while ingesting into elasticsearch, you may be able to do it with a script processor in an ingest pipeline. I believe that supports replacing a field with a fingerprint or hash, but that is an elasticsearch question, not a logstash question.
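Recent Elasticsearch versions also ship a dedicated fingerprint ingest processor, which may be simpler than writing a script. A sketch, assuming the field is called email:

PUT _ingest/pipeline/fingerprint_email
{
  "processors": [
    {
      "fingerprint": {
        "fields": ["email"],
        "target_field": "email",
        "method": "SHA-256"
      }
    }
  ]
}

You would then point the index_B elasticsearch output at it with pipeline => "fingerprint_email".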

If you want logstash itself to remove fields and replace fields with hashes, I would look at pipeline-to-pipeline communication with the forked-path pattern.
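A minimal sketch of that forked-path setup in pipelines.yml, assuming placeholder hosts, a beats input, and example field names; the second pipeline hashes email with the fingerprint filter and drops everything outside a whitelist with the prune filter:

# pipelines.yml
- pipeline.id: intake
  config.string: |
    input { beats { port => 5044 } }
    output {
      # everything goes to index_a as before
      elasticsearch { hosts => ["localhost:9200"] index => "index_a" }
      # long_term events are forked to the second pipeline
      if "long_term" in [tags] {
        pipeline { send_to => ["long_term"] }
      }
    }
- pipeline.id: long_term
  config.string: |
    input { pipeline { address => "long_term" } }
    filter {
      # replace the email value with its SHA256 hash
      fingerprint {
        source => "email"
        target => "email"
        method => "SHA256"
      }
      # keep only the listed fields
      prune { whitelist_names => ["@timestamp", "email", "message"] }
    }
    output {
      elasticsearch { hosts => ["localhost:9200"] index => "index_b" }
    }

This way index_A receives the full events while index_B only ever sees the reduced, hashed copies.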

I decided to disable dynamic mapping, but while fields such as email are then not indexed, they are still stored in the _source field. To address this, I had to include this in my template:

"_source": {
  "includes": [
    # List of all fields that are to be stored
  ]
}
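Putting the pieces together, the relevant part of the index_B template ends up looking something like this (field names are examples):

{
  "mappings": {
    "dynamic": false,
    "_source": {
      "includes": ["@timestamp", "email", "message"]
    },
    "properties": {
      "@timestamp": { "type": "date" },
      "email":      { "type": "keyword" },
      "message":    { "type": "text" }
    }
  }
}

With this, unmapped fields are neither indexed (dynamic: false) nor stored (_source.includes).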
