We are using Elasticsearch to index firewall logs from multiple vendors, so we end up with different keywords for the same value/event (e.g. permit and permitted, Deny and denied, and many other permutations).
I am now looking into the best way to unify those values across the board so that we can use them in dashboards and the like.
Synonyms look like the way to go (to me, although I'm open to suggestions).
The catch, though, is that we need the values to be keyword rather than text, because we filter on them and would like to aggregate on them.
I have tried to define a synonym filter in a normalizer, but I get an error saying that the filter is not supported (I suppose because, depending on configuration, it could return multiple values).
Is there any workaround?
I also created a custom analyzer with a keyword tokenizer and a synonym filter, but that is not an option either, because an analyzer can only be used on text fields.
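For reference, an analyzer of that kind looks roughly like the sketch below, written as a Python dict that could be sent as index settings. The names `action_synonyms` and `action_analyzer` and the two synonym rules are placeholders for illustration, not our exact configuration.

```python
import json

# Index settings for a custom analyzer: keyword tokenizer + synonym filter.
# "action_synonyms" and "action_analyzer" are placeholder names, and the
# rules only cover the example variants mentioned above.
analysis_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "action_synonyms": {
                    "type": "synonym",
                    "synonyms": [
                        "permitted => permit",
                        "denied => deny",
                    ],
                }
            },
            "analyzer": {
                "action_analyzer": {
                    "type": "custom",
                    # keyword tokenizer keeps the whole value as one token
                    "tokenizer": "keyword",
                    # lowercase first so "Deny" and "deny" hit the same rule
                    "filter": ["lowercase", "action_synonyms"],
                }
            },
        }
    }
}

print(json.dumps(analysis_settings, indent=2))
```

An analyzer like this can only be attached to a text field, which is exactly the limitation described above.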
If at all possible, I would prefer to avoid multi-fields because of the wasted disk space.
Am I focusing on synonyms too much? Is there an alternative/better solution?
Other options, as I understand them, are:
- Replace the values in the fields with Logstash before ingestion (seems difficult to maintain, although I may be wrong; we already use Logstash for ingestion)
- Create an Elasticsearch ingest pipeline using the set processor (although, as far as I can tell, it doesn't allow conditionals and we have multiple values for the same fields, so it's probably a no-go)
- Create an Elasticsearch ingest pipeline using the script processor (I have never used Painless, so I'm not sure how much effort it would require)
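
To make the script processor option more concrete, here is a minimal, untested sketch of what such a pipeline body might look like, built as a Python dict. The field name `action`, the contents of `ACTION_MAP`, and the helper `normalize` (a plain-Python mirror of the Painless logic, for local checking only) are all assumptions for illustration.

```python
import json

# Variant -> canonical value. Keys are lowercase; contents are illustrative.
ACTION_MAP = {
    "permit": "permit",
    "permitted": "permit",
    "deny": "deny",
    "denied": "deny",
}

# Painless source: look the lowercased value up in params.mapping and
# overwrite the field only when a canonical value is known.
# Assumes a flat, hypothetical field called "action".
PAINLESS_SOURCE = """
if (ctx.action instanceof String) {
  String key = ctx.action.toLowerCase();
  if (params.mapping.containsKey(key)) {
    ctx.action = params.mapping.get(key);
  }
}
"""


def build_pipeline():
    """Return an ingest pipeline body with a single script processor."""
    return {
        "description": "Normalize firewall action values (sketch)",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": PAINLESS_SOURCE,
                    "params": {"mapping": ACTION_MAP},
                }
            }
        ],
    }


def normalize(value):
    """Plain-Python mirror of the Painless logic, for local checks only."""
    if isinstance(value, str):
        return ACTION_MAP.get(value.lower(), value)
    return value


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
    print(normalize("Permitted"), normalize("DENIED"), normalize("drop"))
```

The body would be registered with `PUT _ingest/pipeline/<name>`. Because the mapping lives in `params`, adding a new vendor variant would mean editing one dict rather than the script itself, and unknown values are left untouched.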
Any input is appreciated.