Indexing multiple synonym values as keywords

Emanuele_Verga · May 8, 2017, 10:32pm

Hi all,

We are using Elasticsearch to index firewall logs from multiple vendors, so we end up with different keywords for the same value/event (e.g. permit and permitted, Deny and denied, and many other permutations).

I am now looking for the best way to unify those values across the board so that we can use them in dashboards and the like.
Synonyms look like the way to go to me, although I'm open to suggestions.
The catch is that we need the values to be keyword and not text, because we filter on them and would like to aggregate them.

I have tried to define a synonym filter in a normalizer, but I get an error saying that the filter is not supported (I suppose because, depending on configuration, it could return multiple values).
Is there any workaround?

I created a custom analyzer with a keyword tokenizer and a synonym filter, but that is not an option either because it can only be used on text fields.
If at all possible I would prefer to avoid multi-fields because of the wasted disk space.

Am I focusing on synonyms too much? Is there an alternative/better solution?

Other options, from my understanding, are:

Replace the values in the fields in Logstash before ingestion (seems difficult to maintain, although I may be wrong; we already use Logstash for ingestion)
Create an Elasticsearch ingest pipeline using the set processor (although it doesn't allow any conditionals and we have multiple values for the same fields, so it's probably a no-go)
Create an Elasticsearch ingest pipeline using the script processor (I have never tried Painless, so I'm not sure how much effort it would require)
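
To give a rough idea of the effort involved in the script processor option, here is a minimal sketch of what such a pipeline definition might look like, built as a Python dict and printed as JSON. The field name `action`, the mapping contents and the description are assumptions made up for illustration, and the key names (`source`, `params`) assume a recent Elasticsearch version, so check the docs for the version in use.

```python
import json

# Hypothetical mapping: each vendor-specific spelling (lowercased)
# points at the single canonical value we want to index.
SYNONYMS = {
    "permit": "permit",
    "permitted": "permit",
    "deny": "deny",
    "denied": "deny",
}

# Painless source for the script processor. "action" is an assumed
# field name; values not found in the mapping are left untouched.
PAINLESS_SOURCE = (
    "if (ctx.action != null) {"
    " def key = ctx.action.toLowerCase();"
    " if (params.synonyms.containsKey(key)) {"
    " ctx.action = params.synonyms.get(key);"
    " }"
    " }"
)


def build_pipeline(synonyms):
    """Return an ingest pipeline body with one script processor."""
    return {
        "description": "Normalize firewall action values (illustrative)",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": PAINLESS_SOURCE,
                    "params": {"synonyms": synonyms},
                }
            }
        ],
    }


if __name__ == "__main__":
    # The printed JSON is what would be sent as the pipeline definition.
    print(json.dumps(build_pipeline(SYNONYMS), indent=2))
```

Keeping the mapping in `params` rather than hard-coding it in the script means new vendor spellings only require a change to the data, not to the Painless code.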

Any input is appreciated.
Cheers

jpountz · May 9, 2017, 9:43am

If you already use Logstash for ingestion, that looks to me like the natural place to do this. The title of the topic suggests you'd like to index multiple values, but I think it would be best to just settle on one (e.g. deny) and replace all synonyms (denied, etc.) with it.
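
The "settle on one value" idea boils down to a lookup table applied before indexing. As a language-neutral illustration (not Logstash configuration), here is a minimal Python sketch; the function name `normalize_action` and the table layout are hypothetical, and only the values mentioned in this thread are included.

```python
# Canonical value -> every spelling seen in the vendor logs.
# Only the examples from this thread are listed; extend as needed.
CANONICAL_VALUES = {
    "permit": ["permit", "permitted"],
    "deny": ["deny", "denied"],
}

# Invert the table once: lowercased variant -> canonical value.
LOOKUP = {
    variant.lower(): canonical
    for canonical, variants in CANONICAL_VALUES.items()
    for variant in variants
}


def normalize_action(value):
    """Return the canonical value, or the input unchanged if unknown."""
    return LOOKUP.get(value.strip().lower(), value)


if __name__ == "__main__":
    for raw in ["Deny", "denied", "permitted", "drop"]:
        print(raw, "->", normalize_action(raw))
```

Unknown values pass through unchanged, so a new vendor spelling shows up in the dashboards as its own bucket instead of silently disappearing. The same table could be expressed in Logstash, for example as the dictionary of a translate filter.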

system · June 6, 2017, 9:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
