Logstash translate filter - multiple matches

I am using a dictionary file with the translate filter. The keys are IPs, or regexes that match the IPs in an IP block. What happens if an IP matches more than one dictionary entry, e.g. it is listed individually in one entry and belongs to a block in another? Can I count on the value from the first match (reading from the top of the file to the end) being returned, as happened in my little experiment?

Internally, the Translate Filter Plugin currently uses a Ruby Hash to store a mapping between keys and values, which means that by the time it is performing a lookup, there is at most one possible value for any given key -- duplicates have already been eliminated. It makes no guarantees about which of several duplicate entries will win.
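To illustrate the "at most one value per key" point, here is a minimal sketch in plain Ruby. It only shows how a Ruby Hash behaves when the same key is assigned twice; it is not the plugin's dictionary loader, and the keys and values are made up for the example.

```ruby
# Plain Ruby Hash behaviour only; this is NOT the plugin's loading code.
# Keys and values below are hypothetical.
table = {}
table["10.0.0.1"] = "first"
table["10.0.0.2"] = "other"
table["10.0.0.1"] = "second" # same key again: the value is overwritten

value_for_duplicate = table["10.0.0.1"] # only one value survives per key
entry_count = table.size                # the duplicate did not add an entry

puts value_for_duplicate
puts entry_count
```

By the time a lookup happens, the Hash has no memory of the earlier value, so which duplicate "wins" depends entirely on how the Hash was filled.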

Well, the keys are not duplicates, the regexes will vary, it's just that the IP being looked up could match more than one key.
I could make it work if I knew that either the first or last match is returned and the order in which the keys are examined is the same as the order I have them in the dictionary file, but after more looking, I think that might depend on which version of the filter and/or Ruby that I have. Is that correct?

Thanks

Yes, that is correct. From what I can tell, the Translate Filter Plugin makes no guarantee about ordering when multiple patterns match.

But I understand what you are asking now. I had forgotten about the regex feature of the plugin.

It may be possible to start providing dictionary-ordering guarantees, since the current implementation does in fact use ordered constructs all the way through. By adding explicit tests to the filter plugin that validate the handling of regex matching when multiple patterns would match, we could ensure that future changes to the plugin keep the desired behaviour.
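As a rough illustration of what "first match in dictionary order" would mean, here is a sketch in plain Ruby. It assumes the dictionary ends up in a Hash whose keys are regex sources, and it relies on the fact that Ruby Hashes iterate in insertion order. The method name `first_match` and the dictionary contents are hypothetical; this is not the plugin's actual lookup code.

```ruby
# Hypothetical dictionary: regex source => value, in the order the
# entries would appear in the dictionary file.
dictionary = {
  '^10\.1\.2\.3$' => "specific-host",
  '^10\.1\.'      => "block-10.1",
  '^10\.'         => "block-10"
}

# Hypothetical helper: return the value of the first pattern that matches,
# walking the Hash in insertion order. Returns nil when nothing matches.
def first_match(dictionary, source)
  pair = dictionary.find { |pattern, _value| Regexp.new(pattern).match?(source) }
  pair && pair[1]
end

specific = first_match(dictionary, "10.1.2.3") # matches all three patterns
in_block = first_match(dictionary, "10.1.9.9") # matches the two block patterns
no_match = first_match(dictionary, "192.168.0.1")

puts specific
puts in_block
puts no_match.inspect
```

With this ordering, listing the most specific entries first gives them priority over the broader blocks.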

Would you be willing to open an issue or pull-request on the Logstash Translate Filter Plugin repository? Feel free to tag me, and I'll follow up on it.

Sorry for the delay in responding.

I'm not totally clear on what you are saying. So the most recent version of the plugin does use a fixed order for examining dictionary entries and always returns the first or last match if there are multiple matches? I believe I read, after posting my question, that it does, but we are currently using an older version in which the order is not always the same. Perhaps we just need to upgrade.

And are you suggesting adding tests so that the ordering and whether the first or last match is returned is preserved in future releases, or are you thinking some other change should be requested?

Thanks!

When writing code against any API, you can rely on one of two sets of behaviours: documented or observed.

With documented behaviour (especially behaviour documented in executable specs or tests), you can trust that future releases of the software will either continue to have the behaviour, or that the maintainers will call out the change of behaviour in their changelog (typically with a major-version release).

When you write code against observed, undocumented behaviour, you run the risk that the behaviour will change unexpectedly, even in a patch-level release, because the maintainers may not be aware of the specifics of your implementation needs. A minor refactoring to add a feature or fix a bug could change the behaviour in a way that complies with the documented behaviour, but still breaks your use-case.

Since the plugin in question is currently observed to work in the ordered way you desire, and I have told you that the intermediate structures we use internally preserve order without relying on implementation-specific behaviour, we could start providing ordering guarantees so that you could rely on them now and in the years to come. To do so, we would gladly work with you to accept a pull-request that (a) documents the behaviour and (b) adds tests/specs to validate in code that order is maintained in the specific way you desire. That way, as the community continues to evolve the plugin, people who are not privy to your ordering needs are empowered to contribute in a way that doesn't break your use-cases.
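For a sense of what such a test would pin down, here is a small sketch in plain Ruby rather than the plugin's own spec suite. The helper `lookup_first` is a hypothetical stand-in for the plugin's regex lookup, and the patterns and values are invented. The point is only that the same input gives different results depending on entry order, which is exactly the behaviour a spec would need to fix in place.

```ruby
# Hypothetical stand-in for the lookup whose ordering should be guaranteed.
def lookup_first(entries, source)
  entries.each do |pattern, value|
    return value if Regexp.new(pattern).match?(source)
  end
  nil
end

# Entries as [pattern, value] pairs, in dictionary-file order (made-up data).
specific_first = [['^192\.168\.1\.7$', "printer"], ['^192\.168\.', "office-lan"]]
block_first    = specific_first.reverse

# The same IP matches both patterns; only the order decides the result.
results = {
  specific_first: lookup_first(specific_first, "192.168.1.7"),
  block_first:    lookup_first(block_first, "192.168.1.7")
}

puts results.inspect
```

A real spec would assert the first of these outcomes against the plugin itself, so that a later refactoring that changed the iteration order would fail the test suite.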

If opening a pull-request is too daunting, feel free to open an issue asking for the ordering guarantees, being as specific with examples as possible.

Feel free to link the issue or pull-request here to make sure I see it, so I can help :slight_smile:

Ah, thanks for the clarification!
