Hmm, to pre-process, it'll have to happen somewhere between input and consumption in my pipeline, at one of the 3 places marked #1-#3 below:
netflow producers -> (logstash #1-> redis #2) #3-> logstash4 -> elasticsearch4
Ideally logstash would do that work, since it's already reading the data at either #1 or #3, but the reverse DNS lookup is very slow, and I'm not sure whether that's a logstash problem or simply lookup latency. As a data point, when I moved the DNS lookup to #1 in the pipeline, only ~20% of the expected netflow packets were captured, so I'd guesstimate the slowdown at ~5x (i.e. about 80% of packets ended up being dropped at the UDP input phase).
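One way to tell the two causes apart would be to time the lookups outside logstash. A rough sketch in Python, assuming the standard library's socket.gethostbyaddr is a fair stand-in for whatever logstash does internally; the time_lookups helper and its resolver parameter are made up for illustration:

```python
import socket
import time


def reverse_lookup(ip):
    """Reverse-resolve one IP; fall back to the IP itself if it fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return ip


def time_lookups(ips, resolver=reverse_lookup):
    """Return (total_seconds, average_seconds_per_lookup) for the given IPs."""
    start = time.perf_counter()
    for ip in ips:
        resolver(ip)
    total = time.perf_counter() - start
    average = total / len(ips) if ips else 0.0
    return total, average
```

Running time_lookups over a few hundred IPs taken from real netflow records gives a per-lookup figure. If that comes out at, say, 50 ms, a single sequential worker tops out around 20 lookups per second no matter what logstash does.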
Which means I likely need to batch the operation on the redis side; simplistically with 2 lists, e.g.
logstash:redis:rawips
logstash:redis:hostnamesresolved
and a script that reads the logstash:redis:rawips list, resolves each unique raw IP once, and re-queues all associated items, with the hostname added, onto the logstash:redis:hostnamesresolved list.
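The core of such a script might look something like the sketch below. It is only a sketch under assumptions: the events are taken to be JSON strings, the field name ipv4_src_addr and the added ipv4_src_addr_hostname field are guesses at the event layout, and resolve_batch is a made-up helper. The redis plumbing is left out so the logic can be run on its own.

```python
import json
import socket


def reverse_lookup(ip):
    """Reverse-resolve one IP; fall back to the IP itself if it fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return ip


def resolve_batch(raw_events, resolver=reverse_lookup, field="ipv4_src_addr"):
    """Add a hostname to each JSON event, resolving each unique IP only once.

    `field` is an assumed name for the IP field; adjust to the real events.
    """
    cache = {}      # ip -> hostname, so repeated IPs cost one lookup
    resolved = []
    for raw in raw_events:
        event = json.loads(raw)
        ip = event.get(field)
        if ip is not None:
            if ip not in cache:
                cache[ip] = resolver(ip)
            event[field + "_hostname"] = cache[ip]
        # Events without the field are passed through unchanged.
        resolved.append(json.dumps(event))
    return resolved
```

With the redis-py client (an assumption, any redis client would do), the batch could be pulled off logstash:redis:rawips with lpop calls and the results pushed onto logstash:redis:hostnamesresolved with rpush. The cache is what buys the speedup: netflow traffic tends to repeat the same IPs, so most events never trigger a lookup at all.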
My head hurts; any volunteers?