Hi Logstash Gurus,
I think I need to use the Memcached Plugin for a design problem… but I’m not sure.
Here’s my issue: I have a Logstash instance (v7.4.0, yes I need to upgrade) running in a Docker container and currently processing several million data records every second. Let’s say the data records look like this:
“Alice”, …more data…
“Beth”, …more data…
“Alice”, …more data…
“Alice”, …more data…
…and so on. The first field is the “Person” field, essentially identifying the source of the data record. Right now, everything is working great.
But I am about to implement some changes in my filter{} config that would mean a lot more processing time per data record, involving calls to DNS servers, SQL servers, HTTP servers, perhaps more. With all those network calls, I’m worried that the processing time per data record could be considerable indeed. Besides, I don’t necessarily need to do all this processing with every record. I need to find a way for Logstash to process some records, but skip others. If a person sends more data within, say, 5 seconds of their last seen data record, Logstash can skip all that extra processing.
If this were a C program, I’d implement a dynamic hash table whose entries time out and are removed after a TTL of 5 seconds.
Imagining that Logstash could do such a hash table, I’d like LS to do the following every time a new data record arrives:
- Check to see if the sending person is in the hash table:
  - If yes, simply update that person’s TTL in the hash table
  - If no, run all those network lookups, modify the data record accordingly, then insert that person into the hash table
- And of course, the hash table should automatically remove all entries with an expired TTL.
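To make that concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not Logstash code, and the names TTLTable, seen_recently, purge, handle and run_lookups are all made up for illustration; the clock is injectable only so the sketch can be exercised without waiting.

```python
import time


class TTLTable:
    """Hash table whose entries expire `ttl` seconds after they were last seen."""

    def __init__(self, ttl=5.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock        # hypothetical hook: lets a fake clock be swapped in
        self.expires_at = {}      # person -> time at which the entry expires

    def seen_recently(self, person):
        """Return True if `person` has a live entry; insert or refresh it either way."""
        now = self.clock()
        expiry = self.expires_at.get(person)
        hit = expiry is not None and expiry > now
        self.expires_at[person] = now + self.ttl   # sliding TTL: reset on every record
        return hit

    def purge(self):
        """Remove every entry whose TTL has run out."""
        now = self.clock()
        expired = [p for p, t in self.expires_at.items() if t <= now]
        for p in expired:
            del self.expires_at[p]


def handle(record, table, run_lookups):
    """Run the expensive lookups only for people not seen within the TTL."""
    if not table.seen_recently(record["Person"]):
        run_lookups(record)       # stands in for the DNS / SQL / HTTP calls
    return record
```

Expiry is checked lazily per person here, with purge() as a separate sweep, because scanning the whole table on every record would be far too slow at this data rate.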
I’ve been combing through the Logstash documentation, and I think the tool I need is the Memcached plugin...? Not sure. But if so, I imagine it would look something like this in the config file:
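filter {
  # Untested sketch: look the person up in memcached, using their name as the key
  memcached {
    get => { "%{Person}" => "[@metadata][seen]" }
  }
  if ![@metadata][seen] {
    # New (or expired) person:
    # Run all those network lookups, modify data record accordingly
  }
  # Insert the person, or refresh their existing entry, with a 5-second TTL
  memcached {
    set => { "Person" => "%{Person}" }
    ttl => 5
  }
}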
Obviously, this is badly sketched out. But it’s as far as I got.
Any thoughts or advice? I’d appreciate any and all help you can offer.