We are trying to enrich our firewall/router logs with custom GeoIP data (such as coordinates, client info, internal client ID, ...), which we have saved in a dedicated index on our ES cluster.
I'm currently trying out a solution with the logstash-filter-elasticsearch plugin, but the performance is really poor: about 9k events/s without the filter and only about 1.2k events/s with the filter enabled. And since we want to enrich both src and dst IPs, that comes down to about 700 events/s.
Looking at netstat, it looks like the filter opens a new connection for every query. Add the network latency (ES and Logstash are not on the same servers) and the result is poor throughput. Is there a way to make this plugin keep the connection open?
What can I do to get better performance?
Any other ideas on how to achieve the same result? I would like to avoid building my own GeoIP (MaxMind-like) binary database if possible.
To put some context here, enrichment is a hot topic in Logstash Product management at the moment.
We have released v1 of a JDBC enrichment filter called jdbc_streaming. It has caching, so it should only have a perf problem on the first occurrence of a new field value. This filter is intended for data that changes fairly regularly, e.g. data from a CRM or transactional DB.
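To illustrate why caching limits the cost to the first occurrence of a value, here is a minimal Python sketch of a cached lookup. It is a generic LRU cache, not how jdbc_streaming is actually implemented; `slow_lookup`, `cached_lookup`, `enrich` and the returned record are all hypothetical names and made-up data.

```python
from functools import lru_cache


def slow_lookup(ip):
    """Hypothetical stand-in for a remote query (ES, JDBC, ...)."""
    # Made-up enrichment record, for illustration only.
    return ("client-for-" + ip, 0.0, 0.0)


@lru_cache(maxsize=10000)
def cached_lookup(ip):
    # Only cache misses reach slow_lookup; repeated values come from memory.
    return slow_lookup(ip)


def enrich(event):
    """Return a copy of the event with the looked-up fields added."""
    client, lat, lon = cached_lookup(event["src_ip"])
    return {**event, "client": client, "lat": lat, "lon": lon}
```

Calling `enrich` twice with the same `src_ip` triggers only one slow lookup. The cache size (10000 here) is an arbitrary assumption and would need tuning against the number of distinct values seen.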
Next we will release jdbc_static. This filter will pull down a record set into a local in-memory DB, and lookups will then be done on the local DB using SQL.
Next we will look at improving the ES filter to add caching.
Watch the release notes of LS versions to see when this happens, or we'll tweet about it.
I thought of that, but I don't want to have an entry for every IP address we have. We probably have a few thousand, if not more.
Currently, we save the bottom and top IPs of each subnet as IP-type fields in ES and then search for "bottom<$src_ip AND top>$src_ip". This returns a single document, and we use the fields in that document to populate the event currently being processed.
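For reference, the same bottom/top range match can be sketched outside ES in a few lines of Python, using a sorted list and a binary search. This is only an illustration under some assumptions: the ranges are IPv4 and do not overlap, the bounds are treated as inclusive (unlike the strict comparison in the query above), and the subnets and fields in `RANGES` are made up.

```python
import bisect
import ipaddress

# Made-up ranges: (bottom, top, enrichment fields). Assumed non-overlapping IPv4.
RANGES = [
    ("10.0.0.0", "10.0.0.255", {"client_id": 1, "site": "office-a"}),
    ("10.0.1.0", "10.0.1.255", {"client_id": 2, "site": "office-b"}),
    ("192.168.0.0", "192.168.255.255", {"client_id": 3, "site": "lab"}),
]

# Convert addresses to integers and sort by bottom address once, at load time.
_table = sorted(
    (
        (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), fields)
        for lo, hi, fields in RANGES
    ),
    key=lambda row: row[0],
)
_bottoms = [row[0] for row in _table]


def lookup(ip):
    """Return the fields of the range containing ip, or None if no range matches."""
    n = int(ipaddress.IPv4Address(ip))
    # Index of the last range whose bottom address is <= n.
    i = bisect.bisect_right(_bottoms, n) - 1
    if i >= 0 and n <= _table[i][1]:
        return _table[i][2]
    return None
```

With a few thousand subnets, each lookup is a handful of integer comparisons, so one entry per IP address is not needed.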
This probably won't work with the jdbc filter either, since I'm not sure SQL supports an IP type.
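One possible workaround for the missing IP type, untested with the jdbc filters: store IPv4 addresses as plain integers, which any SQL database can compare. The sketch below uses Python's built-in sqlite3 purely as a stand-in database; the table name `subnets`, its columns and the sample rows are hypothetical.

```python
import ipaddress
import sqlite3


def ip_to_int(ip):
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


# In-memory SQLite database standing in for the real lookup database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subnets (bottom INTEGER, top INTEGER, client_id INTEGER, site TEXT)"
)
conn.executemany(
    "INSERT INTO subnets VALUES (?, ?, ?, ?)",
    [
        (ip_to_int("10.0.0.0"), ip_to_int("10.0.0.255"), 1, "office-a"),
        (ip_to_int("192.168.0.0"), ip_to_int("192.168.255.255"), 3, "lab"),
    ],
)


def lookup(ip):
    """Return (client_id, site) for the subnet containing ip, or None."""
    n = ip_to_int(ip)
    # Inclusive range match on integer columns; no IP column type required.
    return conn.execute(
        "SELECT client_id, site FROM subnets WHERE bottom <= ? AND top >= ? LIMIT 1",
        (n, n),
    ).fetchone()
```

The event's IP would need the same integer conversion before the query runs, and IPv6 would need a different representation since its addresses do not fit in a 64-bit integer column.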