Dear All,
I have a use case where I need to compare a high volume of data (fields like IP addresses) against large IP databases.
I have thought about using a ruby filter with a Redis socket connection (opened in init => so it starts only once), then querying the Redis DB using mget...
Pros: - Redis is supposed to be high performance
- No need to restart / send a kill -HUP signal to reload a dictionary file (maybe not necessary?)
Cons: Didn't see any config of that kind here... but being the first doesn't mean doing it the wrong way
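To make the mget idea concrete, here is a minimal sketch of the lookup logic only, not a tested Logstash config. IpLookup, FakeRedis and the "ip:" key prefix are hypothetical names made up for illustration; the only assumption about the real client is that it responds to mget(*keys) and returns nil for missing keys, which is how the redis-rb gem behaves as far as I know.

```ruby
# Sketch only: IpLookup and FakeRedis are hypothetical names, not part of Logstash.
class IpLookup
  # client: any object responding to mget(*keys), e.g. a redis-rb connection
  def initialize(client, prefix: "ip:")
    @client = client
    @prefix = prefix # assumed key layout: "ip:<address>"
  end

  # Looks up several IPs in one round trip; returns { ip => value } for hits only.
  def lookup(ips)
    return {} if ips.empty?
    keys = ips.map { |ip| "#{@prefix}#{ip}" }
    values = @client.mget(*keys) # one value per key, nil when the key is absent
    ips.zip(values).reject { |_, v| v.nil? }.to_h
  end
end

# In-memory stand-in so the sketch runs without a Redis server.
class FakeRedis
  def initialize(data)
    @data = data
  end

  def mget(*keys)
    keys.map { |k| @data[k] }
  end
end

db = FakeRedis.new("ip:10.0.0.1" => "internal", "ip:203.0.113.7" => "blocklist")
lookup = IpLookup.new(db)
result = lookup.lookup(["10.0.0.1", "198.51.100.2", "203.0.113.7"])
```

In a ruby filter, the client would presumably be created once in init (for example Redis.new(path: "/tmp/redis.sock") with redis-rb, where the socket path is an assumption) and lookup called from code for each event.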
The other approach would be to use the standard translate {} filter with a dictionary + exact => true
I am now wondering if some of you have an idea of the performance impact at large volume for these 2 solutions?
Also, for the dictionary option, is there any "clean" way to reload the YAML dict frequently?
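To illustrate what I mean by a "clean" reload, here is a rough sketch of the behaviour I have in mind: re-read the YAML file only when its modification time changes. This is not how the translate filter works internally (I have not checked), and ReloadingDict is a hypothetical helper name.

```ruby
require "yaml"

# Hypothetical helper: re-reads a YAML dictionary only when the file's mtime changes.
class ReloadingDict
  def initialize(path)
    @path = path
    @mtime = nil # nil forces a load on first access
    @data = {}
  end

  # Returns the value for key, reloading the file first if it changed on disk.
  def [](key)
    refresh
    @data[key]
  end

  private

  def refresh
    mtime = File.mtime(@path)
    return if mtime == @mtime
    @data = YAML.load_file(@path) || {} # an empty file parses to a falsy value
    @mtime = mtime
  end
end
```

Note that this does a stat on every lookup, so for high volumes the check would probably need to be rate-limited (for example once every few seconds).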
This blog post may be useful to you. It describes a prototype memcached plugin that sounds close to what you are looking for, although it is just available as an early prototype. Maybe this could be used or act as a blueprint for how to create a plugin that interfaces with Redis?
Last time I looked, the translate plugin could take some time to reload, so changing the dictionary very frequently might not be a good idea (I am not sure, so this needs to be tested). You could however also look into the jdbc_streaming plugin, as it may work for your use case.
Hi Christian,
Thank you for your answer, it is much appreciated!
I was not aware of this article, which seems very interesting and taught me some new approaches.
It also seems that "translate" would not be suitable for large volumes / dynamic updates...
The good thing is that I found someone who did the same thing as my idea (last year), with what look like amazing results!