Dear ES users,
we are in the process of selecting a technology for a custom NetFlow collector,
and we expect to handle almost 4 million records per minute. Does anybody have a use case like that? It has to be a local installation, not a cloud one, and it has to be a single box only.
Can ES do it? What is the best setup for that? Are there any magic numbers for performance tuning?
4M docs/minute is roughly 66k docs/s, which is in the ballpark of what ES can handle. But it really depends on node configuration, the size and mapping of your documents, latency SLAs, etc.
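Quick sanity check on that rate conversion (pure arithmetic, nothing ES-specific):

```python
# 4 million NetFlow records per minute, converted to a per-second rate
records_per_minute = 4_000_000
records_per_second = records_per_minute / 60
print(f"{records_per_second:,.0f} records/s")  # roughly 66,667 records/s
```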
I'd recommend setting up a realistic Rally test (github link) to help benchmark your setup.
You may also be able to find some useful information in these perf tuning articles, although be warned they are starting to get a little outdated:
How retention affects your node depends largely on A) how much disk space you'll need and B) how often you query old data. If historical data mostly sits idle and is never queried, it won't have much impact on indexing speed. But if you're frequently querying old data, it'll have a larger impact and you'll need to size more appropriately (potentially with multiple nodes).
E.g. 4M docs/min == 2,102,400,000,000 docs/year, aka ~2.1 trillion docs. If we're conservative and say each doc only uses 48 bytes (I don't know how big NetFlow records are, but I'm guessing they are pretty small), that's ~100 TB of storage for a single node. Even assuming ES can do something crazy like 50% compression, it's still 50 TB. So I think you'll need to either reduce your retention period or start expanding to multiple nodes.
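The yearly-volume arithmetic above is easy to reproduce; note the 48 bytes/doc figure is an assumption, so treat the result as a rough floor and measure the real per-doc on-disk size from a test index with your own mapping:

```python
# Back-of-envelope retention sizing. 48 B/doc is an assumed figure --
# measure your actual on-disk bytes-per-doc from a test index instead.
docs_per_minute = 4_000_000
docs_per_year = docs_per_minute * 60 * 24 * 365    # 2,102,400,000,000 docs
bytes_per_doc = 48
storage_tb = docs_per_year * bytes_per_doc / 1e12  # ~100 TB before compression
print(f"{docs_per_year:,} docs/year -> {storage_tb:.1f} TB")
```

Rerun it with your measured bytes-per-doc and target retention window to see how quickly a single box stops being feasible.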