We are thinking about gathering our DNS traffic via Packetbeat and creating statistics from the collected data. Currently we have a query rate of about 200k/s and growing. Because of the tremendous amount of data, we would need to aggregate it and push it to a smaller index.
But it seems to me that none of the built-in Elasticsearch solutions (transforms, rollups, etc.) can handle indices with a very high index rate. For example, with a transform at a frequency of 1s (the lowest possible) and a page size of 10k (the default), I can't process more than 10k documents per second, which is far too low. Is there a way to achieve our goal?
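For reference, here is a minimal sketch of the kind of continuous transform I mean, assuming a recent Elasticsearch version; the index pattern, field names, and the aggregation are placeholders standing in for our actual DNS data:

```json
PUT _transform/dns-stats
{
  "source": { "index": "packetbeat-*" },
  "dest":   { "index": "dns-stats" },
  "frequency": "1s",
  "sync": { "time": { "field": "@timestamp", "delay": "60s" } },
  "pivot": {
    "group_by": {
      "query": { "terms": { "field": "dns.question.name" } }
    },
    "aggregations": {
      "requests": { "value_count": { "field": "dns.id" } }
    }
  },
  "settings": { "max_page_search_size": 10000 }
}
```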
This is hard to answer, especially with so few details. A transform can only go as fast as its aggregations and indexing can go, and 20x is a huge gap between your POC implementation (10k/s) and your goal (200k/s).
There is no fixed limit in transform or rollup; it is just a matter of resources. Besides the option of adding more hardware, I recommend our documentation on running transforms at scale.
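For illustration, one of the knobs covered there is the page size. A hedged sketch of raising it via the update API, reusing the placeholder transform name from your post (the exact upper bound depends on your Elasticsearch version, and on some versions you may need to stop the transform before updating it):

```json
POST _transform/dns-stats/_update
{
  "settings": {
    "max_page_search_size": 65536
  }
}
```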
However, it would be nice if you could describe the use case in more detail. I wonder whether transform is the right tool for your case: are you summarizing only on terms?
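The reason I ask: if you group only on a high-cardinality term, every checkpoint may have to revisit all buckets. Adding a date_histogram group_by on the sync time field usually lets a continuous transform restrict each checkpoint to recent buckets. A sketch of a drop-in replacement for the pivot section of a transform like the one you posted, with field names again assumed from Packetbeat's DNS schema:

```json
"pivot": {
  "group_by": {
    "minute": { "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" } },
    "query":  { "terms": { "field": "dns.question.name" } }
  },
  "aggregations": {
    "requests": { "value_count": { "field": "dns.id" } }
  }
}
```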
For in-depth questions and more detailed performance guidance, it might be better to contact support.