The Elasticsearch indexing rate is 13,800 events/s, and this seems to be the bottleneck.
What I don't understand is that Elasticsearch CPU utilization is only 10% and JVM heap used is 6 GB of 16 GB. Why, then, is the indexing rate still so low? What other factors should we consider to stress the Elasticsearch system?
Any suggestions on improving this performance would be highly appreciated.
Have you optimised Elasticsearch for indexing speed? What type of storage do you have? What do disk I/O and iowait look like? How many nodes are in the cluster?
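For example, a quick way to sample iowait and disk throughput on the Elasticsearch host is a small script along these lines (a rough sketch, assuming a Linux host and that the third-party psutil package is installed):

```python
# Rough sketch: sample iowait and disk throughput on the Elasticsearch host.
# Assumes Linux (iowait is Linux-specific) and the psutil package.
import time
import psutil

# Percentage of CPU time spent waiting on I/O over a 5-second window.
cpu = psutil.cpu_times_percent(interval=5)
print(f"iowait: {cpu.iowait:.1f}%")

# Disk throughput over another 5-second window.
before = psutil.disk_io_counters()
time.sleep(5)
after = psutil.disk_io_counters()
read_mb_s = (after.read_bytes - before.read_bytes) / 5 / 2**20
write_mb_s = (after.write_bytes - before.write_bytes) / 5 / 2**20
print(f"disk read: {read_mb_s:.1f} MB/s, write: {write_mb_s:.1f} MB/s")
```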
After a brief benchmark, it appears that the problem was with Elasticsearch indexing speed. Thanks, Christian!
For benchmarking purposes, we have a single-node Elasticsearch cluster with a single index, one shard, and no replicas. The machine has a 400 GB SSD, 32 GB of RAM (16 GB of which is allocated to the heap), and a 12-core processor.
We are using a single indexing thread, a 2 GB flush threshold, and a 30 s refresh interval.
Index settings are as follows.
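A minimal sketch of how an index with these settings could be created over the REST API (the index name filebeat-bench and localhost:9200 are placeholders, and the flush threshold is assumed to be index.translog.flush_threshold_size):

```python
# Sketch: create the benchmark index with the settings described above
# (one shard, no replicas, 30s refresh interval, 2 GB translog flush threshold).
# The index name "filebeat-bench" and the host are placeholders.
import json
import requests

index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "refresh_interval": "30s",
            "translog": {"flush_threshold_size": "2gb"},
        }
    }
}

resp = requests.put(
    "http://localhost:9200/filebeat-bench",
    headers={"Content-Type": "application/json"},
    data=json.dumps(index_settings),
)
print(resp.status_code, resp.json())
```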
I am unable to index more than 60,000 documents/s from Filebeat.
According to X-Pack monitoring, CPU utilisation is 60%, JVM heap utilisation is 72%, and disk I/O is 130 MB/s. Clearly, none of these factors is the bottleneck.
I am not sure what else might be the bottleneck. Is there a way to find out what factors might contribute to this?
Thanks in advance.
If you are using dynamic mappings (I am guessing this may be the case based on the number of fields you have specified) and are adding fields as indexing progresses, each change will require the cluster state to get updated, which can slow indexing down.
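If that might be the case, one quick way to rule it out is to lock the mapping down so that unmapped fields are rejected rather than added dynamically. A rough sketch against the same placeholder index (assuming Elasticsearch 7.x or later, where mapping types are gone; the field names are just examples):

```python
# Sketch: make the mapping strict so documents with unmapped fields are rejected
# instead of triggering dynamic mapping updates (and cluster state changes).
# Assumes Elasticsearch 7.x+ and the placeholder index from the earlier sketch.
import json
import requests

strict_mapping = {
    "dynamic": "strict",
    "properties": {
        # Only explicitly mapped fields are accepted; these are example fields.
        "@timestamp": {"type": "date"},
        "message": {"type": "text"},
    },
}

resp = requests.put(
    "http://localhost:9200/filebeat-bench/_mapping",
    headers={"Content-Type": "application/json"},
    data=json.dumps(strict_mapping),
)
print(resp.status_code, resp.json())
```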
Sorry, that parameter was not needed. We use static mappings. Are there any other factors that might affect performance?
I am wondering why it tops out at 60,000 documents/s when Elasticsearch can do much more. Not knowing what the bottleneck is bothers me. I am sure I am missing something here.
Documents per second is not really a good measure of indexing performance, as it depends heavily on the size and complexity of the documents being indexed. You will get a better comparison if you run the same Rally track on your hardware.
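Something along these lines, as a rough sketch (the track name, target host, and use of a recent Rally with the race subcommand are assumptions):

```python
# Rough sketch: run the standard http_logs Rally track against an existing,
# already-running cluster. Assumes a recent Rally (with the "race" subcommand)
# is installed; the target host is a placeholder.
import subprocess

subprocess.run(
    [
        "esrally", "race",
        "--track=http_logs",
        "--pipeline=benchmark-only",   # benchmark an externally managed cluster
        "--target-hosts=localhost:9200",
    ],
    check=True,
)
```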
Yeah, that makes sense. But this benchmark is for HTTP logs, and I am importing raw logs from an HTTP server as well, so I was hoping it would be close enough.
I am still struggling to find what else could be the bottleneck. Any pointers would be highly appreciated.
The standard HTTP logs track uses very small documents, so it may or may not be comparable. I created a track that simulates events that are a bit larger and is probably closer to what you would get out of Filebeat. We talked about it here, and it is available on GitHub.