I am fairly new to benchmarking in general and to benchmarking Elasticsearch in particular. I am using Apache JMeter to assess Elasticsearch's performance, and I am currently working on Elasticsearch's indexing performance so that I can compare it with Splunk and Solr. I am not using the bulk API; instead I send N individual HTTP index requests to Elasticsearch.

My question: how do I know how long Elasticsearch takes to index the document behind a single HTTP index request? JMeter provides a metric called throughput, which shows how many requests it can handle per second. So if I send a request to Elasticsearch and it immediately responds with OK, is the document already created at that point? I am interested in the time Elasticsearch needs to create the document once it receives the HTTP request, because I think the response time is: time to send the HTTP request + time to index the document + time to send the HTTP response. Remark: JMeter and the Elasticsearch node are running on the same server. I am currently only interested in the time to index the document.

It would be great if you could help me.
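To illustrate the setup, each JMeter sampler sends a single-document request roughly like this (the index name, type and fields are just placeholders, not my real data):

    curl -s -XPOST 'localhost:9200/benchmark/doc?pretty' -d '{
      "field1" : "value1",
      "field2" : "value2"
    }'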
But is this going to fit your usage pattern? If not, it's an inaccurate benchmark, as Elasticsearch can handle concurrent bulk requests at incredibly high ingest rates.
I followed your advice and I am now doing the benchmark with Elasticsearch only, using its Bulk and Node Stats APIs. One question: how many documents can I send at most per bulk request?
I am using, for example:

    curl -s -XPOST localhost:9200/_bulk?pretty=true --data-binary @data_1.json
However, I can mostly send only about 50,000 documents. Do you know why I do not receive any response after sending something like 100,000 documents?
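For reference, each of my data files uses the usual bulk format, an action line followed by a source line per document and a trailing newline at the end of the file; the index and type names here are only placeholders:

    { "index" : { "_index" : "benchmark", "_type" : "doc" } }
    { "field1" : "value1", "field2" : "value2" }
    { "index" : { "_index" : "benchmark", "_type" : "doc" } }
    { "field1" : "value3", "field2" : "value4" }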
I believe the general recommendation is to keep individual bulk requests below 5MB in size, so you should use a smaller bulk size for optimum performance.
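For example, a rough way to split one large bulk file into smaller batches from the shell would be something like this (file names are placeholders, and it assumes each document occupies exactly two lines, action plus source, so the line count per chunk must stay even):

    # tune the line count until each chunk comes out around 5 MB
    split -l 20000 all_docs.json chunk_
    for f in chunk_*
    do
      curl -s -XPOST localhost:9200/_bulk --data-binary @"$f" > /dev/null
    done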
So the idea would be: the total amount of data divided by 5 MB gives the number of bulk requests? Would you then run a loop that sends that many bulk requests from a batch script or something like that? I am trying to assess Elasticsearch's indexing performance for a particular amount of data. Sorry for asking these beginner questions.
That's way too large for a single bulk request. You'll need to find the optimal bulk size for your data set, but the recommendation @Christian_Dahlqvist gave is a good starting point. From there, keep increasing the batch size until performance starts to stagnate, and then use that as your batch size.
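As a rough sketch of how you could compare batch sizes (the file names and the use of jq are assumptions on my side), the bulk response's took field gives the time in milliseconds, so you can derive a documents-per-second figure for each candidate size:

    for f in bulk_1mb.json bulk_5mb.json bulk_10mb.json
    do
      docs=$(( $(wc -l < "$f") / 2 ))   # two lines per document: action line + source line
      took_ms=$(curl -s -XPOST localhost:9200/_bulk --data-binary @"$f" | jq '.took')
      echo "$f: $docs docs in ${took_ms} ms -> $(( docs * 1000 / took_ms )) docs/sec"
    done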
Thank you very much for your replies.
I created a batch script like this:

    for i in {0..20}
    do
      touch "/Users/oemeruludag/Desktop/benchfiles/data_${i}.json"
    done

    python /Users/oemer/Desktop/bench_desktop.py

    for i in {0..20}
    do
      cat data_${i}.json
    done

    for i in {0..20}
    do
      curl -s -XPOST localhost:9200/_bulk?pretty=true --data-binary @data_${i}.json
    done
I am sending 20 bulk requests of about 5 MB each. Is that the idea? You also recommend adjusting the size of the bulk requests. How would you assess the performance? I would use the Node Stats API and divide the number of documents by the indexing time, and then vary the size of the bulk requests until this rate no longer improves.

Would you recommend doing it like that?
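Concretely, I was thinking of reading the counters from the Node Stats API like this (the jq part is just how I would extract the values; index_total and index_time_in_millis are in the indices.indexing section of the response):

    curl -s 'localhost:9200/_nodes/stats/indices/indexing?pretty'

    # documents indexed per second of indexing time, per node
    curl -s 'localhost:9200/_nodes/stats/indices/indexing' \
      | jq '.nodes[] | .indices.indexing | .index_total / (.index_time_in_millis / 1000)'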
You can also send multiple bulk requests in parallel in order to increase performance. Start with a single thread/process and slowly increase this until you see no further increase in throughput.
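For example, from the shell you could fire several bulk files at once and wait for them all to complete (the file names are placeholders):

    for f in data_0.json data_1.json data_2.json data_3.json
    do
      curl -s -XPOST localhost:9200/_bulk --data-binary @"$f" > /dev/null &
    done
    wait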
Additionally, I am using ES_HEAP_SIZE=32g, since I have 131 GB of memory.

Do you have any recommendations on this, or on other settings that could improve performance?
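For reference, this is how I set the heap before starting the node (the Elasticsearch path is just a placeholder):

    export ES_HEAP_SIZE=32g
    /path/to/elasticsearch/bin/elasticsearch -d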
This shows the current results. I used bulk requests of 11.2 MB, with one thread and no parallel indexing. As you can see, the number of documents I index per second decreases as the total number of indexed documents grows. The settings above are still in place, and on average the CPU was at about 30-40%.

Do you have any plausible explanations for such a chart?