Elasticsearch Benchmarking

Omer_Uludag · January 29, 2016, 6:31pm

Hello everyone,

I am fairly new to benchmarking in general and to benchmarking Elasticsearch in particular. I'm using Apache JMeter to assess Elasticsearch's indexing performance so that I can compare it with Splunk and Solr. I am not using the bulk API; instead I send N individual HTTP requests to Elasticsearch.

My question: how do I find out how long Elasticsearch takes to index the document from a single HTTP update request? JMeter provides a metric called throughput, which shows how many requests are handled per second. If I send a request to Elasticsearch and it responds with OK right away, has the document already been created at that point? I am interested in the time Elasticsearch itself needs to create the document once it receives the HTTP request, because I think the response time is: time to send the HTTP request + time to index the document + time to send the HTTP response. Remark: JMeter and the Elasticsearch node are running on the same server. I am currently only interested in the time to index the document.
It would be great if you could help me.

Best regards

warkolm · January 30, 2016, 1:23am

The response you get back will contain the time it took.

jasontedor · January 30, 2016, 3:24pm

But is this going to fit your usage pattern? Because if not, it's an inaccurate benchmark as Elasticsearch can handle concurrent bulk requests at incredibly high rates of ingest.

You can get this on a total basis from the nodes stats API:

        "indexing" : {
          "index_total" : 1024,
          "index_time_in_millis" : 560,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
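
As a rough illustration of how these counters can be read, the sketch below divides the cumulative indexing time by the number of indexed documents. The function name is hypothetical, and the dictionary simply mirrors the "indexing" fragment shown above; it assumes the stats have already been fetched and parsed as JSON.

```python
def avg_index_millis_per_doc(indexing_stats):
    """Average indexing time per document, in milliseconds.

    `indexing_stats` is assumed to be the parsed "indexing" object
    from a nodes stats response, as in the fragment above.
    """
    total = indexing_stats["index_total"]
    if total == 0:
        return 0.0  # nothing indexed yet, avoid division by zero
    return indexing_stats["index_time_in_millis"] / total


stats = {"index_total": 1024, "index_time_in_millis": 560}
print(avg_index_millis_per_doc(stats))  # 0.546875
```

Since the counters are totals, this is only an average over everything the node has indexed so far, not the latency of any single request.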

I don't understand why you need JMeter for this; I'm concerned that it's just going to get in the way and impact the benchmark.

If you're doing synchronous requests, yes.

Now I'm even more concerned it's just going to impact the benchmark.

Omer_Uludag · February 1, 2016, 7:11pm

Hello Jason,

I followed your advice and now run the benchmark with Elasticsearch alone, using its Bulk and Nodes Stats APIs. One question: how many documents can I send at most?
I am using, e.g.,
curl -s -XPOST localhost:9200/_bulk?pretty=true --data-binary @data_1.json
However, I can usually send only about 50,000 documents. Do you know why I do not receive any response after sending around 100,000 documents?

Best regards

jasontedor · February 1, 2016, 8:11pm

How big are the documents?

Omer_Uludag · February 1, 2016, 8:12pm

In total 117 MB. But I would like to increase the number of documents to 6 million. Do I need a different solution instead of the Bulk API?

Christian_Dahlqvist · February 1, 2016, 8:38pm

I believe the general recommendation is to keep individual bulk requests below 5MB in size, so you should use a smaller bulk size for optimum performance.
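
One way to apply this is to split a large bulk file into several smaller request bodies. The following is a minimal sketch, not a definitive implementation: the function name is hypothetical, and it assumes every action line is followed by exactly one source line (true for index and create actions, but not for delete).

```python
def split_bulk_lines(lines, max_bytes=5 * 1024 * 1024):
    """Group (action, source) line pairs into batches of at most max_bytes.

    Assumption: `lines` alternates action line, source line. A single
    pair larger than max_bytes still gets a batch of its own.
    """
    if len(lines) % 2 != 0:
        raise ValueError("expected action/source line pairs")
    batches, current, current_size = [], [], 0
    for i in range(0, len(lines), 2):
        pair = lines[i:i + 2]
        # +1 per line for the newline that terminates it in the request body
        pair_size = sum(len(line.encode("utf-8")) + 1 for line in pair)
        if current and current_size + pair_size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.extend(pair)
        current_size += pair_size
    if current:
        batches.append(current)
    return batches
```

Each batch can then be joined with newlines (plus a trailing newline) and sent as its own bulk request.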

Omer_Uludag · February 1, 2016, 9:21pm

So the idea would be: the total data size divided by 5 MB gives the number of bulk requests? Would you then send that number of bulk requests in a loop, with a batch script or something like that? I am asking because I am trying to assess Elasticsearch's performance for indexing a particular amount of data. Sorry for asking these beginner questions.

jasontedor · February 1, 2016, 10:00pm

That's way too large per bulk request. You'll need to find the optimal bulk size for your data set but the recommendation that @Christian_Dahlqvist gave is a good starting point. However, using that as a starting point, you should keep increasing the batch size until you start to see performance stagnate and then use that as your batch size.
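
The search described here could be sketched roughly as follows. Everything in it is an assumption for illustration: `measure` stands for whatever procedure you use to index a test set at a given batch size and return documents per second, and the 5% threshold for "stagnating" is arbitrary.

```python
def pick_batch_size(candidates, measure, min_gain=0.05):
    """Try batch sizes in ascending order until throughput stops improving.

    `measure(size)` is a hypothetical callable returning docs per second.
    Stops at the first size that fails to beat the best rate so far by
    at least `min_gain` (5% by default) and returns the previous best.
    """
    best_size, best_rate = None, 0.0
    for size in candidates:
        rate = measure(size)
        if best_size is not None and rate < best_rate * (1 + min_gain):
            break  # throughput has stagnated
        best_size, best_rate = size, rate
    return best_size, best_rate
```

In practice each measurement should be repeated a few times, since a single run can be noisy.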

jasontedor · February 1, 2016, 10:01pm

Yes.

Performance benchmarks are in general quite tricky to get right, and we are very happy to help.

Omer_Uludag · February 1, 2016, 10:14pm

Hello Jason and Christian,

thank you very much for your replies.
I created a batch file like that:
for i in {0..20}
do
  touch "/Users/oemeruludag/Desktop/benchfiles/data_${i}.json"
done
python /Users/oemer/Desktop/bench_desktop.py
for i in {0..20}
do
  cat data_${i}.json
done
for i in {0..20}
do
  curl -s -XPOST localhost:9200/_bulk?pretty=true --data-binary @data_${i}.json
done

I am sending a 5 MB bulk request 20 times. Is this the idea? And you recommend adjusting the size of the bulk request. How would you assess the performance? I would use the nodes stats API and divide the number of documents by the indexing time. Then I would vary the size of the bulk requests until this rate no longer improves.
Would you recommend doing it like that?
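
The calculation proposed here can be sketched with two snapshots of the "indexing" stats, one taken before the run and one after, so that earlier indexing activity on the node is excluded. The function name and the sample numbers are made up for illustration.

```python
def docs_per_second(before, after):
    """Indexing rate between two "indexing" stats snapshots.

    Uses the difference in index_total and index_time_in_millis, so it
    reflects the indexing time the node reports, not wall-clock time.
    """
    docs = after["index_total"] - before["index_total"]
    millis = after["index_time_in_millis"] - before["index_time_in_millis"]
    if millis <= 0:
        return 0.0
    return docs / (millis / 1000.0)


before = {"index_total": 1000, "index_time_in_millis": 500}
after = {"index_total": 51000, "index_time_in_millis": 25500}
print(docs_per_second(before, after))  # 2000.0
```

It is probably worth recording the wall-clock duration of the run as well, since the reported indexing time does not include HTTP overhead or request parsing.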

Christian_Dahlqvist · February 1, 2016, 10:21pm

You can also send multiple bulk requests in parallel in order to increase performance. Start with a single thread/process and slowly increase the number until you see no further increase in throughput.

Omer_Uludag · February 1, 2016, 10:33pm

How exactly can I send bulk requests in parallel? Is this also accomplished by the batch?

Omer_Uludag · February 1, 2016, 11:52pm

To improve performance, I am also using these settings:

curl -XPUT localhost:9200/_cluster/settings -d '
{
    "transient" : {
        "indices.store.throttle.type" : "none"
    }
}'

curl -XPUT 'localhost:9200/bench/_settings' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}'

curl -XPUT localhost:9200/bench/_settings -d '
{
    "index" : {
        "refresh_interval" : "-1"
    } 
}'

curl -XPOST 'http://localhost:9200/bench/_forcemerge?max_num_segments=5'

Additionally, I am using ES_HEAP_SIZE=32g, since I have 131 GB of memory.
Do you have any recommendations on these, or other settings that would increase performance?

Christian_Dahlqvist · February 2, 2016, 6:52am

You can use multiple threads, or start several scripts in parallel if they are single threaded.
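
A minimal sketch of the multi-threaded variant, using only the Python standard library. The helper names, the default URL, and the Content-Type header are assumptions for illustration; `bodies` is expected to be a list of ready-made bulk request bodies as bytes.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def post_bulk(body, url="http://localhost:9200/_bulk"):
    """Send one bulk body (bytes) and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def send_in_parallel(bodies, workers=2, send=post_bulk):
    """Send all bodies using `workers` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, bodies))
```

Starting with `workers=1` and raising it step by step while watching throughput matches the advice given earlier in the thread. The `send` parameter exists only so the function can be tried out without a running node.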

Omer_Uludag · February 3, 2016, 8:38pm

This shows the current results. I used bulk requests of 11.2 MB each, with one thread and no parallel indexing. As you can see, the number of documents indexed per second decreases as the total number of documents increases. The settings above are still in effect, and CPU usage was about 30-40% on average.
Do you have any plausible explanation for such a chart?

Best regards,
