Performance of painless scripts in 6.x

vangap · June 25, 2018, 6:25pm

I have tested with 6.3 + Java 10 and with 6.2.4 + Java 8, both in Docker.
The behavior is the same.

With the same query, ES 6.x latencies are 10x higher than on 5.6.4. I have observed similar behavior with a few other scripted queries as well.

What I have observed is that heap usage is the same in both versions, and latency is the same when there is no concurrency.
Under concurrency or continuous load, even with a concurrency of 1, ES 6.x CPU usage is very high (100% across all 4 cores).
The only metric I see differ is the young GC count, which is very high in ES 6.x. Might that explain the high CPU usage? Please refer to the attached images.
Is ES 6.x somehow less performant? It's hard to believe. So, does it have something to do with a default setting that has changed and that I need to tune?
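
One way to put numbers on the young GC difference is to compare two snapshots of the node stats API (`GET _nodes/stats/jvm`), one taken before a load run and one after. The following is a minimal sketch, assuming the two responses have already been fetched and parsed into Python dicts. The function names are illustrative and not part of any Elasticsearch client.

```python
def young_gc_by_node(node_stats):
    """Map node name -> (young GC count, young GC time in ms).

    Expects the parsed JSON body of GET _nodes/stats/jvm.
    """
    result = {}
    for node in node_stats["nodes"].values():
        young = node["jvm"]["gc"]["collectors"]["young"]
        result[node["name"]] = (
            young["collection_count"],
            young["collection_time_in_millis"],
        )
    return result


def young_gc_delta(before, after):
    """Young GC activity between two snapshots, keyed by node name.

    Nodes that appear in only one of the snapshots are skipped.
    """
    start = young_gc_by_node(before)
    end = young_gc_by_node(after)
    return {
        name: (end[name][0] - start[name][0], end[name][1] - start[name][1])
        for name in end
        if name in start
    }
```

Running the same load against both versions and comparing the resulting deltas gives a like-for-like view of how many young collections each run triggered and how long they took.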

Data: 1.1 M docs
4 GB, 4 cores, with 2 GB allocated to the heap
ES query used for the load test:

https://gist.github.com/vanga/25676b40a248f7b99f7e03d5d51a2b59

{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggregations": {
    "sum_agg": {
      "filters": {
        "filters": {
          "123": {

(The query is truncated here; the full version is in the gist linked above.)

5.6.4

6.2.4
![new1|690x402](upload://mdiKLWLrDo1ukGAzDOg0XZKinzJ.png)

danielmitterdorfer · June 28, 2018, 5:34am

Hi,

hard to tell what is going on based on your description. Is there a chance that you can (privately) share an anonymised version of the document corpus so we can have a closer look?

Daniel

vangap · June 28, 2018, 6:26am

@danielmitterdorfer Thanks.
Do you need the whole dataset + mapping, or would just a few rows be sufficient for you to test?

danielmitterdorfer · June 28, 2018, 6:44am

The whole dataset (incl. mapping) would be great so we can reproduce your scenario. It would be great if you could share a download link via a private message.

vangap · June 28, 2018, 2:13pm

Hi @danielmitterdorfer

You were saying this:

> It would also be good if you could share the script with the actual field names from the data set because we suspect that the difference might have to do with doc values and then it is important again that we access the same fields that you do.

Does that mean that the performance of doc_values depends on the length of field names?

Thanks.

danielmitterdorfer · June 29, 2018, 4:50am

Hi,

no, I was not referring to the length of the field names, but it might make a difference whether you access a field that has 5 distinct values or 5 million distinct values.

Daniel

danielmitterdorfer · June 29, 2018, 11:24am

I created a benchmark based on the data that you have provided. It first indexes all data and then it runs your Painless query with four clients concurrently at a rate of five operations per second. I ran that against the Docker images for 5.6.4 and 6.2.4 with a heap size of 2GB on our nightly benchmarking hardware.

For that scenario I get the following results:

| Metric | Task | ES 5.6.4 | ES 6.2.4 | Diff | Unit |
|---|---|---|---|---|---|
| Total Young Gen GC | | 2.974 | 2.431 | -0.543 | s |
| Total Old Gen GC | | 0.211 | 0.181 | -0.03 | s |
| Store size | | 0.514003 | 0.514522 | 0.00052 | GB |
| 50th percentile latency | painless | 269.538 | 257.694 | -11.8438 | ms |
| 90th percentile latency | painless | 302.744 | 290.283 | -12.4613 | ms |
| 99th percentile latency | painless | 322.716 | 304.926 | -17.7899 | ms |
| 100th percentile latency | painless | 367.526 | 317.751 | -49.7747 | ms |
| error rate | painless | 0 | 0 | 0 | % |

It could be the case that the benchmark is still not capturing your scenario or something else is going on in your environment. I'll send you a link to the benchmark in a PM so you can try it yourself and see whether you see a difference.
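
For readers who want to try a comparable setup without the benchmark link, the load pattern described above (four concurrent clients, five operations per second, latency percentiles) can be approximated as follows. This is only a rough sketch and not the benchmark that produced the table. `send` is a hypothetical callable that issues one query, the rate is assumed to be the aggregate across all clients, and the measured latency covers only the call itself, not time spent waiting for a free client.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def run_load(send, clients=4, rate_per_second=5.0, total_requests=50):
    """Call send() total_requests times from `clients` worker threads.

    Request i is scheduled at start + i / rate_per_second, so the rate is
    the aggregate across all clients (an assumption of this sketch).
    Returns the per-request latencies in milliseconds.
    """
    start = time.perf_counter()

    def one_request(i):
        # Wait for this request's slot so the load stays steady.
        delay = start + i / rate_per_second - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        began = time.perf_counter()
        send()
        return (time.perf_counter() - began) * 1000.0

    with ThreadPoolExecutor(max_workers=clients) as pool:
        return list(pool.map(one_request, range(total_requests)))
```

In practice `send` would POST the query body to the index's `_search` endpoint. Feeding the returned list to `percentile(latencies, 50)`, `percentile(latencies, 90)`, and so on gives figures that can be compared between two Elasticsearch versions running on the same machine.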

vangap · June 30, 2018, 1:45pm

Thanks a lot @danielmitterdorfer
I am going to have to rerun my tests and see if there is something different in my test approach.

vangap · July 2, 2018, 11:17am

I have run my tests again.
There was an issue with my load test: sometimes not all requests were reaching ES because of another issue.

I have now run the queries directly against ES using Apache Bench, and I see more or less similar latencies on 5.6.x and 6.3.

@danielmitterdorfer Thanks a lot for your help. I should have looked at my test setup closely instead of suspecting ES.

danielmitterdorfer · July 2, 2018, 11:50am

Hi,

thanks for the feedback. Glad it turned out that Elasticsearch performs equally well.

Daniel

system · July 30, 2018, 11:51am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
