Elasticsearch Benchmarking Results Interpretation

Hello everyone,

I have benchmarked the performance of our Elasticsearch server and I would like to show you one interesting finding.


As you can see, all the different settings, i.e. changes in bulk size (5 MB vs. 11 MB) and thread count (1, 2, 5, and 10 threads), share the same result when it comes to the drop in indexing rate between 1 million and 2 million documents. 1 million documents amount to about 850-950 MB. Furthermore, the blue and red lines are the 1-thread results; surprisingly, they perform best. I should mention that I did not change any Elasticsearch settings regarding the thread pools; I just sent my HTTP requests in parallel. My settings are:
curl -XPOST localhost:9200/bench -d '{
  "transient" : { "indices.store.throttle.type" : "none" },
  "mappings" : {
    "bench" : {
      "_source" : { "enabled" : false },
      "_all" : { "enabled" : false },
      "properties" : {
        "level" : { "type" : "string", "index" : "not_analyzed" },
        "time" : { "type" : "string", "index" : "not_analyzed" },
        "timel" : { "type" : "string", "index" : "not_analyzed" },
        "id" : { "type" : "string", "index" : "not_analyzed" },
        "cat" : { "type" : "string", "index" : "not_analyzed" },
        "comp" : { "type" : "string", "index" : "not_analyzed" },
        "host_org" : { "type" : "string", "index" : "not_analyzed" },
        "req" : { "type" : "string", "index" : "not_analyzed" },
        "app" : { "type" : "string", "index" : "not_analyzed" },
        "usr" : { "type" : "string", "index" : "not_analyzed" },
        "vin" : { "type" : "string", "index" : "not_analyzed" },
        "thread" : { "type" : "string", "index" : "not_analyzed" },
        "origin" : { "type" : "string", "index" : "not_analyzed" },
        "msg_text" : { "type" : "string", "index" : "not_analyzed" },
        "clat" : { "type" : "string", "index" : "not_analyzed" },
        "clon" : { "type" : "string", "index" : "not_analyzed" },
        "location" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'

curl -XPUT 'localhost:9200/bench/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'

curl -XPUT localhost:9200/bench/_settings -d '{ "index" : { "refresh_interval" : "-1" } }'

curl -XPOST 'http://localhost:9200/bench/_forcemerge?max_num_segments=5'
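As an aside, the index-level parts of this setup could also be combined into a single creation request. The sketch below shows only the first mapped field (the remaining not_analyzed string fields follow the same pattern) and omits the "transient" block, since that is a cluster-level setting rather than part of an index body:

# Cleaner variant of the setup above: one index-creation request holding the
# per-index settings and the (abridged) mapping.
curl -XPUT 'localhost:9200/bench' -d '{
  "settings" : {
    "number_of_replicas" : 0,
    "refresh_interval" : "-1"
  },
  "mappings" : {
    "bench" : {
      "_source" : { "enabled" : false },
      "_all" : { "enabled" : false },
      "properties" : {
        "level" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'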

Furthermore, the heap size was changed to 31 GB.
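(In case the exact mechanism matters: this is only a sketch of how the heap is typically set with the 2.x Debian/Ubuntu package; adjust for your install. 31 GB is usually chosen to stay just below the JVM's compressed-oops threshold.)

# /etc/default/elasticsearch (Debian/Ubuntu package), or exported before starting Elasticsearch
# 31g keeps the heap just below the ~32 GB compressed-oops limit
ES_HEAP_SIZE=31g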

Do you have any explanation for the distinct decrease between 1 and 2 million documents?

Thank you very much in advance.

Best regards


It'd be useful to have a key on the graph so I could tell what the lines meant.

It's hard to say. Eventually merging starts to kick in, and if merging falls behind, Elasticsearch throttles indexing to put backpressure on the indexing application. You might want to have a look at the logs, because they will mention if backpressure is applied. Otherwise it's worth graphing the I/O rate while this goes on and lining up the graphs.
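One way to line them up, as a rough sketch using the index name from this thread, is to poll the merge section of the index stats (a growing accumulated throttle time is a hint that merging is falling behind) next to the disk utilisation:

# Merge activity and accumulated merge-throttle time for the benchmark index
curl 'localhost:9200/bench/_stats/merge?pretty'

# Disk utilisation while the bulk load runs (iostat comes with the sysstat package)
iostat -x 5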

I don't know of any magic constant that says "near a million docs, slow down", so it's kind of hard to guess.

Sorry, I have updated the graph above. Another important point to mention is the order in which I ran the tests: the blue and red lines were the first requests against the Elasticsearch server, whereas the dark blue and brown ones were the latest. In other words, it seems that the performance also decreases over time, or it is because of the threads. After each observation per document count, thread count, and bulk size, I delete the current index and create a new one, so that the stats and the storage are reset.

Depending on your version there is a bug that makes that not work super well if you reuse the name of the index. It might be useful to try giving the index a name based on the current time or something like that.
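For example (just a sketch; bench-mapping.json is a placeholder file containing the settings and mappings shown earlier):

# Give every benchmark run its own index, named after the current time,
# so a previously deleted index name is never reused.
INDEX="bench-$(date +%Y%m%d-%H%M%S)"
curl -XPUT "localhost:9200/$INDEX" -d @bench-mapping.json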

Does this bug also apply to version 2.1.1? I will change the index name based on the time. I tried to find the information on the backpressure but was not successful. What kind of log message should I look for?

No, that bug was fixed in 2.1.1.

Hello Jason,

I have changed the document counts; I am now indexing 1 million to 60 million documents, and the same trend can be seen. Could this be a common problem, namely that as the number of indexed documents increases, the overall indexing rate is also affected?

I have prepared some charts of data I index into production; see this gist and this gist (see the comment for the configuration).

After ~30 seconds, the ES memory buffers are full and are flushed to disk, with more I/O than usual and some GC. After a few minutes, the bulk indexing rate "swings in" and stabilizes at a constant rate. I have charts of two indices where this effect occurs: in the 3-minute index it is not very visible; in the 12-minute index, which has several curves, it can be seen more clearly.

What you observe is probably the same effect, visible in your chart between 1,000,000 and 2,000,000 docs. You push data, fill the buffers, and after a delay the buffers are full and must be processed all at once. In your numbers this effect is not easy to recognize, since you are not recording the data-volume indexing rate against a timeline. Maybe your doc sizes also vary and increase after 1,000,000. Maybe you can get some numbers measuring data volume over a longer period, of at least ten minutes or so.
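A simple way to collect such numbers, sketched here with the index name from this thread, is to sample doc count and on-disk size at fixed intervals and plot them against wall-clock time:

# Record doc count and store size of the benchmark index every 10 seconds,
# so the indexing rate can later be plotted against a timeline.
while true; do
  echo "$(date +%s) $(curl -s 'localhost:9200/_cat/indices/bench?h=docs.count,store.size')"
  sleep 10
done >> bench-timeline.log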

Hello jprante,
thank you very much for your reply. Yes, I think it has to do with the ES memory buffer. Therefore I decided to record not the average indexing time but the total indexing time. I would like to share my results for 1 million to 60 million documents. The trend is the same, a slightly negative one. The documents are always the same size :slightly_smiling: For instance, for 60 million documents it took about 2 hours. Another finding is that the indexing rate does not increase when I add one more node but keep pushing my requests to the same node. Do I have to send requests to the other node too? Normally, the indexing rate should increase when an additional node is added.
The second picture shows my total indexing time per data volume. The total indexing time grows nearly linearly with the data volume.

This is hard to discuss, maybe you can give more info?

Is the "Indexing time" you measure a continous or a discrete curve? Lokks like discrete. Does it include the command

curl -XPOST 'http://localhost:9200/bench/_forcemerge?max_num_segments=5'

or not? If so, drop this command; a force merge is neither required nor does it run in predictable time.

If the indexing time stays constant when you add a node, you should really take a close look at the client. In such cases, the client is often the bottleneck and cannot fully exercise the cluster.
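One simple way to rule out a single-connection client bottleneck, sketched below with placeholder names (node2:9200 and the bulk-*.json files stand in for your second node and your prepared bulk bodies), is to spread concurrent bulk requests over both nodes:

# Alternate bulk requests between the two nodes so the client does not
# funnel everything through a single HTTP connection.
# (For a real run, cap the number of parallel requests.)
NODES=(localhost:9200 node2:9200)   # node2:9200 is a placeholder for the second node
i=0
for f in bulk-*.json; do            # bulk-*.json: pre-built bulk request bodies (placeholder)
  curl -s -XPOST "http://${NODES[$((i % 2))]}/bench/_bulk" --data-binary "@$f" > /dev/null &
  i=$((i + 1))
done
wait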

Maybe you can describe your system setup? From what you mention for your last run, I only understand so far

Elasticsearch version: ?
Servers: ?, CPU cores per server: ?
OS: ? Java version: ?, heap: 31 GB, GC: ?
Nodes: ?
Client type: ?
Bulk request concurrency: ?, bulk request size: ?
Index: 1, docs: 60,000,000, shards: ?, replica: ?, refresh_interval: -1
Mapping: 17 fields, Source: disabled, All: disabled
Data volume: 60,000,000 * ~900 bytes = ~54,000,000,000 bytes
Time: 120 min.

Hello Jorg,
It looks continuous, but I am measuring discretely, meaning I have the indexing values for 100k, 200k, 400k, ..., 1 million, 2 million, ..., 60 million documents. Yes, it includes curl -XPOST 'http://localhost:9200/bench/_forcemerge?max_num_segments=5' - why is it not good to use this? Here is some information:
Elasticsearch version: 2.1.1
Servers: ?, CPU cores per server: 4; more detailed information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Stepping: 2
CPU MHz: 1204.082
BogoMIPS: 7001.11
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
OS: Ubuntu 14.04, Java version: "1.7.0_91", heap: 31 GB, GC: standard settings, but here are the actual values:
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 30619,
"collection_time_in_millis" : 2124604
},
"old" : {
"collection_count" : 55,
"collection_time_in_millis" : 8101
}
}
},
Nodes: 1
Client type: ? What do you mean here?
Bulk request concurrency: 1, 2, 5, 10; bulk request size: 5 and 11 MB (here I used different scenarios; in the first graph you can see that I used 4 different thread counts and 2 different bulk sizes against one node)
Index: 1, docs: 60,000,000, shards: 2, replica: 0, refresh_interval: -1
Mapping: 17 fields, Source: disabled, All: disabled
Data volume: 60,000,000 * ~900 bytes = ~54,000,000,000 bytes
Time: 120 min.
Further configurations in Elasticsearch:
"transient" : {
"indices.store.throttle.type" : "none"
},
"mappings" : {
"bench" : {
"_source" : { "enabled" : false },
"_all" : {"enabled" : false},
"properties" : {
"level" : { "type" : "string", "index" : "not_analyzed" },
"time" : { "type" : "string", "index" : "not_analyzed" },
"timel" : { "type" : "string", "index" : "not_analyzed" },
"id" : { "type" : "string", "index" : "not_analyzed" },
"cat" : { "type" : "string", "index" : "not_analyzed" },
"comp" : { "type" : "string", "index" : "not_analyzed" },
"host_org" : { "type" : "string", "index" : "not_analyzed" },
"req" : { "type" : "string", "index" : "not_analyzed" },
"app" : { "type" : "string", "index" : "not_analyzed" },
"usr" : { "type" : "string", "index" : "not_analyzed" },
"vin" : { "type" : "string", "index" : "not_analyzed" },
"thread" : { "type" : "string", "index" : "not_analyzed" },
"origin" : { "type" : "string", "index" : "not_analyzed" },
"msg_text" : { "type" : "string", "index" : "not_analyzed" },
"clat" : { "type" : "string", "index" : "not_analyzed" },
"clon" : { "type" : "string", "index" : "not_analyzed" },
"location": { "type": "string", "index" : "not_analyzed" }
}
}
}
}'

In ES 2.x, the segment merger has undergone important improvements, such as auto-adapting to the current I/O capacity; it will no longer fall behind the way it could in ES 1.x.

So after you stop bulk indexing, there is no need for a manual force merge, as there was in 1.x under certain circumstances.

The cost of forcing a merge depends on the number of segments and docs; it is a very expensive operation and a sure way to destroy the expressiveness of any benchmark (especially with a weak CPU of only 4 cores). You should check whether indexing times become reasonable without that operation.
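If you want to convince yourself that the segment count stays bounded without the force merge, a quick check (sketched with the index name from this thread) is the cat segments API:

# Segments per shard of the benchmark index; with the auto-throttled 2.x merger
# this should stay at a reasonable count without a manual force merge.
curl 'localhost:9200/_cat/segments/bench?v'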
