Indexing speed in ES v1.71 approx 18% slower than v1.4.5

tinle · August 31, 2015, 2:52am

Exact same test system.

5 physical nodes (64G RAM, 2.2TB SSD iofusion card)
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (two sockets x 6 cores = 12 cores w/HT on)

Reindexing. Reading from another ES cluster, bulkloading into this cluster.

I see that v1.7.1 is 18% slower in indexing speed.

warkolm · August 31, 2015, 3:57am

What sort of data?
What are you using for reindexing?

tinle · August 31, 2015, 4:00am

Log data. We have lots and lots of log data (300+ TB)

I've tried both logstash v1.5.4 (plus patch from this bug report https://github.com/elastic/logstash/issues/3817#issuecomment-135893991) and a program I wrote in Go.

The Go program has slightly higher performance (less mem, lower CPU), but still same difference in performance. About 18% slower on v1.7.1 vs v.1.45.

warkolm · August 31, 2015, 4:15am

Do you have more details on the timings you generated?
Can we see the LS config?
Are the mappings the same across both clusters? Can we see those?

tinle · August 31, 2015, 4:20am

Certainly. The logstash config is very simple and is listed in the bug report

github.com/elastic/logstash

logstash v1.5.4 misleading error message when heapsize is too small

opened 08:49PM - 28 Aug 15 UTC

closed 01:52PM - 01 Jun 17 UTC

TinLe

bug

If logstash is started with too small of a heap, it barfed with misleading error… message. Fortunately, I just changed one value, so I knew exactly what broke. I had LS_HEAP_SIZE=2000m and lowered it to 1000m, when logstash is started, the log contain: Exception in thread "|worker" Exception in thread "<elasticsearch" java.lang.UnsupportedOperationException at java.lang.Thread.stop(Thread.java:869) at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221) at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112) at java.lang.Thread.run(Thread.java:745) java.lang.UnsupportedOperationException at java.lang.Thread.stop(Thread.java:869) at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221) at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112) at java.lang.Thread.run(Thread.java:745) Changing the LS_HEAP_SIZE to 1500m allows it to run again. So evidently, I lowered it too much.

I bring up the 5 node cluster fresh, e.g. no data. Note start time when I start up logstash and let it run for several hours. Performance testing of this cluster and logstash, plus my golang program is all I have been doing since Friday morning and all weekend.... Yeah, it has been a fun weekend for me

The mappings is exactly the same each time. The only difference is ES v1.4.5 and ES v1.7.1

In doing this performance test, I am keeping everything the same as much as possible, changing just one thing. ES version.

More information and link to the source code for my little go copy/reindexer is here:

Clinton_Gormley · September 1, 2015, 12:29pm

Two changes between 1.4 and 1.7 occur to me:

Auto-generated ID optimization removed (but this was 1.4.3)
Bloom filters removed

Bloom filters were maintained in memory to check which segment(s) might contain a document with a particular ID. This was fast, but used up a lot of memory, and the more open indices, the more memory was used.

Instead, we now do a term lookup which is usually almost as fast as the bloom filter. However... the performance of this lookup depends heavily on how many segments you have. If merges are not keeping up then you end up with more segments, which makes these term lookups slower.

What disks are you using? What throttling / merge settings do you have? Are you seeing any throttling in the stats? How many segments are you seeing on disk during the indexing process?

tinle · September 1, 2015, 4:47pm

I am using Fusion-IO 2.2TB SSD drives (5 nodes x 2.2TB SSD each).

One issue is that I am read throughput limited (single reader). Beside that though, it is consistent that 1.7.1 is slower indexing than 1.4.5. I think the two changes you listed are possible suspects. I will re-setup the tests to use my own ID generator and run them again when I get a chance. I am working on fixing the read throughput side at the moment.

This is the settings I am using for the test cluster. Currently it's purely indexing, there is no querying, no client access. I just want to see how fast I can reindex from another cluster to this one.

Indexing Settings

index.refresh_interval: -1

SSD

index.store.throttle.type: none
index.store.throttle.max_bytes_per_sec: 700mb
index.translog.flush_threshold_size: 1gb
indices.memory.index_buffer_size: 512mb

Attached is a graph of the segments in the cluster over last 12hrs.

Clinton_Gormley · September 1, 2015, 5:22pm

The autogenerated ID optimization was removed in 1.4.3, so it shouldn't come into play. What are your node and index stats showing for throttling?

tinle · September 1, 2015, 5:32pm

No throttle last 12hrs. I/O wise, this cluster is sleeping

mikemccand · September 1, 2015, 9:05pm

Bloom filters were already disabled (still indexed, but not used/loaded) by
default as of 1.4.0, so that shouldn't explain the change, unless in your
1.4.5 test you had enabled them?

Those segment counts look healthy (merges are keeping up), and that's good
that there's no indexing nor store throttling.

How many concurrent threads are indexing, and how many docs per bulk
request?

If you change to refresh_interval: 10s for both 1.4.5 and 1.7.1 does it
change the result? This may increase effective concurrency of the test
since a separate thread is writing segments.

If you drop index.number_of_replicas to 0, does that change the result?

We track indexing performance nightly at
https://benchmarks.elastic.co/index.html ... it's somewhat noisy, and it
tracks master (not 1.x releases), but performance has not changed that much
for the "fast" settings over time.

Mike McCandless

tinle · September 1, 2015, 11:47pm

Which stats indicate concurrent threads that are indexing? I have tried both 500 and 1000 docs per bulk request. I did not see much difference. The doc size average around 4KB.

Replicas have been set to 0 from the beginning. I don't use replicas for active index.

I can test changing refresh interval and see what I get. Usually it's set to off (-1) or 90s for active indices.

mikemccand · September 2, 2015, 10:24am

I think (not certain) for logstash it's the number of workers you specify

mikemccand · September 2, 2015, 12:01pm

Hmm my response (via email) was truncated for some reason ... trying again:

I think (not certain) for logstash it's the number of workers you specify for the Elasticsearch output? I think the default is 1: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html

Or in your custom Go client, you would control how many threads (goroutines?) are sending bulk indexing requests concurrently.

You can ask for node stats, then look under thread_pool -> bulk -> active to see how many bulk threads are currently working, but this isn't a precise measure since when you check, it could be your client side threads had just received results and are working to prepare the next request and so they are not counted.

You want to have enough client side concurrency so the server side resources are saturated (either CPU or IO or, if you're lucky, both!).

tinle · September 2, 2015, 10:01pm

I didn't have time to work on this today, but I did tried one thing. Increasing the bulk request size from 1000 to 5000 and then to 10000, both at 5K and 10K requests, I see an increase in indexing speed (higher CPU which is what I expected, no throttling). The speed increase is around 2.8%. I am leaving it running at 10K bulk size till later tonight when I have more time to look into it.

Topic		Replies	Views
Elasticsearch/Logstash low indexing rate Elasticsearch	6	453	July 13, 2020
Indexing performance terrible after upgrading from 1.6 to 2.4 Elasticsearch	2	472	July 5, 2017
Slow Indexing Speed Elasticsearch	5	7200	July 6, 2017
Troubleshoot "low" Logstash -> ES indexing rate Elasticsearch	17	2411	April 13, 2017
Slower indexing Elasticsearch	9	645	September 15, 2017

Indexing speed in ES v1.71 approx 18% slower than v1.4.5

Indexing Settings

SSD

Related topics