Indexing speed in ES v1.71 approx 18% slower than v1.4.5


(Tin Le) #1

Exact same test system.

5 physical nodes (64G RAM, 2.2TB SSD iofusion card)
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (two sockets x 6 cores = 12 cores w/HT on)

Reindexing. Reading from another ES cluster, bulkloading into this cluster.

I see that v1.7.1 is 18% slower in indexing speed.


(Mark Walkom) #2

What sort of data?
What are you using for reindexing?


(Tin Le) #3

Log data. We have lots and lots of log data (300+ TB) :smile:

I've tried both logstash v1.5.4 (plus patch from this bug report https://github.com/elastic/logstash/issues/3817#issuecomment-135893991) and a program I wrote in Go.

The Go program has slightly higher performance (less mem, lower CPU), but still same difference in performance. About 18% slower on v1.7.1 vs v.1.45.


(Mark Walkom) #4

Do you have more details on the timings you generated?
Can we see the LS config?
Are the mappings the same across both clusters? Can we see those?


(Tin Le) #5

Certainly. The logstash config is very simple and is listed in the bug report

I bring up the 5 node cluster fresh, e.g. no data. Note start time when I start up logstash and let it run for several hours. Performance testing of this cluster and logstash, plus my golang program is all I have been doing since Friday morning and all weekend.... Yeah, it has been a fun weekend for me :smile:

The mappings is exactly the same each time. The only difference is ES v1.4.5 and ES v1.7.1

In doing this performance test, I am keeping everything the same as much as possible, changing just one thing. ES version.

More information and link to the source code for my little go copy/reindexer is here:


(Clinton Gormley) #6

Two changes between 1.4 and 1.7 occur to me:

Bloom filters were maintained in memory to check which segment(s) might contain a document with a particular ID. This was fast, but used up a lot of memory, and the more open indices, the more memory was used.

Instead, we now do a term lookup which is usually almost as fast as the bloom filter. However... the performance of this lookup depends heavily on how many segments you have. If merges are not keeping up then you end up with more segments, which makes these term lookups slower.

What disks are you using? What throttling / merge settings do you have? Are you seeing any throttling in the stats? How many segments are you seeing on disk during the indexing process?


(Tin Le) #7

I am using Fusion-IO 2.2TB SSD drives (5 nodes x 2.2TB SSD each).

One issue is that I am read throughput limited (single reader). Beside that though, it is consistent that 1.7.1 is slower indexing than 1.4.5. I think the two changes you listed are possible suspects. I will re-setup the tests to use my own ID generator and run them again when I get a chance. I am working on fixing the read throughput side at the moment.

This is the settings I am using for the test cluster. Currently it's purely indexing, there is no querying, no client access. I just want to see how fast I can reindex from another cluster to this one.

Indexing Settings

index.refresh_interval: -1

SSD

index.store.throttle.type: none
index.store.throttle.max_bytes_per_sec: 700mb
index.translog.flush_threshold_size: 1gb
indices.memory.index_buffer_size: 512mb

Attached is a graph of the segments in the cluster over last 12hrs.


(Clinton Gormley) #8

The autogenerated ID optimization was removed in 1.4.3, so it shouldn't come into play. What are your node and index stats showing for throttling?


(Tin Le) #9

No throttle last 12hrs. I/O wise, this cluster is sleeping :smile:


(Michael McCandless) #10

Bloom filters were already disabled (still indexed, but not used/loaded) by
default as of 1.4.0, so that shouldn't explain the change, unless in your
1.4.5 test you had enabled them?

Those segment counts look healthy (merges are keeping up), and that's good
that there's no indexing nor store throttling.

How many concurrent threads are indexing, and how many docs per bulk
request?

If you change to refresh_interval: 10s for both 1.4.5 and 1.7.1 does it
change the result? This may increase effective concurrency of the test
since a separate thread is writing segments.

If you drop index.number_of_replicas to 0, does that change the result?

We track indexing performance nightly at
https://benchmarks.elastic.co/index.html ... it's somewhat noisy, and it
tracks master (not 1.x releases), but performance has not changed that much
for the "fast" settings over time.

Mike McCandless


(Tin Le) #11

Which stats indicate concurrent threads that are indexing? I have tried both 500 and 1000 docs per bulk request. I did not see much difference. The doc size average around 4KB.

Replicas have been set to 0 from the beginning. I don't use replicas for active index.

I can test changing refresh interval and see what I get. Usually it's set to off (-1) or 90s for active indices.


(Michael McCandless) #12

I think (not certain) for logstash it's the number of workers you specify


(Michael McCandless) #13

Hmm my response (via email) was truncated for some reason ... trying again:

I think (not certain) for logstash it's the number of workers you specify for the Elasticsearch output? I think the default is 1: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html

Or in your custom Go client, you would control how many threads (goroutines?) are sending bulk indexing requests concurrently.

You can ask for node stats, then look under thread_pool -> bulk -> active to see how many bulk threads are currently working, but this isn't a precise measure since when you check, it could be your client side threads had just received results and are working to prepare the next request and so they are not counted.

You want to have enough client side concurrency so the server side resources are saturated (either CPU or IO or, if you're lucky, both!).


Horizontal scaling of indexing
(Tin Le) #14

I didn't have time to work on this today, but I did tried one thing. Increasing the bulk request size from 1000 to 5000 and then to 10000, both at 5K and 10K requests, I see an increase in indexing speed (higher CPU which is what I expected, no throttling). The speed increase is around 2.8%. I am leaving it running at 10K bulk size till later tonight when I have more time to look into it.


(system) #15