Indexing speed slowdown


(Phil) #1

I've noticed a slowdown in indexing speed going from ES 2.3 to 5.5.0.
It's early days, I have no precise figures, and there are a tonne of other things that may differ, but my first thought was that it may be a change I made to doc_values: there are more doc_values now. I presume this will slow indexing, but I wonder by how much. I'm seeing at least a doubling of the time it takes to index my corpus of text. Any suggestions?

Thanks,
Phil


(Christian Dahlqvist) #2

How large are your documents? How large are the bulk requests you are using? How many shards are you actively indexing into?


(Phil) #3

About 300 characters of text on average.
The rest is unchanged from my previous settings in ES 2.3, which were:

"number_of_replicas": 0,
"refresh_interval": "120s",
"store.throttle.max_bytes_per_sec" : "40mb",
"index.translog.flush_threshold_size":"1g"


(Christian Dahlqvist) #4

What is the size of the bulk index requests you are sending to Elasticsearch (assuming you are using the bulk API)?


(Phil) #5

These are the parameters (again, unchanged)

.setBulkSize(new ByteSizeValue(20, ByteSizeUnit.MB))
.setConcurrentRequests(calculateBulkLoadThreads() - 1)
.build();
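
For a rough sense of scale, the 20 MB bulk size and the roughly 300-character documents mentioned in this thread imply a large number of documents per bulk request. The sketch below works through that arithmetic. It assumes one byte per character and a hypothetical per-document overhead for the JSON wrapping and bulk action line, neither of which is stated in the thread; the class and method names are illustrative only.

```java
public class BulkSizeEstimate {

    /** Approximate number of documents that fit in one bulk request. */
    public static long docsPerBulk(long bulkSizeBytes, long avgDocBytes, long overheadBytesPerDoc) {
        long perDocBytes = avgDocBytes + overheadBytesPerDoc;
        if (perDocBytes <= 0) {
            throw new IllegalArgumentException("per-document size must be positive");
        }
        // Integer division: a partially filled slot does not count as a document.
        return bulkSizeBytes / perDocBytes;
    }

    public static void main(String[] args) {
        long bulkSizeBytes = 20L * 1024 * 1024; // 20 MB, matching the setBulkSize value
        long avgDocBytes = 300;                 // ~300 characters, assuming 1 byte each

        // No overhead at all (an optimistic upper bound).
        System.out.println(docsPerBulk(bulkSizeBytes, avgDocBytes, 0));   // prints 69905
        // With an assumed 100 bytes of overhead per document (hypothetical figure).
        System.out.println(docsPerBulk(bulkSizeBytes, avgDocBytes, 100)); // prints 52428
    }
}
```

Under these assumptions each bulk request carries on the order of 50,000 to 70,000 documents, which is worth keeping in mind when judging whether the bulk size is a factor.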


(Phil) #6

And I observe a degradation over time. I start off getting tens of thousands of docs loaded per minute, and after a while this rate drops tenfold. Stopping and restarting doesn't help, and I suspect it's more to do with random IO or seek times as the index grows.
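
One way to put numbers on a degradation like this is to log the indexing rate per measurement window and compare the first window with the latest. The following is a minimal sketch of that idea, not code from the loader discussed here: the class `IndexingRateLog`, its methods, and the sample figures are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexingRateLog {

    // One entry per measurement window, in documents per minute.
    private final List<Double> docsPerMinute = new ArrayList<>();

    /** Record one window: how many documents were indexed and how long it took. */
    public void record(long docsIndexed, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        docsPerMinute.add(docsIndexed * 60_000.0 / elapsedMillis);
    }

    /** Ratio of the first window's rate to the latest one (10.0 means a tenfold drop). */
    public double slowdownFactor() {
        if (docsPerMinute.size() < 2) {
            throw new IllegalStateException("need at least two windows");
        }
        double first = docsPerMinute.get(0);
        double last = docsPerMinute.get(docsPerMinute.size() - 1);
        return first / last;
    }

    public static void main(String[] args) {
        IndexingRateLog log = new IndexingRateLog();
        // Illustrative numbers only; they are not measurements from this thread.
        log.record(40_000, 60_000);
        log.record(4_000, 60_000);
        System.out.println(log.slowdownFactor()); // prints 10.0
    }
}
```

Recording the index size alongside each window would show whether the slowdown tracks index growth or starts at some unrelated point.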


(Christian Dahlqvist) #7

Are you indexing documents with a custom document ID or letting Elasticsearch assign the id? Do you update your documents?


(Phil) #8

We use a custom ID, in this format:

my-source___*_1234_1

where '12341678' is an integer and changes but the other stuff remains constant (in this specific example load).

We don't update; this was a fresh load to a virgin index.


(Christian Dahlqvist) #9

If you do not need to update documents or use the id to ensure you don't get duplicates, letting Elasticsearch assign the document id will give better performance as Elasticsearch knows the documents do not already exist. You can read about the impact of id formats on indexing speed in this blog post.


(Phil) #10

That's just it: we need the ID to ensure we don't get duplicates.


(Phil) #11

And the thing is, it's the difference in indexing speed between 2.3 and 5.5.0 that interests me. The ID style hasn't changed. But if there's nothing crazy about my process, that's cool, I can go and investigate more deeply.


(Phil) #12

Bah, it's nothing to do with ES. Sorry to have wasted your time. Found a different cause for the slowness. Face-palm.

