INDEX Performance


We have been trying to move from Solr to Elasticsearch and want to compare the performance of indexing 100M records from a database.
Currently it takes 4 hours to index 52 million records.
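
To put those figures in perspective, here is a rough back-of-the-envelope calculation of the current indexing rate and what it would mean for the full 100M records. It assumes the rate stays constant over the whole load, which may not hold in practice.

```python
# Figures taken from the post above.
records_indexed = 52_000_000
elapsed_hours = 4
target_records = 100_000_000

# Average indexing rate over the observed run.
docs_per_second = records_indexed / (elapsed_hours * 3600)

# Projection for the full data set, assuming the same constant rate.
projected_hours = target_records / docs_per_second / 3600

print(f"{docs_per_second:.0f} docs/s")            # 3611 docs/s
print(f"{projected_hours:.1f} hours for 100M")    # 7.7 hours for 100M
```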

Current configuration:
The default of 5 shards, and we have also increased the refresh interval to 30s.
What is the best way to increase performance? I am planning to increase the number of shards to 7.
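
For readers following along, the configuration described above could be written as an index-creation request body roughly like the sketch below. The index name `records` is hypothetical; `index.number_of_shards` and `index.refresh_interval` are the standard setting names. Note that the shard count is set when the index is created, so moving from 5 to 7 would generally mean creating a new index and loading the data again.

```python
import json

INDEX_NAME = "records"  # hypothetical index name, not from the thread

# Settings as described in the post: 5 primary shards, 30s refresh interval.
create_index_body = {
    "settings": {
        "index": {
            "number_of_shards": 5,
            "refresh_interval": "30s",
        }
    }
}

# Print the request so it can be sent with any HTTP client.
print(f"PUT /{INDEX_NAME}")
print(json.dumps(create_index_body, indent=2))
```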

(Christian Dahlqvist) #2

Increasing the number of shards will not necessarily improve performance. It could actually do the opposite.

Have you gone through these guidelines? How many indexing threads are you using? What bulk size are you using?
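
As a sketch of what "bulk size" refers to here: documents are sent to the `_bulk` endpoint in batches, as newline-delimited JSON with one action line and one source line per document. The example below only builds the request bodies and sends nothing. The index name `records`, the type name `doc`, the field `record_id`, and the batch size of 2 are all made up for illustration; a real load would use a much larger batch and tune it by measurement.

```python
import json
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(rows, index, id_field, doc_type="doc"):
    """Build a newline-delimited _bulk body: action line, then source line."""
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows standing in for database records.
rows = [{"record_id": i, "name": f"row-{i}"} for i in range(5)]
bodies = [bulk_body(batch, "records", "record_id") for batch in chunked(rows, 2)]

print(len(bodies))   # 3 batches: 2 + 2 + 1 documents
print(bodies[0], end="")
```

Each element of `bodies` would be the payload of one `POST /_bulk` request; larger batches mean fewer round trips but bigger requests.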


We are currently using 9 Logstash threads to index the data, and I have set the refresh interval to 30s.

(Christian Dahlqvist) #4

What is the specification of your Elasticsearch cluster? Which version are you running? What is the average size of your documents?


The VM has 4 cores and 24GB RAM. The cluster has one node with the default 5 shards.
I added bootstrap.memory_lock: true as recommended in the guide.

Version is 6.2.2
We are basically trying to index 50M records from a database, each with 32 columns.

(Christian Dahlqvist) #7

What type of storage do you have? What does disk I/O and iowait look like while you are indexing?


I have attached my indexing stats. Please help me understand what is wrong.

(Christian Dahlqvist) #9

It looks like you have quite a few deleted documents. Do a large portion of the documents you load result in an update?


We changed the Logstash configuration to avoid duplicate documents: instead of the default auto-generated _id, the document ID is now mapped to a unique column from our database.
I don't know why we are seeing deleted documents.

(Christian Dahlqvist) #11

An update would show up as a delete and an insert.
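
A toy model may make this clearer. It is not how Elasticsearch is implemented, only an illustration under the assumption that documents are indexed with an explicit ID: indexing an ID that already exists replaces the old copy, and the replaced copy is counted as deleted until segments are merged. The IDs below are invented.

```python
def simulate_index_by_id(ids):
    """Count live and overwritten documents when indexing by explicit ID."""
    seen = set()
    deleted = 0
    for doc_id in ids:
        if doc_id in seen:
            # Same ID again: the old copy is marked deleted, a new one is written.
            deleted += 1
        else:
            seen.add(doc_id)
    return {"count": len(seen), "deleted": deleted}


# Hypothetical source IDs in which 101 and 102 each appear twice.
stats = simulate_index_by_id([101, 102, 103, 102, 101, 104])
print(stats)  # {'count': 4, 'deleted': 2}
```

So a non-zero deleted count with ID-based indexing suggests that some source rows share the same ID value, or that the same rows were loaded more than once.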


We are not updating the documents; instead, we changed the Logstash configuration to use the unique ID as the document ID. I am attaching my configuration. Can you please let me know if I need to change something?


(Christian Dahlqvist) #14

Do you have any details about the underlying storage?


No, I don't have any details about that.

(Christian Dahlqvist) #16

The type and performance of storage is often a limiting factor for Elasticsearch. Check what ‘iostat -x’ gives while indexing.
