We are trying to index documents from Logstash 2.4.0 into Elasticsearch 2.4.0 and are currently facing an indexing performance issue: 700 MB of data takes around 30 minutes.
We have a 10-node cluster (3 D, 3 M, 2 Cl) of Azure VMs with 56 GB memory (28 GB JVM) and 8 cores. Our index has 3 replicas and is 2 TB in size.
We have an index refresh interval of 1 hour. Our Elasticsearch config looks like below. Please give us some pointers if we are missing any settings that would help indexing.
Note: We enabled memory lock on Windows, but that didn't help; swap memory is still being used, as we can see in the ElasticHQ plugin.
Are you indexing new documents and/or updating existing ones? How large are your bulk requests? How many concurrent indexing threads/processes are you using? Are you allowing Elasticsearch to set the document ID or do you handle this in your application?
@JKhondhu 20 primary shards. We set the refresh interval to 1 hour because we heard that leaving it at the default of 1 second degrades indexing performance.
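For anyone following along, the refresh interval is a dynamic index setting, so it can be changed on an existing index through the update index settings API. Below is a minimal Python sketch of what such a request looks like. The index name `documents` and the helper name are hypothetical, and the sketch only builds the request rather than sending it.

```python
import json

def refresh_settings_request(index, interval):
    """Build (method, path, body) for the update index settings API.

    `index` is a hypothetical index name. `interval` is a time value
    such as "1s" (the default) or "1h"; "-1" disables refresh entirely.
    """
    body = {"index": {"refresh_interval": interval}}
    return "PUT", "/{}/_settings".format(index), json.dumps(body)

# Example: raise the refresh interval to 1 hour during heavy indexing.
method, path, body = refresh_settings_request("documents", "1h")
print(method, path, body)
```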
As you are explicitly setting the document ID instead of letting Elasticsearch assign it, each index operation is treated as an update, because Elasticsearch must check whether the document already exists. Depending on how you create this identifier, it can have a significant impact on indexing performance. If chosen poorly, e.g. a random UUID, indexing performance will degrade as the shards grow in size and more segments need to be checked for every document indexed.
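To make the difference concrete, here is a rough Python sketch that builds a bulk request body with and without an explicit `_id`. The index name, type name, field names, and the `bulk_body` helper are all made up for illustration; only the bulk format itself (an action line followed by a source line, newline-terminated) comes from the bulk API.

```python
import json

def bulk_body(docs, index, doc_type, id_field=None):
    """Build a newline-delimited bulk body of `index` actions.

    If `id_field` is None, no `_id` is sent and Elasticsearch generates
    one itself. Otherwise the value of that (hypothetical) field is
    sent as the explicit `_id`.
    """
    lines = []
    for doc in docs:
        meta = {"_index": index, "_type": doc_type}
        if id_field is not None:
            meta["_id"] = doc[id_field]
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(doc))              # source line
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"

docs = [{"key": "a1", "message": "first"}, {"key": "b2", "message": "second"}]
auto_ids = bulk_body(docs, "documents", "doc")
explicit_ids = bulk_body(docs, "documents", "doc", id_field="key")
print(auto_ids)
print(explicit_ids)
```

With the Logstash elasticsearch output, the first variant should correspond to leaving out the `document_id` option.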
As you have immutable documents, you will probably benefit from switching to time-based indices, as this makes it easier to control the shard size and prevent it from continuously increasing over time. It also makes managing retention of data much easier and more efficient, as whole indices can simply be dropped rather than having to delete individual documents.
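As a rough illustration of the idea, the Python sketch below derives a daily index name from a date and picks out the indices that fall outside a retention window, which could then be deleted whole. The `documents` prefix, the 7-day retention, and both helper names are assumptions for the example; the `prefix-YYYY.MM.dd` naming simply mirrors the daily pattern Logstash uses by default.

```python
from datetime import date, datetime, timedelta

def index_for(prefix, day):
    """Name of the daily index that documents from `day` go into."""
    return "{}-{}".format(prefix, day.strftime("%Y.%m.%d"))

def expired_indices(index_names, prefix, today, keep_days):
    """Return the daily indices older than the retention window.

    Names that do not match `prefix-YYYY.MM.dd` are skipped.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so not one of our daily indices
        if day < cutoff:
            expired.append(name)
    return expired

today = date(2016, 11, 3)
existing = ["documents-2016.10.01", "documents-2016.11.01", "other-2016.01.01"]
print(index_for("documents", today))
print(expired_indices(existing, "documents", today, keep_days=7))
```

Each name returned by `expired_indices` can be removed with a single delete index call, instead of deleting documents one by one.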