We are trying to move from Solr to Elasticsearch and want to compare indexing performance for 100M records loaded from a database.
Currently it takes 4 hours to index 52 million records.
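To put a number on the baseline, here is the arithmetic on those figures as a small Python sketch. It assumes the indexing rate stays constant up to 100M records, which may not hold in practice; the function names are just illustrative.

```python
def docs_per_second(docs: int, hours: float) -> float:
    """Average indexing rate over the whole run."""
    return docs / (hours * 3600)


def projected_hours(total_docs: int, rate: float) -> float:
    """Time to index total_docs at a constant rate (an assumption)."""
    return total_docs / rate / 3600


# Figures from the run described above: 52 million records in 4 hours.
rate = docs_per_second(52_000_000, 4)
eta = projected_hours(100_000_000, rate)
print(f"{rate:.0f} docs/s, about {eta:.1f} hours for 100M")
```

So the current rate is roughly 3,600 documents per second, and the full 100M would take a bit under 8 hours if nothing changes.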
Current configuration:
default shards = 5, and we have also increased the refresh interval (refresh_interval) to 30s.
What's the best way to increase performance? I am planning to increase the number of shards to 7.
We have also changed the Logstash configuration to remove duplicate documents: instead of the default auto-generated _id, the document ID is now mapped to a unique ID column from our database.
I don't know why we are seeing deleted documents in the index.
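My current guess, as a toy model only: when a document is indexed with an _id that already exists, Elasticsearch replaces the old copy and counts it as deleted until segments are merged. The sketch below is not Elasticsearch code and ToyIndex is a made-up class; it also assumes the source query returns some rows with the same unique ID more than once, which I have not verified.

```python
class ToyIndex:
    """Toy stand-in for an index keyed by document ID (hypothetical)."""

    def __init__(self):
        self.live = {}     # doc_id -> latest source
        self.deleted = 0   # old copies replaced by a newer one

    def index(self, doc_id, source):
        # Re-indexing an existing ID replaces the old copy, which is
        # then counted as deleted rather than removed immediately.
        if doc_id in self.live:
            self.deleted += 1
        self.live[doc_id] = source


idx = ToyIndex()
# Assumed input: IDs 1 and 2 each arrive twice from the database.
for row_id in [1, 2, 3, 2, 1]:
    idx.index(row_id, {"id": row_id})

print(len(idx.live), idx.deleted)
```

In this model three live documents remain and two are counted as deleted, even though no explicit update or delete was ever issued.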
We are not updating the documents; we only changed the Logstash configuration to use the unique ID as the document ID. I am attaching my configuration. Can you please let me know if I need to change something?