Hi all,
I have the following questions about Elasticsearch:
In our application we would like to index around 2-3 million documents.
What would be the best cluster/shard configuration?
Our initial load consists of indexing 2-3 million records. What configuration
should we use in Elasticsearch to achieve faster indexing for the
initial load?
Currently we have a batch process that spawns 10 threads, and these threads
send requests to the Elasticsearch server. Do you foresee any issues with
this approach (such as a write lock)?
On Thursday, May 19, 2011 at 8:32 PM, lalit mishra wrote:
In our application we would like to index around 2-3 million documents. What would be the best cluster/shard configuration?
Hard to tell, since I don't know the size of the documents. You will need to do some capacity planning. Generally, the default of 5 shards should be more than enough for 2-3 million documents, and depending on the docs, you can even use a lower number of shards (which will use less memory).
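As a rough illustration of that capacity-planning arithmetic, here is a minimal Python sketch (Python is an assumption; nothing in the thread prescribes a client language). It computes the average number of documents per primary shard, assuming an even spread, and builds the JSON body for an index-creation request (`PUT /<index-name>`) with explicit `number_of_shards` and `number_of_replicas`. The helper names are hypothetical, and the 5-shard/1-replica values are the defaults described in this thread; newer Elasticsearch releases use a different default shard count, so setting both explicitly is the safer choice.

```python
import json


def docs_per_shard(total_docs: int, shards: int) -> float:
    """Average documents held by each primary shard, assuming an even spread."""
    if shards <= 0:
        raise ValueError("shards must be positive")
    return total_docs / shards


def create_index_body(shards: int = 5, replicas: int = 1) -> str:
    """JSON body for an index-creation request (PUT /<index-name>).

    Hypothetical helper; the defaults mirror the 5 shards / 1 replica
    mentioned in this thread.
    """
    return json.dumps({
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    })


if __name__ == "__main__":
    # 3 million documents over the default 5 shards.
    print(docs_per_shard(3_000_000, 5))  # 600000.0
    # A smaller shard count, which the reply suggests may use less memory.
    print(create_index_body(shards=3))
```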
Our initial load consists of indexing 2-3 million records. What configuration should we use in Elasticsearch to achieve faster indexing for the initial load?
Indexing rate really depends on many factors. As for scaling out, the default of 5 shards for an index means you can grow up to 5 machines. If you have 1 replica (the default), you can grow up to 10 machines without hitting a wall.
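The scale-out limit follows from counting shard copies: each primary shard and each of its replicas can live on a separate machine, so an index has at most shards * (1 + replicas) copies to spread across nodes. A small sketch of that arithmetic, with a hypothetical helper name, reproducing the 5 and 10 machine figures:

```python
def max_useful_nodes(primary_shards: int, replicas: int) -> int:
    """Most machines an index can spread across, one shard copy per node.

    Every primary shard has `replicas` extra copies, so the index has
    primary_shards * (1 + replicas) shard copies in total.
    """
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    # 5 primary shards without replicas, then with the default 1 replica.
    for replicas in (0, 1):
        print(replicas, max_useful_nodes(5, replicas))
```

Because the primary shard count is chosen when the index is created, it is the number that sets this ceiling; the replica count only multiplies it.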
Currently we have a batch process that spawns 10 threads, and these threads send requests to the Elasticsearch server. Do you foresee any issues with this approach (such as a write lock)?
No, no issues here. Check that your indexing machine is not the bottleneck.
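A sketch of a 10-thread loader, assuming Python's `concurrent.futures` for the worker pool. One change from the setup described in the question, named plainly: documents are grouped into batches and formatted for Elasticsearch's `_bulk` endpoint (newline-delimited JSON, one action line followed by one source line per document), since one HTTP request per document typically adds per-request overhead. The `send` callable, the `documents` index name, the `id` field, and the batch size of 1000 are all hypothetical; `send` would be whatever HTTP call POSTs the body to `/_bulk` with `Content-Type: application/x-ndjson`. The action-line format shown is the one used by current versions; 2011-era releases also expected a `_type` in each action line.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def chunks(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover partial batch
        yield batch


def bulk_body(index: str, batch: List[dict]) -> str:
    """NDJSON body for _bulk: an action line, then the document source, per doc."""
    lines = []
    for doc in batch:
        # Hypothetical: each document carries its own "id" field.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


def load(docs: Iterable[dict], send: Callable[[str], None], index: str = "documents",
         threads: int = 10, batch_size: int = 1000) -> int:
    """Index `docs` using `threads` concurrent workers; returns the number of batches sent.

    `send` is a hypothetical callable that POSTs one bulk body to /_bulk.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(send, bulk_body(index, batch))
                   for batch in chunks(docs, batch_size)]
        for future in futures:
            future.result()  # re-raises any exception from a worker thread
    return len(futures)


if __name__ == "__main__":
    sent: List[str] = []
    # Stand-in sender that just records bodies instead of calling Elasticsearch.
    batches = load(({"id": i, "title": f"doc {i}"} for i in range(2500)), sent.append)
    print(batches)  # 3 batches: 1000 + 1000 + 500 documents
```

Because `load` submits every batch up front, all bulk bodies for a run can sit in memory at once; for several million documents it may be worth feeding the pool in bounded slices.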
Our initial load consists of indexing 2-3 million records. What configuration
should we use in Elasticsearch to achieve faster indexing for the
initial load?
Hopefully I'm not wrong about this, but you can increase the real-time
latency, i.e. the delay before newly indexed documents become searchable
(refresh_interval), which improves indexing speed. @Shay: Is that a
correct assumption?
Also, tuning Lucene's merge factor (increasing it) can improve indexing
speed, but you pay for it in query time.
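Following that suggestion, a minimal sketch of the two index-settings updates around an initial load: raise `refresh_interval` before indexing, then restore it afterwards. Each body would be sent with `PUT /<index-name>/_settings`. The helper name and the `30s` value are arbitrary examples; `1s` is Elasticsearch's default, and `-1` disables periodic refresh entirely. The merge-factor tuning also mentioned depends on the merge policy of the Elasticsearch/Lucene version in use, so it is left out here.

```python
import json


def refresh_settings(interval: str) -> str:
    """Body for an index-settings update (PUT /<index-name>/_settings)."""
    return json.dumps({"index": {"refresh_interval": interval}})


# Before the initial load: refresh rarely, so less time goes into making
# new documents searchable ("-1" would disable periodic refresh entirely).
before_load = refresh_settings("30s")

# After the load: go back to the default one-second refresh.
after_load = refresh_settings("1s")

if __name__ == "__main__":
    print(before_load)
    print(after_load)
```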
On Friday, May 20, 2011 at 10:46 PM, Karussell wrote: