Initial load and cluster configuration query

lalit_mishra · May 19, 2011, 5:32pm

Hi All,
I have the following questions about Elasticsearch.

Our application needs to index around 2-3 million documents. What would be the best cluster/shard configuration?

Our initial load consists of indexing 2-3 million records. Which settings should we configure in Elasticsearch to get faster indexing for the initial load?

Currently we have a batch process that spawns 10 threads, and these threads send requests to the Elasticsearch server. Do you foresee any issues with this approach (such as write locks)?

Thanks for your help.

Regards,
Lalit.

kimchy · May 19, 2011, 8:14pm

On Thursday, May 19, 2011 at 8:32 PM, lalit mishra wrote:
Hi All,

I have below queries pertaining to Elasticsearch

In our application we would like to index around 2-3 million documents. What would be the best cluster/shard configuration?

Hard to tell, since I don't know the size of the documents. You will need to do some capacity planning. Generally, the default 5 shards should be more than enough for 2-3 million documents, and depending on the documents, you can even use a lower number of shards (which will use less memory).
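
As a minimal sketch of picking a lower shard count when the index is created: the helper below only prints a request body built from the `number_of_shards` and `number_of_replicas` index settings. The function name `index_settings_body`, the index name `test`, and the host `localhost:9200` are assumptions for illustration, not something from this thread.

```shell
# Print a JSON body for creating an index with a given shard/replica count.
# index_settings_body is a hypothetical helper name, not part of Elasticsearch.
index_settings_body() {
  shards="$1"
  replicas="$2"
  printf '{"settings":{"index":{"number_of_shards":%d,"number_of_replicas":%d}}}\n' \
    "$shards" "$replicas"
}

# Example call (assumes a node on localhost:9200 and an index named "test";
# recent releases also require the Content-Type header):
#   curl -XPUT 'http://localhost:9200/test' \
#        -H 'Content-Type: application/json' \
#        -d "$(index_settings_body 2 1)"
```

The primary shard count is set at creation time, so it is worth deciding on it before the initial load starts.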

Our initial load consists of indexing 2-3 million records. Which settings should we configure in Elasticsearch to get faster indexing for the initial load?

Indexing rate really depends on many factors. As for scaling out, the default 5 shards set for an index mean you can grow up to 5 machines. If you have 1 replica (the default), then you can grow up to 10 machines without hitting a wall.
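
The arithmetic behind those numbers, as a small sketch: each primary shard and each replica copy can sit on its own machine, so the ceiling is shards × (1 + replicas). The function name `max_useful_nodes` is made up for illustration.

```shell
# Upper bound on machines that can each hold at least one shard copy:
# primaries * (1 + replicas per primary).
max_useful_nodes() {
  shards="$1"
  replicas="$2"
  echo $(( shards * (1 + replicas) ))
}

max_useful_nodes 5 0   # prints 5
max_useful_nodes 5 1   # prints 10
```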

Currently we have a batch process that spawns 10 threads, and these threads send requests to the Elasticsearch server. Do you foresee any issues with this approach (such as write locks)?

No, no issues here. Check that your indexing machine is not the bottleneck.
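
For readers who want to try the concurrent approach from the command line, here is a rough shell analogue of a 10-worker batch loader. It is not the poster's actual batch process. It assumes an input file with one JSON document per line, and every name in it (`ES_URL`, `INDEX`, `DOC_PATH`, `DRY_RUN`, `index_one`, `worker`, `index_all`) is hypothetical.

```shell
# All names here (ES_URL, INDEX, DOC_PATH, DRY_RUN, index_one, worker,
# index_all) are hypothetical; adjust them to your setup.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-test}"
# Recent releases use "_doc" here; older ones expect a type name instead.
DOC_PATH="${DOC_PATH:-_doc}"

# Send one JSON document. With DRY_RUN=1, print the request target instead.
index_one() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'POST %s/%s/%s\n' "$ES_URL" "$INDEX" "$DOC_PATH"
  else
    curl -s -o /dev/null -XPOST "$ES_URL/$INDEX/$DOC_PATH" \
      -H 'Content-Type: application/json' -d "$1"
  fi
}

# Worker number $1 handles every line of file $2 whose line number
# modulo $3 equals $1, so the workers split the file between them.
worker() {
  awk -v id="$1" -v n="$3" 'NR % n == id' "$2" |
    while IFS= read -r doc; do
      index_one "$doc"
    done
}

# Start N workers (default 10) over FILE and wait for all of them.
index_all() {
  file="$1"
  workers="${2:-10}"
  i=0
  while [ "$i" -lt "$workers" ]; do
    worker "$i" "$file" "$workers" &
    i=$((i + 1))
  done
  wait
}
```

A call such as `index_all docs.jsonl 10` (with `docs.jsonl` as a placeholder file name) starts the load. Running it under `time` while watching CPU on the client is one simple way to check whether the indexing machine is the bottleneck.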

Karussell1 · May 20, 2011, 7:46pm

Our initial load consists of indexing 2-3 million records. Which settings
should we configure in Elasticsearch to get faster indexing for the
initial load?

Hopefully I'm not wrong about this, but you can increase the real-time
latency (refresh_interval), which improves indexing speed. @Shay: is that
a correct assumption?

Also, increasing Lucene's merge factor can improve indexing speed (but
you pay for it at query time).

kimchy · May 20, 2011, 7:52pm

You can change those in real time. Check the update settings API documentation, which has a sample for it.

lalit_mishra · May 21, 2011, 1:41am

Hi Shay,
So I should use the configuration below during the initial load of the records:

curl -XPUT 'localhost:9200/test/_settings' -d '{
    "index" : {
        "refresh_interval" : "-1",
        "merge.policy.merge_factor" : 30
    }
}'

Once the initial load is over, I revert the index settings and optimize the index:

curl -XPUT 'localhost:9200/test/_settings' -d '{
    "index" : {
        "refresh_interval" : "1s",
        "merge.policy.merge_factor" : 10
    }
}'

curl -XPOST 'http://localhost:9200/test/_optimize?max_num_segments=5'

Hopefully this will speed up the process further.
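
One way to make sure the settings get reverted even if the load fails is a small wrapper, sketched below under assumptions. The setting names and values are taken as-is from the commands above and may not be accepted by newer releases. The helper names (`load_settings_body`, `apply_settings`, `with_load_settings`) and the `ES_URL` and `INDEX` variables are hypothetical.

```shell
# Hypothetical helper names; setting names and values come from the post above.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-test}"

# Build the settings body for a refresh interval ($1) and merge factor ($2).
load_settings_body() {
  printf '{"index":{"refresh_interval":"%s","merge.policy.merge_factor":%d}}\n' \
    "$1" "$2"
}

# PUT a settings body to the index (recent releases need the Content-Type header).
apply_settings() {
  curl -s -o /dev/null -XPUT "$ES_URL/$INDEX/_settings" \
    -H 'Content-Type: application/json' -d "$1"
}

# Run the given load command with refresh disabled, then restore the
# normal settings whether or not the load command succeeded.
with_load_settings() {
  apply_settings "$(load_settings_body -1 30)"
  status=0
  "$@" || status=$?
  apply_settings "$(load_settings_body 1s 10)"
  return "$status"
}
```

Usage would look like `with_load_settings ./run_initial_load.sh`, where the script name is a placeholder for whatever starts the batch process. The optimize call can then be issued separately once the wrapper returns successfully.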

Thanks,
Lalit.
