Sharding Function

Hi

I am trying to optimize my indexing process (in terms of time).
I am running an 8-shard cluster (no replicas) on 4 nodes (4 physical
machines).

I am indexing a set of 30M products, and I want to optimize the indexing
process by sending each subset of products directly to the node where it
is going to be stored. What sharding function does Elasticsearch use, and
can I change it?

If yes, is that actually going to speed up my indexing process?

Thanks
Roman

--

First of all, you can simply use NodeClient and it will do this for you.
But if you feel really adventurous :) you can replace the DjbHashFunction
(https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/operation/hash/djb/DjbHashFunction.java)
that ES is using by implementing the HashFunction interface
(https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/operation/hash/HashFunction.java)
and specifying your function's class name in the
cluster.routing.operation.hash.type setting. It might speed up your
indexing process if it is network bound. However, that would be a rather
atypical situation, since in most cases indexing is a CPU-bound or disk
I/O-bound activity. If you have more than 2 processor cores on your nodes
and indexing is not disk I/O bound, I would start by increasing the number
of shards.
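
To make the idea of hash-based shard selection concrete, here is a minimal Python sketch. It uses the classic DJB2 string hash wrapped to 32 bits and picks a shard as the hash modulo the number of shards. This is an illustration only: the function names (djb2_hash, shard_for, group_by_shard) are hypothetical, and the exact routing key, overflow handling, and sign handling in Elasticsearch's DjbHashFunction may differ, so the shard numbers printed here should not be expected to match a real cluster.

```python
from collections import defaultdict


def djb2_hash(key: str) -> int:
    """Classic DJB2 string hash, wrapped to an unsigned 32-bit value.

    Assumption: this is the textbook algorithm, not a byte-for-byte
    port of Elasticsearch's DjbHashFunction.
    """
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number in [0, num_shards)."""
    return djb2_hash(doc_id) % num_shards


def group_by_shard(doc_ids, num_shards):
    """Bucket document ids by the shard this sketch would assign them to."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical product ids, 8 shards as in the question.
    ids = [f"product-{i}" for i in range(10)]
    for shard, members in sorted(group_by_shard(ids, 8).items()):
        print(shard, members)
```

Grouping by shard is only half of what the question asks for: a client would also need to know which node currently hosts each shard. As a rough estimate, with 8 shards spread evenly over 4 nodes and uniform hashing, a document sent to an arbitrary node finds its shard locally about a quarter of the time, so roughly three quarters of documents need one extra network hop.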

In reply to Roman Kournjaev, Sunday, November 25, 2012 3:59:15 PM UTC-5.

--

I am accessing ES through NEST (C#), so I probably don't have access to
the NodeClient. Can I provide this HashFunction through an HTTP request?

In reply to Igor Motov, Sunday, November 25, 2012 11:45:52 PM UTC+2.

--

No, the function should be implemented in the form of an Elasticsearch plugin.

In reply to Roman Kournjaev, Sunday, November 25, 2012 5:13:01 PM UTC-5.

--