Sharding Function

Roman_Kournjaev · November 25, 2012, 8:59pm

Hi

I am trying to optimize my indexing process (in terms of time).
I am running an 8-shard cluster (no replicas) on 4 nodes (4 physical
machines).

I am indexing a set of 30M products, and I want to optimize the
indexing process by sending each subset of products directly to the
node where it is going to be stored. What sharding function does
Elasticsearch use, and can I change it?

If so, will that actually speed up my indexing process?

Thanks
Roman

--

Igor_Motov · November 25, 2012, 9:45pm

First of all, you can simply use the NodeClient and it will do this for you.
But if you feel really adventurous, you can replace DjbHashFunction
(https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/operation/hash/djb/DjbHashFunction.java),
which ES is using, by implementing the HashFunction interface
(https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/operation/hash/HashFunction.java)
and specifying your function's class name in the
cluster.routing.operation.hash.type setting. It might speed up your
indexing process if it's network bound. However, that would be a rather
atypical situation, since in most cases indexing is a CPU- or disk-I/O-bound
activity. If you have more than 2 processor cores on your nodes and
indexing is not disk I/O bound, I would start by increasing the number of
shards.
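To make the idea concrete, here is a minimal Java sketch of predicting a document's shard on the client side. It assumes 0.20-era defaults that should be verified against the linked source: the routing value is the document id alone, it is hashed with the DJB2 loop that DjbHashFunction is based on, and the shard is `Math.abs(hash % numberOfShards)`. The names `ShardRoutingSketch`, `shardFor` and `groupByShard` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of client-side shard prediction. ASSUMPTIONS: routing
// on the document id only, DJB2 hashing, and Math.abs(hash % shards) as the
// reduction step. Check all three against your Elasticsearch version.
class ShardRoutingSketch {

    // DJB2 string hash: hash * 33 + c for every char, truncated to 32 bits.
    static int djbHash(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash;
    }

    // Map a routing value to a shard number in [0, numShards).
    static int shardFor(String routing, int numShards) {
        return Math.abs(djbHash(routing) % numShards);
    }

    // Group product ids by predicted shard, so each batch targets one shard.
    static Map<Integer, List<String>> groupByShard(List<String> ids, int numShards) {
        Map<Integer, List<String>> batches = new TreeMap<>();
        for (String id : ids) {
            batches.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("product-1", "product-2", "product-3", "product-4");
        groupByShard(ids, 8).forEach((shard, batch) ->
                System.out.println("shard " + shard + " -> " + batch));
    }
}
```

The shard number alone is only half of the picture. To send a batch to "the exact node", a client would also need the current shard-to-node assignment from the cluster state, which is the information the NodeClient already has.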

--

Roman_Kournjaev · November 25, 2012, 10:13pm

I am accessing ES through Nest (C#), so I probably don't have access to
the NodeClient. Can I provide this HashFunction through an HTTP request?

--

Igor_Motov · November 25, 2012, 10:26pm

No, the function has to be implemented in the form of an Elasticsearch plugin.
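As a hedged illustration of only the hashing half of such a plugin, here is a Java sketch. It declares a local stand-in for the HashFunction interface, assuming it exposes `hash(String routing)` and `hash(String type, String id)` (check this against the linked source), and implements it with 32-bit FNV-1a. The class `FnvHashFunction` and the choice of FNV-1a are hypothetical. The plugin packaging and the `cluster.routing.operation.hash.type` wiring are not shown.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical local stand-in for the HashFunction interface; the two
// signatures are an ASSUMPTION, verify them against your ES version.
interface HashFunction {
    int hash(String routing);

    int hash(String type, String id);
}

// Hypothetical replacement hash: 32-bit FNV-1a over UTF-8 bytes.
class FnvHashFunction implements HashFunction {
    private static final int OFFSET_BASIS = 0x811C9DC5;
    private static final int PRIME = 0x01000193;

    // Fold the bytes of one string into a running FNV-1a state
    // (int multiplication wraps mod 2^32, as FNV-1a 32-bit requires).
    private static int update(int hash, String value) {
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= PRIME;
        }
        return hash;
    }

    @Override
    public int hash(String routing) {
        return update(OFFSET_BASIS, routing);
    }

    @Override
    public int hash(String type, String id) {
        // Hash the type and then the id as one continuous byte stream.
        return update(update(OFFSET_BASIS, type), id);
    }

    public static void main(String[] args) {
        HashFunction fn = new FnvHashFunction();
        System.out.printf("hash(\"a\") = 0x%08X%n", fn.hash("a"));
    }
}
```

Any replacement changes which shard an existing id maps to, so it would presumably need to be in place before the index is populated.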

--

Related topics
- Documents not getting sharded evenly (Elasticsearch; 14 replies, 1692 views; July 5, 2017)
- Shards/routing documents imbalance problem (Elasticsearch; 9 replies, 786 views; July 6, 2017)
- Offline indexing and expected scaling performance (Elasticsearch; 4 replies, 1842 views; July 6, 2017)
- ElasticSearch Hash Function (Elasticsearch; 2 replies, 3754 views; July 6, 2017)
- Fair distribution of shards per node per index (Elasticsearch; 7 replies, 424 views; July 6, 2017)