Shard size is not distributed equally

Hi,

I'm using version 14.3 of Elasticsearch. I have configured two data
nodes on separate servers, and one non-data node and one batch indexer
on separate servers. When I import a large amount of data (around 16 GB)
into Elasticsearch, I expect it to be split evenly between the two data
node servers. In my case, however, the data node 1 server holds about
80% of the data and data node 2 holds much less. When I import a smaller
amount, below 6 GB, both data nodes share it equally and use equal space
on both servers.

Also, I expect optimization to need at most twice the original size, but
here it takes 5 to 7 times as much. So I have to free up space on my
server if there is any to free; otherwise I need to increase the HDD
size.

  1. Is the problem caused by the import size (importing a large amount
    of data)? If so, how do I fix it?
  2. Do I need to configure any settings so that the data is shared
    equally between both data node servers?
  3. How do I configure optimization so that it takes only twice the
    original size while optimizing?

Please help me out.

Thanks,
Meenakshi

If you have two nodes, then the same number of shards should be on both servers (assuming you have 1 replica). So the varying index size comes from the different "state" each shard's index is in, and from when merging of internal segments kicked in.
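
To illustrate why the document count per shard should come out close to even, here is a minimal Python sketch of hash-modulo routing: each document id is hashed and the result is taken modulo the number of shards. The hash used (DJB2) is a stand-in chosen for illustration and is not claimed to be the function this Elasticsearch version uses; `shard_for`, `distribution`, and the `doc-N` ids are hypothetical names. The sketch only counts documents. Bytes on disk can still differ between shards depending on which segments have been merged.

```python
from collections import Counter


def djb2(text: str) -> int:
    """Stand-in string hash (DJB2), used here for illustration only."""
    h = 5381
    for ch in text:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard by hashing the document id modulo the shard count."""
    return djb2(doc_id) % number_of_shards


def distribution(doc_ids, number_of_shards: int) -> list:
    """Return the number of documents routed to each shard, by shard index."""
    counts = Counter(shard_for(d, number_of_shards) for d in doc_ids)
    return [counts.get(s, 0) for s in range(number_of_shards)]


if __name__ == "__main__":
    # Hypothetical ids; real ids would come from the batch indexer.
    ids = (f"doc-{i}" for i in range(100_000))
    print(distribution(ids, 2))
```

If the counts are balanced like this but the sizes on disk are not, that points to merge state rather than routing.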
(In reply to Meenakshi's message of Wednesday, February 23, 2011 at 8:47 PM.)

Hi,

I am not using any replicas in this environment. Instead, I rely on my
gateway, which is configured on one of my servers. In this scenario,
how do I get the data shared equally between the shards? For example,
if I have 160 GB of data in the gateway, it should be split into two
shards of 80 GB + 80 GB. But here I got 0 KB in one shard and 160 GB
in the other. Even if I delete those two shards from the servers, the
data is still split as described above.

I found one more scenario; the details are as follows.

The same 160 GB of data is split into two shards, initially 80 GB
each, but after some time one shard shrinks and the other grows to
165 GB.

(The version I'm using is 14.3.)

(In reply to the message from Shay Banon shay.ba...@elasticsearch.com of Feb 25, 6:23 am.)

You should not get that. Make sure the nodes discover each other.
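
One way to check discovery is to look at the cluster health response (normally fetched from `GET /_cluster/health` on any node) and compare the node counts with what you expect. The Python sketch below works on an already parsed response so it runs without a cluster. The field names `number_of_data_nodes` and `unassigned_shards` are assumed to be present in this version's response, and the function name and sample values are hypothetical.

```python
def discovery_problems(health: dict, expected_data_nodes: int) -> list:
    """Return human-readable problems found in a parsed cluster health response."""
    problems = []

    # Assumption: the response reports how many data nodes joined the cluster.
    data_nodes = health.get("number_of_data_nodes")
    if data_nodes is None:
        problems.append("response has no number_of_data_nodes field")
    elif data_nodes < expected_data_nodes:
        problems.append(
            f"only {data_nodes} of {expected_data_nodes} data nodes joined the cluster"
        )

    # Shards with no node to live on also hint at missing nodes.
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} shards are unassigned")

    return problems


if __name__ == "__main__":
    # Hypothetical response from a node that formed a cluster on its own.
    sample = {"number_of_nodes": 1, "number_of_data_nodes": 1, "unassigned_shards": 0}
    for problem in discovery_problems(sample, expected_data_nodes=2):
        print(problem)
```

If each data node reports only itself when queried separately, the nodes have formed separate clusters, which would explain one node holding all of the data.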
(In reply to Jagmee's message of Saturday, February 26, 2011 at 12:09 AM.)

Both nodes discover each other. Still, the data is not shared equally.
What could the problem be?

--Jagmee

(In reply to the message from Shay Banon shay.ba...@elasticsearch.com of Feb 27, 3:27 am.)