Shard check on startup is slow


(vpunski) #1

I tried using index.shard.check_on_startup to check shard consistency.
For some unknown reason, this process is very slow.
From a brief look at system resources, CPU usage is very low, and the
process doesn't seem to be I/O intensive either.
My disks read at about 20 MB/sec during the check, whereas
"hdparm -t /dev/sda" reports about 100 MB/sec.

Why is loading so slow?

Current shard configuration:
{
  "cluster_name" : "MY_CLUSTER",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 10,
  "number_of_data_nodes" : 10,
  "active_primary_shards" : 10,
  "active_shards" : 30,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}


(Shay Banon) #2

I mentioned before that this is a slow process. It needs to check and read
quite a lot of data from the index.


(vpunski) #3

Yes, you've mentioned it before. Now I'd like to understand the actual
reason for the slow read rate (from an I/O perspective), not just that it
is a "slow process". By reason I mean, for example, non-sequential file
reads required by the Lucene index check process. In any case, I'd like to
understand the real cause and try to fix it.

From what I've seen in the code, since all the shards are started in
parallel, their "check on startup" processes contend for the storage
hardware (disk).

Am I missing something?

Thanks


(Shay Banon) #4

You can have a look at the CheckIndex class in Lucene to see how it's
implemented. As for the number of shards started in parallel on a node,
you can control that. By default, 4 primary shards can be started in
parallel on a node
(cluster.routing.allocation.node_initial_primaries_recoveries setting),
along with 2 replica shards
(cluster.routing.allocation.node_concurrent_recoveries setting).
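
As a rough illustration of the settings named above, a node configuration that limits how many shards start (and are therefore checked) at once might look like the sketch below. It assumes the settings are placed in elasticsearch.yml and that check_on_startup takes a boolean; the values of 1 are illustrative only and are not recommendations made in this thread.

```yaml
# Hypothetical elasticsearch.yml fragment; values are illustrative assumptions.

# Run the Lucene index check when a shard starts (assumed boolean form).
index.shard.check_on_startup: true

# Primary shards recovered in parallel per node (default per this thread: 4).
cluster.routing.allocation.node_initial_primaries_recoveries: 1

# Replica shards recovered in parallel per node (default per this thread: 2).
cluster.routing.allocation.node_concurrent_recoveries: 1
```

Lowering these values should reduce the number of checks competing for the same disk, at the cost of a longer overall startup if the disk was not the bottleneck.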

