muthug
(muthuraam)
October 19, 2020, 8:53am
1
Hi Team,
initializing_shards got stuck. If I delete those shards, what will happen? Will the cluster become green and work smoothly?
The Kibana link works some of the time, but it keeps showing a timeout 30000ms error.
{
"cluster_name" : "OConnectElasticSearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 9,
"number_of_data_nodes" : 6,
"active_primary_shards" : 20474,
"active_shards" : 40948,
"relocating_shards" : 0,
"initializing_shards" : 4,
"unassigned_shards" : 4,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 99.98046684246509
}
dadoonet
(David Pilato)
October 19, 2020, 9:05am
2
41000 shards on 6 nodes... which means around 6800 shards per node.
You probably have too many shards per node.
May I suggest you look at the following resource about sizing:
Using Rally to Get Your Elasticsearch Cluster Size Right | Elastic Videos
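To see how those shards are spread across the data nodes, the _cat/allocation API gives a per-node shard count and disk usage (a minimal sketch, assuming curl against localhost:9200; adjust the host for your cluster):
# shard count and disk usage per data node
curl -s 'localhost:9200/_cat/allocation?v'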
muthug
(muthuraam)
October 19, 2020, 9:19am
3
Hi David,
Greetings!!
Thanks for your reply. I understand, but I need a workaround because this is our production cluster.
I am also working on an ELK upgrade, so very soon I will create a new PROD environment and implement your suggestion there. For now I need a fix ASAP since, as I said, this is production.
If I delete those 4 initializing shards, what would happen?
dadoonet
(David Pilato)
October 19, 2020, 9:37am
4
What is the output of:
GET /
GET /_cat/indices?v
If some outputs are too big, please share them on gist.github.com and link them here.
It will stay RED, as shards will be missing.
Best guess: wait for the cluster to recover.
Or delete the indices which are RED, but then you will be missing some data.
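If you go the deletion route, something like the following narrows things down first (a sketch: localhost:9200 is an assumption, the health filter should be checked against your version's _cat docs, and the index name is a placeholder):
# list only the indices whose health is red
curl -s 'localhost:9200/_cat/indices?v&health=red'
# delete one of them by name; its data is gone for good
curl -s -XDELETE 'localhost:9200/your-red-index-name'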
muthug
(muthuraam)
October 19, 2020, 10:05am
5
GET /
{
"name" : "OConnectManagementNode",
"cluster_name" : "OConnectElasticSearch",
"version" : {
"number" : "2.3.4",
"build_hash" : "e455fd0c13dceca8dbbdbb1665d068ae55dabe3f",
"build_timestamp" : "2016-06-30T11:24:31Z",
"build_snapshot" : false,
"lucene_version" : "5.5.0"
},
"tagline" : "You Know, for Search"
}
GET /_cat/indices?v
muthug
(muthuraam)
October 19, 2020, 10:12am
6
_cat/shards | grep INITIALIZING
vdc_b_e029-2020.04.12 5 p INITIALIZING OConnectDataNode04
vdc_b_err-2019.10.03 1 p INITIALIZING OConnectDataNode06
vdc_b_e029-2019.04.10 4 p INITIALIZING OConnectDataNode04
vdc_b_err-2020.09.18 1 p INITIALIZING OConnectDataNode04
Christian_Dahlqvist
(Christian Dahlqvist)
October 19, 2020
7
Why would you have indices with fewer than 100 documents each and 6 primary shards plus 6 replica shards? This is incredibly wasteful, and it explains why your cluster is so exceptionally oversharded. I would recommend you start addressing this ASAP (go to a single primary shard in your index templates, consolidate indices, switch from daily to e.g. monthly indices where the size is small), as it is otherwise just going to complicate the migration.
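For the template change, a minimal sketch against the 2.x index template API (the template name here is made up, the vdc_b_* pattern is an assumption based on the index names above, and templates only affect newly created indices):
# every new index matching vdc_b_* gets 1 primary and 1 replica shard
curl -s -XPUT 'localhost:9200/_template/vdc_b_single_shard' -d '{
  "template": "vdc_b_*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'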
muthug
(muthuraam)
October 19, 2020, 10:30am
8
True!! I am working on that; it will soon be addressed when we move to the new ELK stack 7.9.
For now, what is the workaround to fix this?
If I don't want the data in those shards, shall I delete them? After that, would the cluster be fine, or would it still remain RED?
Note: the Kibana link is working, but not consistently. Most of the time I get the 30000ms timeout error.
I want to fix the initializing_shards and Kibana timeout issues, at least as a workaround.
Christian_Dahlqvist
(Christian Dahlqvist)
October 19, 2020
9
I would not be surprised if the Kibana timeout error originated from querying too many small shards.
muthug
(muthuraam)
October 19, 2020, 12:02pm
10
So you mean this timeout error is not because these initializing_shards got stuck, but because of too many small shards?
Correct me if I am wrong.
Christian_Dahlqvist
(Christian Dahlqvist)
October 19, 2020
11
It could be either. Querying large numbers of shards can be slow. Did you see timeouts before you had unallocated primary shards?
If you want to allocate the missing primary shards as empty shards you can use the cluster reroute API, but be aware that you will lose the data in those shards.
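A minimal sketch of that reroute call in 2.x syntax, where the command is allocate with allow_primary (in 5.x and later it became allocate_empty_primary). The host, index, shard and node values are illustrative only, borrowed from the shard listing above; the command applies to shards the cluster reports as unassigned, and allow_primary: true is what accepts the data loss:
# force an unassigned primary to be allocated as an empty shard (data loss!)
curl -s -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "allocate": {
        "index": "vdc_b_err-2019.10.03",
        "shard": 1,
        "node": "OConnectDataNode06",
        "allow_primary": true
      }
    }
  ]
}'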
muthug
(muthuraam)
October 19, 2020, 1:19pm
12
I have deleted all 4 shards and the status turned green. But relocating_shards is showing 6, and it looks stuck.
GET _cluster/health?pretty
{
"cluster_name" : "OConnectElasticSearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 9,
"number_of_data_nodes" : 6,
"active_primary_shards" : 20472,
"active_shards" : 40944,
"relocating_shards" : 6,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 526,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 549051,
"active_shards_percent_as_number" : 100.0
}
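Those 526 pending tasks, and a task_max_waiting_in_queue_millis of roughly nine minutes, suggest the master's task queue is backed up. The _cat/pending_tasks API shows what is waiting (a sketch, host assumed as before):
# what the master node is queuing, with priority and time in queue
curl -s 'localhost:9200/_cat/pending_tasks?v'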
dadoonet
(David Pilato)
October 19, 2020, 1:50pm
13
You don't need to wait for an upgrade to address that problem.
From tomorrow, just start new indices with only one shard and one replica.
Also, you have very old indices in your cluster, like this one: vdc_b_e063-2019.06.01, which is more than a year old.
Do you still need those indices?
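To review which old indices are still around before deciding, a wildcard on _cat/indices narrows the list (a sketch; the date pattern is an assumption based on the index names in this thread):
# list only the 2019 daily indices
curl -s 'localhost:9200/_cat/indices/*-2019.*?v'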
muthug
(muthuraam)
October 19, 2020, 1:54pm
14
I have deleted those indices, but now the relocating shards are stuck.
vdc_b_e061-2020.05.05 4 r RELOCATING 0 159b 10.46.xx.xx OConnectDataNode06 -> 10.46.xx.xx jmmTwCkxQ_mQbLLllcd_Cg OConnectDataNode05
vdc_b_e061-2020.05.05 3 r RELOCATING 0 160b 10.46.xx.xx OConnectDataNode04 -> 10.46.xx.xx jmmTwCkxQ_mQbLLllcd_Cg OConnectDataNode05
vdc_b_e061-2020.05.05 5 p RELOCATING 0 159b 10.46.xx.xx OConnectDataNode04 -> 10.46.xx.xx jmmTwCkxQ_mQbLLllcd_Cg OConnectDataNode05
vdc_b_e061-2020.05.05 1 r RELOCATING 0 160b 10.46.xx.xx OConnectDataNode04 -> 10.46.xx.xx jmmTwCkxQ_mQbLLllcd_Cg OConnectDataNode05
vdc_b_e061-2020.05.05 2 p RELOCATING 1 13.2kb 10.46.xx.xx OConnectDataNode03 -> 10.46.xx.xx jmmTwCkxQ_mQbLLllcd_Cg OConnectDataNode05
vdc_b_e061-2020.05.05 0 r RELOCATING 0 160b 10.46.xx.xx OConnectDataNode04 -> 10.46.xx.xx jmmTwCkxQ_mQbLLllcd_Cg OConnectDataNode05
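To check whether those six relocations are actually progressing, the _cat/recovery API reports per-shard copy progress (a sketch, grepping for the index shown above):
curl -s 'localhost:9200/_cat/recovery?v' | grep 'vdc_b_e061-2020.05.05'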
muthug
(muthuraam)
October 19, 2020, 1:57pm
15
If I fix this relocating shard issue, then I am good. I have been working on this for the past 5 days; today it finally turned green, but with 6 relocating shards.
dadoonet
(David Pilato)
October 19, 2020, 2:07pm
16
What is the full output of:
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v
muthug
(muthuraam)
October 19, 2020, 2:30pm
17
dadoonet:
/_cat/health?v
GET /_cat/nodes?v
host ip heap.percent ram.percent load node.role master name
10.46.xx.xx 10.46.xx.xx 14 58 0.27 d m OConnectDataNode05
10.46.xx.xx 10.46.xx.xx 30 59 0.54 d m OConnectDataNode01
10.46.xx.xx 10.46.xx.xx 48 55 0.96 - * OConnectManagementNode
10.46.xx.xx 10.46.xx.xx 13 57 0.43 d m OConnectDataNode02
10.46.xx.xx 10.46.xx.xx 15 57 0.44 d m OConnectDataNode03
10.46.xx.xx 10.46.xx.xx 61 57 0.20 - - OConnectClientNode02
10.46.xx.xx 10.46.xx.xx 23 57 0.63 d - OConnectDataNode04
10.46.xx.xx 10.46.xx.xx 53 0 -1.00 - - OConnectClientNode01
10.46.xx.xx 10.46.xx.xx 59 57 0.54 d m OConnectDataNode06
GET _cluster/health?pretty
{
"cluster_name" : "BPOConnectElasticSearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 9,
"number_of_data_nodes" : 6,
"active_primary_shards" : 20472,
"active_shards" : 40944,
"relocating_shards" : 6,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 2409,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 2561610,
"active_shards_percent_as_number" : 100.0
}
GET /_cat/indices?v
dadoonet
(David Pilato)
October 19, 2020, 2:44pm
18
I can still see a lot of data from 2019.
So I don't think you did.
muthug
(muthuraam)
October 19, 2020, 2:47pm
19
As per the client agreement we must retain 2 years of logs. I deleted only the shards that were shown as unassigned, not all of 2019.
dadoonet
(David Pilato)
October 19, 2020, 3:09pm
20
How could I know that, when you said you had deleted those indices?
Anyway, I fully agree with @Christian_Dahlqvist's advice:
Christian_Dahlqvist:
I would recommend you start addressing this ASAP (go to a single primary shard in your index templates, consolidate indices, switch from daily to e.g. monthly indices where the size is small), as it is otherwise just going to complicate the migration.