Hi!
I have run into a problem that I don't know how to fix, and I don't know exactly why it happens.
What I have right now
Elasticsearch 6.4.1
Single node cluster
Indices: ~1000
Documents: ~500 000 000.
What happened
I was getting ready to perform an upgrade, and all I did was restart the server. At this point we had about 1.2 billion documents. After the restart, the cluster entered some kind of recovery mode and performed what I believe (after some searching on Google) to be a checksum check on every single document. This progressed very slowly: after letting it run for 7 full days, it had only reached about 700 000 000 documents. Worse, it seemed to be slowing down.
So I restarted again, with the same result.
This is where the problem started. In an attempt to fix this, I changed some cluster settings to speed up cluster startup (such as the number of concurrent recoveries and the recovery bandwidth). However, one of the settings I changed caused an issue that I don't know how to fix (and I don't know which change caused it). To get around the slow cluster start, I ended up simply deleting most of my documents. No big deal.
What happens now is that every time a new index is created, it is created with replica shards. I've never had them before, and I only have 1 node at this time, so I don't understand why it does this.
This means that every day when I get to work, the cluster status is yellow, and I need to run the following command to fix it, at least until the next time a new index is created.
Fix:
curl -XPUT localhost:9200/_settings -H 'Content-Type: application/json' -d'{ "index": { "number_of_replicas": 0 } }'
After I run that, the cluster is back to green. How do I change my cluster back so that it doesn't try to create replica shards?
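In Elasticsearch 6.x a new index gets 1 replica by default unless the create-index request or a matching index template says otherwise, and the `_settings` call above only changes indices that already exist. One way to make new indices start with 0 replicas is a legacy index template. This is a minimal sketch, assuming the node is reachable at localhost:9200 as in the commands in this post; the template name `zero-replicas` and the function names are hypothetical, and the `org.*` pattern is chosen only because it matches the index names listed here. With `order` 0, any other matching template with a higher order that sets `number_of_replicas` would still take precedence.

```shell
# Sketch only: a legacy (6.x-style) index template that gives new indices
# matching "org.*" zero replicas. Node address as used in this post;
# template and function names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the template body as JSON.
zero_replica_template_body() {
  cat <<'EOF'
{
  "index_patterns": ["org.*"],
  "order": 0,
  "settings": {
    "index": {
      "number_of_replicas": 0
    }
  }
}
EOF
}

# PUT the template (requires a reachable node). Only indices created
# after this call are affected; existing ones keep their current setting.
put_zero_replica_template() {
  zero_replica_template_body |
    curl -s -XPUT "$ES_URL/_template/zero-replicas" \
      -H 'Content-Type: application/json' --data-binary @-
}
```

After calling `put_zero_replica_template`, `curl -s 'localhost:9200/_template/zero-replicas?pretty'` should show the stored template, and `curl -s 'localhost:9200/_cat/templates?v'` lists every template with its order, which can help spot another template that sets a replica count.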
Here is some relevant data from today, before running the "fix":
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 4875,
  "active_shards" : 4875,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 45,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 99.08536585365853
}
curl -s -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason | grep UNASSIGNED
org.web.metrics-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.web.metrics-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.web.metrics-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.web.metrics-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.web.metrics-2019.12.18 0 r UNASSIGNED INDEX_CREATED
org.lb.performance-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.lb.performance-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.lb.performance-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.lb.performance-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.lb.performance-2019.12.18 0 r UNASSIGNED INDEX_CREATED
org.web.errors-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.web.errors-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.web.errors-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.web.errors-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.web.errors-2019.12.18 0 r UNASSIGNED INDEX_CREATED
org.security.login-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.security.login-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.security.login-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.security.login-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.security.login-2019.12.18 0 r UNASSIGNED INDEX_CREATED
org.lb.syslog-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.lb.syslog-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.lb.syslog-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.lb.syslog-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.lb.syslog-2019.12.18 0 r UNASSIGNED INDEX_CREATED
org.servers.vpn-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.servers.vpn-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.servers.vpn-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.servers.vpn-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.servers.vpn-2019.12.18 0 r UNASSIGNED INDEX_CREATED
org.web.iis-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.web.iis-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.web.iis-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.web.iis-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.web.iis-2019.12.18 0 r UNASSIGNED INDEX_CREATED
org.obp.iis-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.obp.iis-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.obp.iis-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.obp.iis-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.obp.iis-2019.12.18 0 r UNASSIGNED INDEX_CREATED
org.servers.systeminfo-2019.12.18 1 r UNASSIGNED INDEX_CREATED
org.servers.systeminfo-2019.12.18 3 r UNASSIGNED INDEX_CREATED
org.servers.systeminfo-2019.12.18 4 r UNASSIGNED INDEX_CREATED
org.servers.systeminfo-2019.12.18 2 r UNASSIGNED INDEX_CREATED
org.servers.systeminfo-2019.12.18 0 r UNASSIGNED INDEX_CREATED
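To see at a glance which indices the unassigned replicas belong to, the listing above can be grouped per index. A small awk sketch, assuming the same column order as the `_cat/shards` call above (index, shard, prirep, state, unassigned.reason); the function name is hypothetical.

```shell
# Count unassigned replica shards per index.
# Expects _cat/shards text on stdin with columns: index shard prirep state [...]
# e.g. curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason'
unassigned_replicas_per_index() {
  # $3 is prirep ("p" or "r"), $4 is the shard state.
  awk '$3 == "r" && $4 == "UNASSIGNED" { count[$1]++ }
       END { for (idx in count) print idx, count[idx] }' | sort
}
```

On the listing above this gives 9 indices with 5 unassigned replicas each, 45 in total, which matches `unassigned_shards` in the health output.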
curl -s -XGET 'localhost:9200/_cat/recovery?v&pretty' | wc -l
4876
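With `?v`, `_cat/recovery` prints a header line, so 4876 lines correspond to 4875 recovery entries, the same number as `active_shards` in the health output. A line count alone does not show whether any recovery is still running. A sketch that counts entries whose stage is not `done`, assuming the column selection `h=index,shard,stage`; the function name is hypothetical.

```shell
# Count shard recoveries that have not reached the "done" stage.
# Expects columns: index shard stage on stdin, e.g. from
#   curl -s 'localhost:9200/_cat/recovery?h=index,shard,stage'
count_unfinished_recoveries() {
  # NF >= 3 skips blank lines; n + 0 prints 0 when nothing matches.
  awk 'NF >= 3 && $3 != "done" { n++ } END { print n + 0 }'
}
```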
curl -s -XGET localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : "org.web.metrics-2019.12.18",
  "shard" : 1,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2019-12-18T03:00:03.196Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "nvesxBuQT3OpFi7s4UA7sg",
      "node_name" : "seswelog01",
      "transport_address" : "172.16.33.195:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135109414912",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[org.web.metrics-2019.12.18][1], node[nvesxBuQT3OpFi7s4UA7sg], [P], s[STARTED], a[id=Wj9Zm00tTHujWUGqmfwyLg]]"
        }
      ]
    }
  ]
}
curl -s -XGET localhost:9200/_cluster/settings?pretty
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_recoveries" : "5",
          "enable" : "all",
          "node_initial_primaries_recoveries" : "8"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "200mb"
      }
    },
    "xpack" : {
      "monitoring" : {
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  },
  "transient" : { }
}
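For reference, `number_of_replicas` is an index-level setting (taken from the create-index request or matching templates when an index is created), so it does not appear among the cluster settings above. If the recovery tuning should go back to the defaults, setting a persistent cluster setting to `null` removes it. A sketch, assuming the node at localhost:9200 as in this post; the function names are hypothetical.

```shell
# Sketch: reset the persistent recovery/allocation settings shown above
# to their defaults by setting them to null. Function names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the request body as JSON.
reset_recovery_settings_body() {
  cat <<'EOF'
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": null,
    "cluster.routing.allocation.node_initial_primaries_recoveries": null,
    "cluster.routing.allocation.enable": null,
    "indices.recovery.max_bytes_per_sec": null
  }
}
EOF
}

# Send the reset to the cluster (requires a reachable node).
reset_recovery_settings() {
  reset_recovery_settings_body |
    curl -s -XPUT "$ES_URL/_cluster/settings" \
      -H 'Content-Type: application/json' --data-binary @-
}
```

Running `curl -s 'localhost:9200/_cluster/settings?pretty'` afterwards should show only the remaining persistent settings (here, the X-Pack monitoring one).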