GET / output:
{
  "name": "es-ingest-3",
  "cluster_name": "ct",
  "cluster_uuid": "IED4OO5CR6ur7ZCRNmfWEg",
  "version": {
    "number": "6.1.2",
    "build_hash": "5b1fea5",
    "build_date": "2018-01-10T02:35:59.208Z",
    "build_snapshot": false,
    "lucene_version": "7.1.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
GET _cat/nodes?v output:
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.4.128.2 9 97 5 0.52 0.25 0.13 m - es-master-3
10.4.0.3 11 92 17 0.13 0.21 0.29 m - es-master-1
10.4.0.7 38 86 93 11.51 6.04 3.68 i - es-ingest-1
10.5.0.2 39 69 6 0.60 0.43 0.22 d - es-data-3
10.6.0.4 61 67 38 1.74 1.30 0.96 i - es-ingest-2
10.7.0.2 12 99 11 0.61 0.57 0.55 m - es-master-2
10.1.0.2 19 88 22 1.37 0.95 0.73 d - es-data-4
10.9.128.2 24 90 19 1.50 0.94 0.78 d - es-data-7
10.1.192.2 12 97 24 1.15 1.39 0.79 d - es-data-0
10.3.0.9 48 99 64 4.75 3.65 3.33 d - es-data-1
10.2.0.2 71 96 30 1.21 0.97 0.93 d - es-data-6
10.7.128.6 61 69 44 0.86 0.76 0.70 i - es-ingest-0
10.6.8.4 15 97 3 0.07 0.09 0.09 m - es-master-4
10.9.0.3 25 80 5 0.18 0.30 0.43 m * es-master-0
10.6.128.2 59 90 49 4.31 3.92 4.04 d - es-data-9
10.2.128.2 21 68 6 0.26 0.16 0.10 d - es-data-5
10.8.0.2 68 78 34 1.33 1.08 1.08 d - es-data-8
10.7.128.5 54 69 45 0.86 0.76 0.70 i - es-ingest-3
10.3.128.2 74 91 22 1.27 0.77 0.68 d - es-data-2
Average indexing rate: 6k to 10k documents per second.
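For reference, one way to sample that rate (a sketch, assuming the logstash-* index pattern used below) is to read the cumulative index_total counter twice and divide the difference by the elapsed seconds:

GET /logstash-*/_stats/indexing?filter_path=_all.primaries.indexing.index_total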
When I perform the following steps, the indexing rate increases to 25k to 30k documents per second:
PUT /logstash-*/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}

PUT /logstash-*/_settings
{
  "index": {
    "number_of_replicas": "0"
  }
}
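To confirm both settings actually took effect (a quick sanity check; this only changes indices that already exist, so indices created later come up with their template or default values), the flattened settings can be listed:

GET /logstash-*/_settings?flat_settings=true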
Performing the above steps improved the rate, and the cluster ran very well for about 10 days, indexing nearly 17 billion documents. But it failed when there was a spike of 1.3 billion documents in a single day. Since then the cluster has not been stable: every time I perform the above steps, the indexing rate increases for a few hours and then the cluster crashes again.
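When the rate collapses like this, one thing worth checking is whether nodes are rejecting bulk requests; a minimal check (on 6.1.2 the relevant thread pool is still named bulk):

GET _cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected,completed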
Each time this happens, I have been performing the following steps, in this order:
PUT /logstash-*/_settings
{
  "index": {
    "number_of_replicas": "1"
  }
}

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
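While waiting, shard movement can be watched with the cluster health counters (a minimal check; GET _cat/recovery?v&active_only=true shows the same thing per shard):

GET _cluster/health?filter_path=status,initializing_shards,relocating_shards,unassigned_shards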
I waited for all the shards to be reallocated and then ran the following steps:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
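Note that at this point both a persistent value (primaries) and a transient value (all) exist for cluster.routing.allocation.enable; the transient one takes precedence while it is set. Both can be inspected with:

GET _cluster/settings?flat_settings=true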
PUT /logstash-*/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}

PUT /logstash-*/_settings
{
  "index": {
    "number_of_replicas": "0"
  }
}
We receive an average of 800 million documents every day, with occasional spikes of around 1.3 billion per day. We currently have a 7-day backlog that needs to be cleared. Please suggest the best way to address this.
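For scale, assuming the load were spread evenly over 24 hours (it is not, so peaks are higher), those volumes work out to roughly:

800,000,000 docs / 86,400 s ≈ 9,300 docs per second
1,300,000,000 docs / 86,400 s ≈ 15,000 docs per second

so an average day alone is already near the top of the 6k to 10k per second the cluster sustains without the refresh/replica tuning above.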
I am new to managing Elasticsearch, so please guide me.