Flush.total numbers differ widely across the cluster

jasony · January 4, 2019, 2:30am

Hello,
My cluster consists of 9 nodes and uses a hot-warm architecture.

01: coordinating
02,03: master eligible nodes
04,05,06: hot
07,08,09: warm

I have a question about the flush.total results from the _cat API.

As the results below show, the flush.total numbers differ hugely between the warm nodes. Is it possible to make them equal (or close to it), so that each node does the same amount of work?

I also checked with the hot_threads API: node 08 is doing nothing, and only nodes 07 and 09 are busy with "[flush]". The last batch (via Curator allocation) ran 22 hours ago.

Please advise if there is any setting that makes the nodes receive traffic and execute flushes evenly.

Thanks!

GET _cluster/health

{
  "cluster_name": "gm",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 9,
  "number_of_data_nodes": 6,
  "active_primary_shards": 1215,
  "active_shards": 2430,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}

GET _cat/nodes?v&s=id&s=name&h=name,flush.total,flush.total_time
name               flush.total flush.total_time
suy-prd-opr-els-01           0               0s
suy-prd-opr-els-02           0               0s
suy-prd-opr-els-03           0               0s
suy-prd-opr-els-04   353452399             1.9h
suy-prd-opr-els-05  1217076786             1.9h
suy-prd-opr-els-06  8410385606               3h
suy-prd-opr-els-07 14970158534             2.7h
suy-prd-opr-els-08   346690902             3.7m
suy-prd-opr-els-09 66519382342             9.8h
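
To put a number on how uneven the warm tier is, the output above can be turned into per-node shares. This is only a rough sketch: the values are copied from the warm-node rows above, the helper names (`to_seconds`, `parse_cat`, `flush_summary`) are made up for illustration, and the time parser handles only the `s`, `m`, `h` and `d` suffixes that appear in this thread.

```python
# Warm-node rows copied from the _cat/nodes output above.
CAT_OUTPUT = """\
name flush.total flush.total_time
suy-prd-opr-els-07 14970158534 2.7h
suy-prd-opr-els-08 346690902 3.7m
suy-prd-opr-els-09 66519382342 9.8h
"""

# Only the suffixes seen in this thread; other _cat units are not handled.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value):
    """Convert a _cat time value such as '3.7m' or '9.8h' to seconds."""
    return float(value[:-1]) * UNIT_SECONDS[value[-1]]


def parse_cat(text):
    """Turn whitespace-separated _cat output into a list of dicts keyed by header."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def flush_summary(rows):
    """Return each node's share of all flushes and its total flush time in seconds."""
    total = sum(int(row["flush.total"]) for row in rows)
    summary = {}
    for row in rows:
        count = int(row["flush.total"])
        summary[row["name"]] = {
            "share": count / total if total else 0.0,
            "seconds": to_seconds(row["flush.total_time"]),
        }
    return summary


summary = flush_summary(parse_cat(CAT_OUTPUT))
for name, stats in summary.items():
    print(f"{name}: {stats['share']:.1%} of warm-tier flushes, "
          f"{stats['seconds']:.0f}s total flush time")
```

The shares only describe the counters as reported; they say nothing about why one node flushes more than another.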

GET _cat/nodes?v&s=name&h=name,disk.total,disk.used,disk.used_percent
name               disk.total disk.used disk.used_percent
suy-prd-opr-els-01     47.6gb     5.7gb             11.97
suy-prd-opr-els-02     47.6gb     5.3gb             11.17
suy-prd-opr-els-03     47.6gb     5.3gb             11.14
suy-prd-opr-els-04    499.7gb   169.7gb             33.97
suy-prd-opr-els-05    499.7gb   178.8gb             35.79
suy-prd-opr-els-06    499.7gb   173.4gb             34.70
suy-prd-opr-els-07      3.9tb     2.8tb             74.12
suy-prd-opr-els-08      3.9tb     2.6tb             68.29
suy-prd-opr-els-09      3.9tb     2.6tb             68.92

bloke · January 5, 2019, 1:32am

Can you please advise whether you are using routing or shard allocation filtering as described here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-allocation-filtering.html
This type of configuration can help with load balancing.

jasony · January 5, 2019, 2:28am

I don't use either and left them at the defaults, but my nodes are not in the same zone or rack.

bloke · January 5, 2019, 3:07am

Perhaps that is a reason to consider using it.
If you have a smaller development cluster, you could try out these settings there.

I use this, for example:

https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html

in a setup like this:

node.name: node-1
# Add custom attributes to the node:
#
node.attr.dc_id: THIS_DC
cluster.routing.allocation.awareness.attributes: dc_id

And on another node:

node.name: node-2
#
# Add custom attributes to the node:
#
node.attr.dc_id: THAT_DC
cluster.routing.allocation.awareness.attributes: dc_id

I have more than 2 nodes in the cluster, but I hope you get the idea.
You may want to try these settings and experiment on something smaller.

I use it because I can spread the load and allocations across THIS_DC and THAT_DC as I like; you can customise your attributes to suit.
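
As far as I know, `cluster.routing.allocation.awareness.attributes` is a dynamic setting, so it can also be applied through the cluster settings API instead of editing elasticsearch.yml on every node (the `node.attr.dc_id` values themselves still have to be set per node). A minimal sketch that only builds the request body; the helper name `awareness_settings` is made up, and sending the request is left out on purpose.

```python
import json

AWARENESS_KEY = "cluster.routing.allocation.awareness.attributes"


def awareness_settings(attributes, persistent=True):
    """Build a body for PUT _cluster/settings that sets the awareness attributes."""
    scope = "persistent" if persistent else "transient"
    # Several attributes are passed as one comma-separated string.
    return {scope: {AWARENESS_KEY: ",".join(attributes)}}


body = awareness_settings(["dc_id"])
print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```

Try it on a development cluster first, since changing awareness can trigger shard relocation.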

I hope this helps you in some way

jasony · January 5, 2019, 3:41am

I will take a look. Thanks!

bloke · January 5, 2019, 4:02am

What type of disk are you writing to? How many replicas do your indices have?

I have the following stats, but I have a flash array and no tiering like yours:

name             flush.total flush.total_time
lb-kibana           0               0s
node-1              4581            11m
node-2              3357            1.2m
node-3              4506            36.2m
node-4              4470            33.7m
node-5              2922             2.8m

And

name             disk.total disk.used disk.used_percent
lb-kibana       3.5gb     1.9gb             56.53
node-1          2.9tb     1.1tb             39.42
node-2          2.8tb   505.8gb             17.13
node-3          2.8tb     1.3tb             45.13
node-4          2.8tb     1.3tb             45.68
node-5          2.8tb   604.3gb             20.47

Nodes 1, 3 and 4 are in one group, and nodes 2 and 5 are in another. The lb-kibana node is a load balancer for Kibana.

Your flush times are probably higher because of something like shard allocation, replicas, or infrastructure.

Christian_Dahlqvist · January 5, 2019, 5:39am

As far as I recall, the stats are counters that start at zero when the node starts. The node with the lower value seems to have a much shorter uptime than the others, which could explain the difference.

jasony · January 7, 2019, 1:46am

I queried again with each node's uptime included, but the three warm nodes have the same uptime.

GET _cat/nodes?v&s=id&s=name&h=name,flush.total,flush.total_time,uptime

name                flush.total flush.total_time uptime
suy-prd-opr-els-01            0               0s  64.8d
suy-prd-opr-els-02            0               0s  64.7d
suy-prd-opr-els-03            0               0s  64.7d
suy-prd-opr-els-04    353463374               2h  71.6d
suy-prd-opr-els-05   1217078363             1.9h  71.6d
suy-prd-opr-els-06   8410387267             3.1h  71.6d
suy-prd-opr-els-07 236892469127             1.1d   4.8d
suy-prd-opr-els-08  78613201854            11.2h   4.8d
suy-prd-opr-els-09 304126539811             1.7d   4.8d
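
If the counters really do start from zero at node startup, one way to compare nodes with different uptimes is to divide flush.total by uptime. A rough sketch using the data-node rows from the output above; the helper names (`uptime_days`, `flushes_per_day`) are made up for illustration, and the parser handles only the `s`, `m`, `h` and `d` suffixes.

```python
# Data-node rows copied from the _cat/nodes output above:
# name, flush.total, flush.total_time, uptime
CAT_ROWS = """\
suy-prd-opr-els-04 353463374 2h 71.6d
suy-prd-opr-els-05 1217078363 1.9h 71.6d
suy-prd-opr-els-06 8410387267 3.1h 71.6d
suy-prd-opr-els-07 236892469127 1.1d 4.8d
suy-prd-opr-els-08 78613201854 11.2h 4.8d
suy-prd-opr-els-09 304126539811 1.7d 4.8d
"""

# Only the suffixes seen in this thread; other _cat units are not handled.
DAYS_PER_UNIT = {"d": 1.0, "h": 1 / 24, "m": 1 / 1440, "s": 1 / 86400}


def uptime_days(value):
    """Convert a _cat time value such as '4.8d' or '12h' to days."""
    return float(value[:-1]) * DAYS_PER_UNIT[value[-1]]


def flushes_per_day(rows_text):
    """Map node name to flush.total divided by uptime in days."""
    rates = {}
    for line in rows_text.strip().splitlines():
        name, total, _total_time, uptime = line.split()
        rates[name] = int(total) / uptime_days(uptime)
    return rates


rates = flushes_per_day(CAT_ROWS)
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:,.0f} flushes per day of uptime")
```

If the per-day rates still differ a lot between nodes of the same tier, uptime alone does not explain the gap.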

Christian_Dahlqvist · January 7, 2019, 6:13am

It looks like I did indeed misread the data. In that case I am not sure what is going on.

system · February 4, 2019, 6:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
