Utilizing all path.data directories in Elasticsearch


(Андрей D) #1

Is this possible?
We have 3 Elasticsearch servers, each with a similar config:

path:
  data:
    - /data/1/elasticsearch
    - /data/2/elasticsearch
    - /data/3/elasticsearch
    - /data/4/elasticsearch

df -h:

/dev/sda4       5,2T  239G  4,7T   5% /data/1
/dev/sdb4       5,2T  304G  4,7T   7% /data/2
/dev/sdc4       5,2T  238G  4,7T   5% /data/3
/dev/sdd4       5,2T  283G  4,7T   6% /data/4

We receive a large volume of logs through Filebeat -> Logstash -> Elasticsearch (around 30,000 events per second). I have disabled replication (we are not afraid of losing logs).
Each day we have two indices, filebeat-7d and filebeat2-7d, and we delete indices older than 7 days.
I did not change the default number of shards per index.
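One common way to disable replicas on existing indices is the index settings API; a minimal sketch, where the host localhost:9200 and the wildcard pattern filebeat* are only assumptions about my setup:

# Sketch: set number_of_replicas to 0 on the existing filebeat* indices.
# Host and index pattern are assumptions; adjust to your environment.
curl -XPUT 'http://localhost:9200/filebeat*/_settings' \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "number_of_replicas": 0 } }'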
In Grafana I can see that not all of the disks are being used:
(screenshots: disk usage on node1 and node2)

It would be great if I could force Elasticsearch to use all of the path.data directories. It seems to me that I have to change the default number of shards per index, but should I increase or decrease it?
Searching Google, I found advice that for high performance the number of shards per index should be decreased.
But, logically:
3 servers * 4 path.data entries = 12 data paths
so 6 shards per index (times 2 daily indices = 12 shards)?
Could anyone explain how to choose the right number of shards, or suggest another option for utilizing all the disks?
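In case it helps to be concrete: the number of primary shards for new indices can be set with an index template. A minimal sketch, where the template name, the host, and the value 6 are only placeholder assumptions, and which only affects indices created after the template exists:

# Sketch: legacy index template setting the primary shard count for new
# filebeat-* / filebeat2-* indices. Name, host and shard count are
# illustrative assumptions only.
curl -XPUT 'http://localhost:9200/_template/filebeat_shards' \
  -H 'Content-Type: application/json' \
  -d '{
        "index_patterns": ["filebeat-*", "filebeat2-*"],
        "settings": { "index": { "number_of_shards": 6 } }
      }'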


(David Turner) #2

Elasticsearch balances shards across nodes, but not across data paths within each node. If you want this kind of balancing, I suggest you try running 12 nodes, each on a single data path.
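A minimal sketch of what two of the per-node config files might look like under that layout; the cluster name, node names, file locations and ports below are only illustrative assumptions, not something taken from this thread:

# Hypothetical /etc/elasticsearch-node1/elasticsearch.yml
cluster.name: logs
node.name: server1-node1
path.data: /data/1/elasticsearch
http.port: 9200

# Hypothetical /etc/elasticsearch-node2/elasticsearch.yml
cluster.name: logs
node.name: server1-node2
path.data: /data/2/elasticsearch
http.port: 9201

# ...and likewise for /data/3 and /data/4. Each node is a separate JVM,
# so it also needs its own transport port and its own heap settings.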


(Андрей D) #3

Thank you very much!