One of our clusters has the following specs:
8 nodes (8 data and 5 master-eligible), each with 15 GB of heap, running on servers with 32 GB of RAM.
Elasticsearch version 7.6.2 on Linux.
Each node has 8 TB of disk space allocated.
The cluster holds around 30 billion documents.
Each node holds 1443 shards.
Total write I/O is around 350/sec.
The application team is complaining that writes are slow. How can we improve write performance on this cluster?
That looks like a lot of shards per node, significantly more than recommended. How many indices and shards are you actively indexing into? How are you indexing into Elasticsearch? What type of storage do you have?
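To put a rough number on "more than recommended": Elastic's shard-sizing guidance for 7.x suggests staying below about 20 shards per GB of heap on each node. The sketch below applies that rule of thumb to the figures from the first post (15 GB heap, 1443 shards per node). The function name and constants are only illustrative, and the guideline is a soft limit rather than a hard one.

```python
# Rule of thumb from Elastic's 7.x shard-sizing guidance (a soft guideline).
MAX_SHARDS_PER_GB_HEAP = 20

# Figures taken from the cluster description in the first post.
HEAP_GB = 15
SHARDS_PER_NODE = 1443


def shard_budget(heap_gb, shards_per_gb=MAX_SHARDS_PER_GB_HEAP):
    """Return the suggested upper bound on shards for one node."""
    return heap_gb * shards_per_gb


budget = shard_budget(HEAP_GB)
overshoot = SHARDS_PER_NODE / budget

print(f"suggested max shards per node: {budget}")
print(f"actual shards per node: {SHARDS_PER_NODE} ({overshoot:.1f}x the guideline)")
```

By this estimate each node carries several times the suggested shard count, which is why the shard count is worth looking at before anything else.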
Storage is XIO Flash storage.
Ingestion happens from a Python script; at any given time there are 34 processes capable of writing.
Each process writes to only a single index at a time. Index refresh is disabled when a process starts on a particular work item,
and on completion of the work item the process triggers a manual refresh.
Roughly 2.2k items are ingested per day, which isn't 1:1 with indices (several of these items share an index).
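For readers following along, the per-work-item flow described above (disable refresh, write, then refresh manually) might look roughly like this. It is only a sketch, not the actual script: `ingest_work_item` is a hypothetical name, `es` is assumed to be an elasticsearch-py 7.x client, `bulk` is assumed to be `elasticsearch.helpers.bulk`, and the use of the bulk API at all is an assumption, since the thread does not say how documents are sent.

```python
def ingest_work_item(es, bulk, index, docs, chunk_size=1000):
    """Index one work item with refresh disabled, then refresh once.

    es    -- assumed: an elasticsearch-py 7.x style client
    bulk  -- assumed: elasticsearch.helpers.bulk (passed in so it can be swapped)
    index -- target index name
    docs  -- iterable of dicts to index
    """
    # Disable periodic refresh for the duration of the work item.
    es.indices.put_settings(
        index=index, body={"index": {"refresh_interval": "-1"}}
    )
    try:
        # Assumption: documents are sent in bulk batches, not one by one.
        actions = ({"_index": index, "_source": doc} for doc in docs)
        ok, errors = bulk(
            es, actions, chunk_size=chunk_size, raise_on_error=False
        )
    finally:
        # Manual refresh on completion, as described in the post.
        es.indices.refresh(index=index)
    return ok, errors
```

A call would look like `ingest_work_item(Elasticsearch(), helpers.bulk, "my-index", docs)`, where the index name is a placeholder. If the real script sends one request per document instead, batching along these lines would be one of the first things to compare.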