Hi All,
I have a 3-node Elasticsearch cluster; each node has a 1 TB hard disk, 15 GB of RAM and 4 CPU cores.
The Elasticsearch version in use is 6.3.0.
At the moment I have 3,304,624,313 documents using 2.1 TB of disk space.
This data was collected within a month.
The problem is that a search on the cluster takes over 5 minutes.
In order to optimize search performance, what can I do?
What is the maximum data size a 3-node cluster can handle?
Is it OK to split the indices vertically so that small fields are grouped in one index? Will it help improve performance?
Regarding my 3rd question: say my document has field1 and field2, where field2 is long text. Is it OK to split the index into two indices, so that index1 contains field1 only and index2 contains field2 only? Will it improve my search performance, since I will be searching over fewer fields?
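For illustration, the split I have in mind would be something like the following reindex sketch (Python client; the index names, field names and host are just placeholders, not my real mapping):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # placeholder host

    # Copy only the small field1 into its own index.
    es.reindex(body={
        "source": {"index": "my-index", "_source": ["field1"]},
        "dest": {"index": "my-index-field1"}
    }, wait_for_completion=False)

    # Copy only the long-text field2 into a second index.
    es.reindex(body={
        "source": {"index": "my-index", "_source": ["field2"]},
        "dest": {"index": "my-index-field2"}
    }, wait_for_completion=False)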
Elasticsearch is generally very I/O intensive, so having fast storage is very important. Run iostat -x to see how the storage is performing. I would not be surprised to see a lot of iowait, indicating that this is the bottleneck. If that is confirmed, I would recommend upgrading to more performant storage.
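If you would rather sample it programmatically than read iostat output, roughly the same signal is available via psutil (assuming the package is installed; on Linux the iowait field is the one to watch):

    import psutil  # third-party package, assumed to be installed

    # Sample CPU time shares over 5 seconds; a persistently high 'iowait'
    # percentage means the CPUs are mostly waiting on the disks.
    cpu = psutil.cpu_times_percent(interval=5)
    print("iowait %:", getattr(cpu, "iowait", "not reported on this OS"))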
Finally, I am planning to have 3 times more data in the future, as 3 months of retention is required (the data we are looking at now is one month's worth).
If I stick to the same hardware spec, will either of the following make any performance improvement?
Splitting the index, putting field1 and field2 in one index and field3 and field4 in another index. Our search queries are mostly based on a single field which holds a JSON payload.
Increasing the shard size to a larger value and reducing the number of shards, as at the moment I have 888 shards.
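For reference, this is roughly how I am listing the current shards and their sizes (a sketch with the Python client; the host is a placeholder):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # placeholder host

    # List every shard with its store size, largest first, to see how the
    # 888 shards are distributed across the 3 nodes.
    print(es.cat.shards(v=True, s="store:desc", h="index,shard,prirep,store,node"))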
In order to optimize search performance, what can I do?
You've got a ~2 TB dataset and ~45 GB of RAM (3 × 15 GB). This means lots of I/O, since the dataset does not fit in RAM. Two options for increasing performance without other changes:
More RAM (= more data in memory and/or filesystem caches). RAM is far faster than anything else, so everything served from RAM is a big plus.
Faster disks (e.g. SSD, or SSDs in RAID). If the total amount of RAM is < 2 TB, significant disk I/O is needed. Spinning disks manage ~125 MB/s, a single SATA SSD ~500 MB/s, and SSD RAID sets or PCIe SSDs are far faster. That way, everything NOT served from RAM can still load reasonably fast.
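To make that concrete, a quick back-of-the-envelope calculation (the throughput figures above plus an assumed ~3 GB/s for PCIe SSD; this is the worst case of reading every uncached byte once, which real queries will not do, but the ratio between the storage types is the point):

    # Rough worst-case scan time for the part of the dataset that is not cached in RAM.
    dataset_gb = 2100   # ~2.1 TB on disk
    cached_gb = 45      # ~ total RAM across the 3 nodes
    uncached_gb = dataset_gb - cached_gb

    for name, mb_per_s in [("spinning disk", 125), ("SATA SSD", 500), ("PCIe SSD", 3000)]:
        minutes = uncached_gb * 1024 / mb_per_s / 60
        print(f"{name}: ~{minutes:.0f} minutes to read {uncached_gb} GB")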
Mapping (change requires re-indexing):
According to other posts (I do not know the reason): use a max shard size of ~50 GB.
With the rule above in mind: keep as close to 1 shard per CPU core as you can (1 shard = 1 process).
Rough estimate in your case: ~2 TB / 50 GB per shard ≈ 40 shards is optimal (if the dataset will not grow).
Since you've only got 12 CPU cores in total (3 nodes × 4 cores), this is not the most efficient setup, so more cores will help as well.
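If you want to move from 888 shards towards that target without a full reindex, the shrink API can reduce the primary shard count of an existing index (it needs writes blocked and a copy of every shard on one node first, and the new count must be a factor of the old one). A minimal sketch with the Python client; the index names, node name and shard count are placeholders:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # placeholder host

    source, target = "my-index", "my-index-shrunk"  # placeholder names

    # 1. Block writes and relocate a copy of every shard onto one node,
    #    which the shrink API requires.
    es.indices.put_settings(index=source, body={
        "index.blocks.write": True,
        "index.routing.allocation.require._name": "node-1"  # placeholder node name
    })

    # 2. Shrink to fewer primary shards (must divide the original primary count).
    es.indices.shrink(index=source, target=target, body={
        "settings": {"index.number_of_shards": 1, "index.number_of_replicas": 1}
    })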