I am going to escalate your recommendations and we will try to implement them as soon as possible. I just wanted to clarify a few things in case they change any of the parameters we discussed.
The 4 TB of data is per HOT node, for each of the 6 HOT nodes, that is, 24 TB in the HOT tier in total, counting both the data and its replica.
I will share more details and updated information as soon as I have them. Thank you for your help!
Sorry to press this point again, but wouldn't splitting the 700 TB of total data across more than one index set have a significant impact on performance? In a nutshell, the current setup is like having a SQL database with a single monstrous table that holds all the data.
I know it is not directly comparable, but I am not sure that putting everything in the same index set is optimal. For one thing, we sometimes get warnings about reaching the limit of 1000 indexed fields, so dividing the information by affinity into different index sets should, a priori, be more optimal, right?
Regarding the change in shard size, in principle we should have done it already, but it has become complicated and we have not yet been able to carry it out. I will keep you informed.
Best practice in general is to group data with similar mappings into index sets, and part of this is to avoid mapping explosion with respect to the number of fields. You do not want to go too granular though as you will end up with lots of very small indices and shards, which is very inefficient and hurts performance.
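To get a rough feel for how close a mapping is to the field limit mentioned above, something like the following sketch could be run against a mapping's `properties` section. The sample mapping and the function name are made up for illustration, and the counting rule (fields, objects and multi-fields each count once) is an approximation of how Elasticsearch applies `index.mapping.total_fields.limit`, not an exact reimplementation.

```python
def count_mapped_fields(properties):
    """Approximate how many entries in a mapping count toward the field limit."""
    total = 0
    for definition in properties.values():
        total += 1  # the field or object itself
        total += len(definition.get("fields", {}))  # multi-fields such as .keyword
        total += count_mapped_fields(definition.get("properties", {}))  # sub-fields
    return total


# Hypothetical mapping, shaped like the "properties" part of an index mapping.
sample_mapping = {
    "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "source": {"type": "keyword"},
    "http": {
        "properties": {
            "status": {"type": "integer"},
            "method": {"type": "keyword"},
        }
    },
}

FIELD_LIMIT = 1000  # the limit referred to in the warnings above
used = count_mapped_fields(sample_mapping)
print(f"{used} of {FIELD_LIMIT} fields used")
```

Running this per data type should show which sources contribute most of the fields, which is a reasonable starting point for deciding what to split out.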
It seems like Graylog only creates a single index set, which may indeed not be optimal. That is however something you would need to address with them.
Then I would recommend creating multiple index sets based on how similar the mappings for different types of data are. As retention management is done at the index level, another aspect to consider is to also group data that have the same retention period together.
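As a minimal sketch of that grouping idea, assuming you can list the field names and retention period for each data source: the snippet below puts sources in the same group only when they share a retention period and their field sets overlap enough. The source names, field names, and the 0.5 similarity threshold are all hypothetical, and the greedy grouping is just one simple way to do it.

```python
def jaccard(a, b):
    """Share of fields two sets have in common (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_sources(sources, min_similarity=0.5):
    """Greedily group sources with equal retention and similar field sets."""
    groups = []
    for name, (fields, retention_days) in sources.items():
        for group in groups:
            same_retention = group["retention_days"] == retention_days
            if same_retention and jaccard(group["fields"], fields) >= min_similarity:
                group["members"].append(name)
                group["fields"] |= fields  # the group's mapping is the union
                break
        else:
            # No existing group fits, so start a new candidate index set.
            groups.append(
                {"members": [name], "fields": set(fields), "retention_days": retention_days}
            )
    return groups


# Hypothetical sources: (field names, retention in days).
sources = {
    "nginx_access": ({"timestamp", "source", "http_status", "http_method", "url"}, 90),
    "apache_access": ({"timestamp", "source", "http_status", "http_method", "referrer"}, 90),
    "firewall": ({"timestamp", "source", "src_ip", "dst_ip", "action"}, 90),
    "audit": ({"timestamp", "source", "user", "action"}, 365),
}

for group in group_sources(sources):
    print(group["retention_days"], group["members"], len(group["fields"]))
```

With this sample data the two web server logs end up together, while the firewall and audit logs each get their own group. Raising the threshold gives more, smaller groups, which runs into the "too granular" problem mentioned earlier, so it is worth checking the resulting index and shard sizes before committing to a layout.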