Performance Problems

Good afternoon @Christian_Dahlqvist

I am going to escalate your recommendations and we will try to implement them as soon as possible. I just wanted to clarify a few things, in case they imply changing any of the parameters we discussed.

The 4 TB of data is per HOT node, for each of the 6 nodes we have, i.e. 24 TB in the HOT tier counting the primary data plus the replica.
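
In case it is useful, those per-node figures can be double-checked from the cluster itself. A quick sketch, assuming direct access to the Elasticsearch REST API:

```
# Disk used by index data on each node, hot nodes included
GET _cat/allocation?v

# Store size per index, largest first (primaries + replicas)
GET _cat/indices?v&s=store.size:desc&h=index,pri,rep,store.size,pri.store.size
```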

I will share more details and updated information as soon as I have them. Thank you for your help!

Sorry to insist on this, but wouldn't splitting the 700 TB of total data across more than one index set have a significant impact on performance? At the end of the day, keeping it all together is like having a SQL database with a single monstrous table into which all the data is inserted.

Thanks, regards!

Good afternoon,

We have planned to change the shard size during the week of January 2nd; I will let you know how it goes.

Thank you, and Merry Christmas!

Good morning,

We will finally implement the changes next week; I will keep you informed.

Thanks, regards!

I do not understand this comment/question. Elasticsearch works very differently compared to a SQL database.

Good afternoon,

I know it is not directly comparable, but I am not sure that putting everything in the same index set is the most optimal approach. Without going any further, we sometimes get warnings about reaching the limit of 1000 indexed fields, so splitting the information by affinity into different index sets should a priori be more optimal, right?
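
For reference, that warning corresponds to Elasticsearch's `index.mapping.total_fields.limit` setting, which defaults to 1000. A sketch of how to inspect a mapping and, as a stopgap, raise the limit on one index (the index name `graylog_0` is just an illustrative placeholder):

```
# See which fields the index has actually mapped
GET graylog_0/_mapping

# Stopgap only: raise the field limit on an existing index.
# Splitting data by affinity into separate index sets is the
# structural fix, since it keeps each mapping small.
PUT graylog_0/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```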

Regarding the change in shard size: in principle we should have done it by now, but things got complicated and we have not yet been able to carry it out. I will keep you informed.
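
For when we get to it: as far as I know, a new shard count only applies to indices created after the change, since in Elasticsearch it is fixed at index creation. In Graylog this lives in the index set configuration; at the raw Elasticsearch level, the equivalent is an index template picked up on the next rotation. A sketch with an illustrative pattern and shard count:

```
# New indices matching the pattern get the new shard count;
# existing indices keep theirs (changing those would need
# _shrink or a reindex).
PUT _index_template/graylog-template
{
  "index_patterns": ["graylog_*"],
  "template": {
    "settings": {
      "index.number_of_shards": 4,
      "index.number_of_replicas": 1
    }
  }
}
```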

Thanks, regards!

Best practice in general is to group data with similar mappings into index sets; part of the reason for this is to avoid a mapping explosion in the number of fields. You do not want to go too granular though, as you will end up with lots of very small indices and shards, which is very inefficient and hurts performance.
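
To make the "too granular" risk concrete, a common sanity check (just a sketch) is to look for shards far below the frequently cited 10-50 GB comfort range:

```
# List shards by store size; lots of tiny shards on the hot
# tier suggests the index sets are cut too fine.
GET _cat/shards?v&s=store:desc&h=index,shard,prirep,store,node
```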

It seems like Graylog only creates a single index set, which may indeed not be optimal. That is however something you would need to address with them.

No, we can actually have as many index sets as we want; Graylog is not a problem here.

Then I would recommend creating multiple index sets based on how similar the mappings for different types of data are. As retention management is done at the index level, another aspect to consider is to also group data that have the same retention period together.
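
To illustrate why grouping by retention matters: dropping a whole expired index is far cheaper than deleting individual documents, which is exactly why retention is managed at the index level. Graylog normally handles rotation and retention itself per index set, but the equivalent concept in plain Elasticsearch would be an ILM policy whose delete phase fires at the retention age (the policy name and age below are illustrative):

```
# Delete indices 90 days after rollover (or creation)
PUT _ilm/policy/logs-90d-retention
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```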