I have some questions about the behavior of a couple of things. Could you please help me?
For systems that are not allowed to communicate with Kafka directly, I have a Logstash that handles this communication, and it works perfectly: it sends at a good speed and absorbs the peaks with relative ease.
-How can I actually improve the efficiency of this output and of this Logstash instance?
-What is the most efficient topic configuration? (We are currently considering the possibility of two topics, one group of producers and one group of consumers per customer).
The second Logstash is the one that would perform all the mapping. It has RAM and CPU to spare, but it takes a long time to consume the events stored in Kafka, even without doing any mapping or filtering.
I have changed the input, choosing the pipe plugin with the Kafka consumer command and raising the auto-commit interval to 1 minute to improve its performance, but even with resources to spare it does not keep up at the pace I would like, and it leaves too many events in Kafka.
-How could I push this Logstash to its limit to measure the speed at which it can ingest logs (perhaps bypassing its integration with Kafka)?
-How can I tune those Kafka input plugins? This is where I see most of the problems with the SIEM...
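To make the backlog problem concrete, this is the rough arithmetic I am working with. It is only a sketch: the rates and counts are made-up placeholders, not measurements from my system, and the function names are my own. The one thing I am relying on is that, within a single consumer group, a partition is read by at most one consumer at a time, so threads beyond the partition count sit idle.

```python
def drain_time_seconds(backlog_events, produce_rate, consume_rate):
    """Seconds needed to empty the backlog, or None if it never drains.

    Rates are in events per second. The backlog only shrinks when the
    consumer is faster than the producers.
    """
    net_rate = consume_rate - produce_rate
    if net_rate <= 0:
        return None
    return backlog_events / net_rate


def max_useful_consumers(partitions, consumer_threads):
    """Consumers in one group that can actually receive data.

    Each partition is assigned to at most one consumer in the group,
    so extra threads beyond the partition count do no work.
    """
    return min(partitions, consumer_threads)


# Placeholder numbers (assumptions, not measurements).
print(drain_time_seconds(6_000_000, 5_000, 8_000))  # 2000.0
print(max_useful_consumers(4, 8))                   # 4
```

With these placeholder numbers, doubling the consumer threads would not help unless the topic also had more partitions.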
The cluster I am practicing with has three master nodes, three data nodes, and two client nodes (to make Kibana redundant). Everything works perfectly, but I have several doubts about it:
- Regarding the number of indexes, types, shards and replicas.
-What would be the optimum configuration here to get the best cost-effectiveness in both search time and storage? I have to find a good balance between the two, especially for AllInOne environments.
-Is there a relationship of the form "indexes-types-shards" -> "speed" -> "size-storage"? I'm pretty confused about it ...
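This is how I currently picture the relationship between index size, shards, and replicas. It is a sketch under assumptions: the 30 GB target shard size is just a figure in the range I have seen suggested, not something I have validated, and the function names are my own.

```python
import math


def primary_shards(index_size_gb, target_shard_gb=30):
    """Primary shards needed so that no shard exceeds the target size.

    target_shard_gb is an assumed rule-of-thumb value, not a measured one.
    """
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def total_shard_copies(primaries, replicas):
    """Every replica adds one full copy of each primary shard."""
    return primaries * (1 + replicas)


def disk_gb(index_size_gb, replicas):
    """Disk used across the cluster once replicas are included."""
    return index_size_gb * (1 + replicas)


# Hypothetical 200 GB index with one replica.
primaries = primary_shards(200)
print(primaries)                         # 7
print(total_shard_copies(primaries, 1))  # 14
print(disk_gb(200, 1))                   # 400
```

What I cannot judge is where the balance lies: more replicas mean more copies to serve searches but multiply the storage, and I do not know how the shard count affects speed in a small AllInOne setup.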
- Regarding the amount of data added to each document.
I have read that, since joins cannot be done in Elasticsearch, searches involving several documents (for example, the users with the largest number of failed logins across our entire system) have to be done in two steps: first looking up user X in our Users index, and then counting the logins that user has made. I have also read that the only way to make these searches more efficient is to add fields to the documents so that this kind of join is not needed.
-Does this greatly increase the size of the index? -How much would we be talking about? -Is it correct, beneficial (and, above all, acceptable) to increase the size of indexed documents by adding data from other documents in order to make these multi-document searches easier?
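To show what I mean by adding fields, here is a small sketch of copying user data into each login event before indexing. The field names and sample documents are hypothetical, and the size comparison only measures the raw JSON, which is not the same as the size on disk after indexing and compression.

```python
import json

# Hypothetical documents; the field names are illustrative only.
user = {"user_id": 42, "name": "alice", "department": "finance"}
login_event = {
    "timestamp": "2017-01-01T10:00:00Z",
    "user_id": 42,
    "outcome": "failure",
}


def denormalize(event, user, fields):
    """Return a copy of the event with selected user fields copied in."""
    enriched = dict(event)
    for field in fields:
        enriched["user_" + field] = user[field]
    return enriched


def size_bytes(doc):
    """Size of the document as compact UTF-8 JSON (raw source only)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


enriched = denormalize(login_event, user, ["name", "department"])
growth = size_bytes(enriched) / size_bytes(login_event)
print(enriched)
print(f"raw JSON grew by a factor of {growth:.2f}")
```

With the user fields inside each event, the "failed logins per user" search could run against the events alone, but every event carries the extra bytes, which is exactly the cost I am asking about.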
- Regarding the compression of indexed data.
How much would an index of about 100,000 million events (collected over two years) be expected to weigh, indexed on three or four fields, without data aggregations, and compressed with "index.codec: best_compression"?
I know this question is difficult to answer exactly, but right now I cannot even come up with a rough estimate.
- Can the index be re-indexed and re-compressed to make it lighter? (The data would still have to be searchable, even if searches become more expensive.)
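For the size question above, this is the back-of-envelope formula I would use if I had the inputs. Every number below is a placeholder assumption: I do not know my real average event size, nor the ratio between raw size and on-disk size that best_compression would give for my data.

```python
def estimate_index_size_tb(events, avg_event_bytes, disk_ratio, replicas):
    """Rough total size in terabytes (10**12 bytes).

    disk_ratio is on-disk bytes divided by raw bytes after indexing and
    compression. It has to be measured on a sample; it is not known here.
    """
    primary_bytes = events * avg_event_bytes * disk_ratio
    total_bytes = primary_bytes * (1 + replicas)
    return total_bytes / 1e12


# 100,000 million events = 1e11. The remaining inputs are placeholders.
size_tb = estimate_index_size_tb(
    events=100_000_000_000,
    avg_event_bytes=500,  # assumed average raw event size
    disk_ratio=0.5,       # assumed; depends on the data and the mapping
    replicas=1,
)
print(f"{size_tb:.1f} TB")  # 50.0 TB with these placeholder inputs
```

The missing piece is a realistic disk_ratio, which I suppose I could obtain by indexing a sample of a few million events and comparing the index size with the raw size.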