We are planning to use Wazuh for some of our monitoring needs, but Wazuh does not support ECS out of the box, so I plan to make the Wazuh data ECS-compliant for better monitoring.
My question is: which of the following would be the better option, and why? What impact would each have from a performance perspective?
1. Create an alias for all Wazuh fields
2. Copy the Wazuh fields into ECS-compliant field names
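For context, the two options would look roughly like this in a mapping (a sketch; the Wazuh field `data.srcip` and the ECS field `source.ip` are just illustrative examples):

```
PUT wazuh-alerts-ecs
{
  "mappings": {
    "properties": {
      "data": {
        "properties": {
          "srcip": { "type": "ip" }
        }
      },
      "source": {
        "properties": {
          "ip": { "type": "alias", "path": "data.srcip" }
        }
      }
    }
  }
}
```

With option 1, `source.ip` is only a pointer to `data.srcip`; with option 2, `source.ip` would instead be a regular `ip` field that gets populated at ingest time.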
One thing to consider is that Wazuh logs are quite varied, so we will probably need to map well over 20-30 fields, and each log line will contain at least 10-15 fields. Copying every field would presumably roughly double the storage needs, but would aliases lead to a performance hit when you have billions of messages?
I would absolutely recommend copying the fields rather than creating field aliases.
Field aliases are not a perfect replacement for having the data in the expected ECS fields. Read the sections "Resolve schema differences and conflicts" and "Aliases" in this blog post, where I go in depth about this:
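One common way to do the copying is with an ingest pipeline. A minimal sketch, again assuming the illustrative `data.srcip` → `source.ip` mapping:

```
PUT _ingest/pipeline/wazuh-ecs
{
  "description": "Copy Wazuh fields into ECS field names",
  "processors": [
    {
      "set": {
        "field": "source.ip",
        "copy_from": "data.srcip",
        "ignore_empty_value": true
      }
    }
  ]
}
```

You would add one `set` processor per field (or a `rename` processor if you don't want to keep the original), and attach the pipeline to your index via the `index.default_pipeline` setting.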
Don't worry about having 20-30 more fields; Elasticsearch can handle a lot more than that.
As for the storage doubling, I don't think it's as linear or straightforward as that, since the data is sliced and diced in so many ways by Elasticsearch.
One of the things you can experiment with is the codec parameter in your mapping / index template. Read up on it here.
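For example, switching the codec to favor compression over speed is a one-line index setting (applied at index creation or via an index template):

```
PUT wazuh-alerts
{
  "settings": {
    "index.codec": "best_compression"
  }
}
```

`best_compression` trades slightly slower stored-field retrieval for a smaller index on disk, which may offset some of the cost of copied fields.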
You can also read up on multiple other strategies to tune disk usage here.
But I'd suggest you test the two approaches (alias only vs copy) with a few thousand documents in two different indices, and compare their sizes, before going into too much premature tuning.
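Comparing the two test indices can be as simple as one `_cat` call (index names here are illustrative):

```
GET _cat/indices/wazuh-test-*?v&h=index,docs.count,store.size
```

Make sure both indices hold the same documents and have been force-merged (`POST wazuh-test-*/_forcemerge`) before comparing, so the size difference reflects the mapping strategy rather than merge state.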