LogsDB optimized routing

Hi Everyone,

I have a couple of questions about LogsDB, specifically about optimized routing.
It is generally recommended to use low-cardinality fields for LogsDB index sorting, but there is no guidance on what cardinality counts as best practice.

Since the default sort field is host.name, one might assume the best choice is the field that identifies the data source of each integration's data stream. For example, for system.syslog that would be host.name, but for network integrations such as fortigate it would be observer.name.

However, on our clusters, datasets with few sources, like fortigate, show storage savings of 40 to 50 percent, whereas datasets with many sources, like system, which is installed on many hosts/agents, save only about 20 percent.

Would it then be recommended to change the defaults for the system integration as well? If so, what would be an example of a field to sort on?

This is especially important if we want to implement optimized routing, which requires a combination of at least two low-cardinality fields.
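To compare candidate sort fields before changing anything, one option is to count distinct values (and distinct combinations, for the two-field case) on a sample of documents. The sketch below is a minimal illustration, not an Elasticsearch API: `get_path`, `cardinality`, the sample documents, and the choice of `log.level` as a second field are all hypothetical assumptions. As a rough heuristic, fewer distinct combinations relative to the number of documents means longer runs of identical values after sorting.

```python
from collections.abc import Iterable, Mapping, Sequence
from typing import Any


def get_path(doc: Mapping[str, Any], dotted: str) -> Any:
    """Resolve a dotted field name such as 'host.name' in a nested document.

    Returns None when any part of the path is missing.
    """
    value: Any = doc
    for part in dotted.split("."):
        if not isinstance(value, Mapping) or part not in value:
            return None
        value = value[part]
    return value


def cardinality(docs: Iterable[Mapping[str, Any]], fields: Sequence[str]) -> int:
    """Count distinct value combinations of `fields` across `docs`."""
    return len({tuple(get_path(d, f) for f in fields) for d in docs})


# Hypothetical sample documents; in practice these would come from your own data.
docs = [
    {"host": {"name": "web-01"}, "log": {"level": "info"}},
    {"host": {"name": "web-01"}, "log": {"level": "error"}},
    {"host": {"name": "web-02"}, "log": {"level": "info"}},
    {"host": {"name": "web-03"}, "log": {"level": "info"}},
]

if __name__ == "__main__":
    print(cardinality(docs, ["host.name"]))               # 3
    print(cardinality(docs, ["log.level"]))               # 2
    print(cardinality(docs, ["host.name", "log.level"]))  # 4
```

Counting combinations rather than single fields matters here because a pair of individually low-cardinality fields can still produce many distinct combinations.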

Thanks in advance for any pointers!
