Elasticsearch Index Storage Optimization - Firewall Logs

We are running a 3-node cluster to index logs from a firewall.

The nodes are physical machines (20-core CPUs, 16 GB RAM, SSD storage).

Each day's logs are stored in a separate index. The storage used works out to roughly 800-1000 MB per million log lines per index.

The index mapping consists of some 170 dynamically mapped fields. Two fields are converted to numbers using the mutate filter in Logstash:

    mutate {
      convert => { "receivedbyte" => "integer" }
      convert => { "sentbyte" => "integer" }
    }

The remaining fields are text.

Is there any way to optimize storage so that the present 800-1000 MB per million log lines can be reduced without degrading the read/write performance of the cluster?

If text fields are dynamically mapped, they are generally mapped as both text and keyword, which takes up more space. For many types of data both are not required, so creating an index template to change this can save a lot of space. This, together with other tips, is described in the docs.
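As a rough sketch (the template name and the firewall-* index pattern are placeholders, adjust to your own naming), a dynamic template that maps new string fields as keyword only could look something like this:

    PUT _index_template/firewall-logs
    {
      "index_patterns": ["firewall-*"],
      "template": {
        "mappings": {
          "dynamic_templates": [
            {
              "strings_as_keyword": {
                "match_mapping_type": "string",
                "mapping": {
                  "type": "keyword",
                  "ignore_above": 1024
                }
              }
            }
          ]
        }
      }
    }

The ignore_above value is just an illustration; it prevents very long strings from being indexed as keywords.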


Thank you for the response.

In case of using explicit mapping with an Index Template, would the logstash mutate filter still be required for converting fields to numbers?

If you want numeric fields to be dynamically mapped as numbers you still need it.
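For reference, a sketch of how the two byte-count fields from the mutate filter above could be mapped explicitly in the template (long is an assumption about the value range; in practice this properties section would sit alongside the dynamic_templates section from the earlier example rather than in a separate template):

    PUT _index_template/firewall-logs
    {
      "index_patterns": ["firewall-*"],
      "template": {
        "mappings": {
          "properties": {
            "receivedbyte": { "type": "long" },
            "sentbyte": { "type": "long" }
          }
        }
      }
    }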

Would new fields which are not explicitly mapped via Index Template continue to be mapped dynamically?

Yes.

So we've modified the Index Template to mostly use keyword fields. Numeric fields have been set to numeric types.

Is unsigned_long a valid type for numeric fields? While it is mentioned as a valid type in the docs, it doesn't show up in the Field Type > Numeric Type menu.
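For what it's worth, if the type is supported in our version, setting it directly via the mapping API rather than the Kibana menu would presumably look something like this (a sketch; the index name and field name are placeholders):

    PUT firewall-2023.01.01
    {
      "mappings": {
        "properties": {
          "receivedbyte": { "type": "unsigned_long" }
        }
      }
    }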

Thanks a ton! Will post back about the space savings once there's enough data for a comparison :slight_smile:

You might also be interested in the analyze disk usage API which can break down your disk usage by field, and also the field usage stats API which can tell you which fields you're actually using in your searches.
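For reference, both can be called from Dev Tools roughly like this (the index name is a placeholder):

    # Per-field breakdown of on-disk usage; the flag is required because the analysis is expensive
    POST /firewall-2023.01.01/_disk_usage?run_expensive_tasks=true

    # Per-field usage statistics for searches run against the index
    GET /firewall-2023.01.01/_field_usage_stats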

I would normally expect that removing unused fields would improve your write performance, sometimes substantially, and avoiding dynamic mappings is also a good move for performance.

The size per event is now between 850 and 900 bytes on average, ~800 MB per million log lines.

We converted all dynamically mapped string fields to keyword; fields with numeric values are now byte/short/integer/long. The dashboards will need quite some work to start working again, given that IP addresses are no longer text fields with an IP.Keyword subfield :upside_down_face:

Is an unsigned integer type not available? unsigned_long is mentioned in the documentation, but it does not appear in Kibana as an available data type.

As for IP addresses, we've used the fields parameter to map them both as ip and as keyword so that the visualization feature works. No pretty graphs for non-string types? It is also not possible to filter out non-string types from the GUI using the '-'/Filter out option.
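For illustration, the multi-field mapping looks something like this (srcip and the index name are stand-ins for the actual field and index names):

    PUT firewall-2023.01.01/_mapping
    {
      "properties": {
        "srcip": {
          "type": "ip",
          "fields": {
            "keyword": { "type": "keyword" }
          }
        }
      }
    }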

We did try gathering data from the APIs; it turns out the _source field takes up about two thirds of the index size. The data types presently in use are compatible with synthetic _source. How much can we expect disk usage to go down by if synthetic _source is used? Is there a read or write performance trade-off involved?
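For reference, a minimal sketch of what enabling synthetic _source in an index template might look like (template name and pattern are placeholders, and the exact syntax has changed between Elasticsearch versions):

    PUT _index_template/firewall-logs-synthetic
    {
      "index_patterns": ["firewall-*"],
      "template": {
        "mappings": {
          "_source": { "mode": "synthetic" }
        }
      }
    }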
