Need help reducing the volume of data

Cruz · October 9, 2024, 5:17am

Hi there!

I am trying to reduce the volume of my data in metrics monitoring. Currently, I filter out fields using Logstash and retain only the fields that I need for my dashboards.

I have 7 servers, and each server generates 204 MB of data per day. So, for 7 servers, that totals 1.4 GB per day.

Is there a way to further reduce the data? See the picture below; these are the fields I have retained for the dashboards.

grumo35 · October 9, 2024, 3:52pm

Hi,

This is a very nice question.

From my understanding, you are already using the default tools and have dropped as many fields as you could. The next things to look at are your indexing strategy and your cluster size: how many nodes do you have?

Can you lower the collection frequency (that is, lengthen the period), let's say 1 log per minute instead of 30?

How much precision do you need?

How much retention do you need?

You can also trim many more metadata fields if you don't need them.

dadoonet · October 9, 2024, 4:04pm

Are you using a time series data stream (TSDS), as described in the Elasticsearch Guide [8.15], and specifically the time_series index mode?

What is your version?
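
For context, the time_series index mode is switched on in the index template that backs the data stream. Below is a minimal sketch of such a template body, written as a Python dict so it can be printed as JSON and sent to the `_index_template` API. The index pattern, the priority, and the choice of `host.name` as the only dimension and `system.cpu.total.pct` as a gauge are illustrative assumptions, not settings taken from this thread.

```python
import json

# Hypothetical template body for a time series data stream (TSDS).
# The index pattern, priority, and field choices are assumptions for illustration.
tsds_template = {
    "index_patterns": ["metrics-example.system-*"],
    "data_stream": {},
    "priority": 500,
    "template": {
        "settings": {
            # Enables TSDS behaviour for the backing indices.
            "index.mode": "time_series",
            # Dimension field(s) used to route documents to shards.
            "index.routing_path": ["host.name"],
        },
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                # A dimension identifies which time series a document belongs to.
                "host.name": {"type": "keyword", "time_series_dimension": True},
                # A metric field holds the sampled value.
                "system.cpu.total.pct": {"type": "double", "time_series_metric": "gauge"},
            }
        },
    },
}

print(json.dumps(tsds_template, indent=2))
```

Any field that tells one series apart from another (a mount point or a process name, for example) would also need to be mapped as a dimension, since documents that share the same dimensions and timestamp are treated as duplicates.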

Cruz · October 10, 2024, 11:06am

Hi,

I just increased the collection period (that is, lowered the metric frequency). Previously, I had set it to 10 seconds. Now I am monitoring it to see whether it reduces the data volume.
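
As a rough guide to what a longer period buys, the event count scales inversely with the period. This is a back-of-the-envelope sketch, assuming each metricset emits exactly one event per period per monitored entity; `events_per_day` is a hypothetical helper, not part of Metricbeat.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def events_per_day(period_seconds: int) -> int:
    """Events one metricset emits per day for a single monitored entity."""
    return SECONDS_PER_DAY // period_seconds

for period in (10, 30, 60, 120):
    print(f"{period:>3}s period -> {events_per_day(period)} events/day")
```

Going from 10 seconds to 30 seconds cuts the count from 8640 to 2880 events per day. Metricsets such as process or filesystem emit one event per process or mount point, so their totals are multiplied accordingly.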

I would like to retain the data for 365 days before deleting it.
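
For a sense of scale, here is a quick projection of what that retention means in storage, using the figures from the first post (204 MB per server per day, 7 servers). It assumes the daily volume stays constant, decimal units (1 GB = 1000 MB), and no replicas; `retained_gb` is a hypothetical helper.

```python
MB_PER_GB = 1000  # assumption: decimal units

def retained_gb(mb_per_server_per_day: float, servers: int, retention_days: int) -> float:
    """Total stored volume in GB once the retention window is full."""
    return mb_per_server_per_day * servers * retention_days / MB_PER_GB

daily_gb = retained_gb(204, 7, 1)
yearly_gb = retained_gb(204, 7, 365)

print(f"per day:  {daily_gb:.2f} GB")
print(f"365 days: {yearly_gb:.1f} GB")
```

At the current rate, that comes to roughly 521 GB on the single node once a full year of data has accumulated.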

However, I’m not sure what else I need to filter out because the remaining fields are necessary for the dashboard.

For my setup, I am using a single node for this small deployment.

This is my metrics configuration:

- module: system
  period: 30s
  metricsets:
    - cpu
    - memory
    - network
  process.include_top_n:
    by_cpu: 5      # include top 5 processes by CPU
    by_memory: 5   # include top 5 processes by memory

- module: system
  period: 2m
  metricsets:
    - filesystem
    - fsstat
  processors:
  - drop_event.when.regexp:
      system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)'

- module: system
  period: 2m
  metricsets: ["diskio"]
  processors:
  - drop_event.when.regexp:
      system.diskio.name: '^sr0$'

- module: system
  period: 2m
  metricsets: ["process"]
  processors:
  - drop_event.when.regexp:
      system.process.name: '^.*(kworker|ksoftirqd|rcu|watchdog|migration|kthread|rcu_sched|systemd|agetty|auditd|sshd|bash|ksmd|lvmetad|scsi|khungtaskd|jbd2|kblockd|bioset|dbus|khelper|kmpath|kintegrityd|khugepaged|fsnotify|ata_sff|LCPDEV|perf|crond|irqbalance|kdevtmpfs|writeback|deferwq|kswapd|kthrotld|kpsmoused|tail|vballoon|polkitd|metricbeat|ttm_swap|syslog|udp_rcv).*'
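
To see which process names the last `drop_event` condition actually removes, the pattern can be tried offline. This sketch uses Python's `re` module and assumes it behaves the same as the Beats regexp engine for this simple alternation. The sample process names are made up for illustration.

```python
import re

# Pattern copied from the drop_event condition on system.process.name.
DROP_PATTERN = re.compile(
    r"^.*(kworker|ksoftirqd|rcu|watchdog|migration|kthread|rcu_sched|systemd|agetty"
    r"|auditd|sshd|bash|ksmd|lvmetad|scsi|khungtaskd|jbd2|kblockd|bioset|dbus|khelper"
    r"|kmpath|kintegrityd|khugepaged|fsnotify|ata_sff|LCPDEV|perf|crond|irqbalance"
    r"|kdevtmpfs|writeback|deferwq|kswapd|kthrotld|kpsmoused|tail|vballoon|polkitd"
    r"|metricbeat|ttm_swap|syslog|udp_rcv).*"
)

def is_dropped(process_name: str) -> bool:
    """True if an event for this process name would match the drop condition."""
    return DROP_PATTERN.search(process_name) is not None

# Hypothetical process names, chosen only to exercise the pattern.
for name in ("systemd-journald", "rsyslogd", "kworker/0:1", "nginx", "java"):
    print(f"{name}: {'dropped' if is_dropped(name) else 'kept'}")
```

Because the alternation is wrapped in `.*` on both sides, any name that merely contains one of the listed strings is dropped, so `rsyslogd` goes along with `syslog`.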

Thank you for your response!

Cruz · October 10, 2024, 11:09am

Hi @dadoonet,

I am using the regular data stream.

The version I am using is 8.9.0.

Thank you for your response.
