I'm currently using filebeat -> logstash -> elastic to backfill Elasticsearch with exit codes from several thousand text output files, each tens of MB in size and >10k lines, ~200GB in total. The server can push 5GB/s read/write, so disk I/O is not the bottleneck.
I'm configuring filebeat to search for specific keywords, which I specify in include_lines: ['keyword1', ..., 'keywordN']. The number of keywords could be as high as 20, but at present the performance problem shows up even with a single keyword. I also wish to use exclude_lines at some point.
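For context, this is roughly the shape I expect the multi-keyword config to take. The keyword names and path are placeholders, and I don't know whether a list of separate patterns or a single alternation is cheaper to evaluate, given that include_lines entries are regular expressions:

filebeat.inputs:
- type: log
  paths:
    - /pathto/symlinksdir/*
  # one pattern per keyword (what I'm doing now, scaled up)
  include_lines: ['keyword1', 'keyword2', 'keywordN']
  # or a single alternation, if that is faster to match
  # include_lines: ['keyword1|keyword2|keywordN']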
Performance is incredibly slow. Any recommendations for improving it?
I suspect parsing the files is one aspect; how exactly do include_lines/exclude_lines work?
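My current understanding (which may be wrong) is that include_lines is applied before exclude_lines, so a combined filter would look something like this, with both patterns being placeholders for my real ones:

filebeat.inputs:
- type: log
  paths:
    - /pathto/symlinksdir/*
  include_lines: ['EXIT CODE']       # keep only candidate lines
  exclude_lines: ['EXIT CODE: 0']    # then drop the uninteresting ones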
Another possibility is the number of harvesters started in parallel. I have tried limiting this by setting harvester_limit to the number of cores (see the sketch after my config below).
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /pathto/symlinksdir/*
  symlinks: true
  tags: ["some_value"]
  fields: {log_type: "some_value2"}
  include_lines: ['keyword']
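For reference, the harvester limit I tried sits on the same input, roughly like this; 8 is just my core count, and close_eof is something I'm considering rather than using today, since the files are static backfill:

filebeat.inputs:
- type: log
  paths:
    - /pathto/symlinksdir/*
  harvester_limit: 8    # 0 is the default, meaning no limit
  close_eof: true       # close each file once it has been read to the end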
Thanks