Hi there!
Occasionally I get a bunch of logfiles from several machines, dating back a couple of weeks.
These are similar machines in terms of hardware and software;
they differ only in a few config items such as hostname and uuid.
I tend to delete these indices after my analysis and never "stream" them into the index in the first place; I do a plain load with a prospector pointing to a path like `C:\Temp\Loganalysis\foobar*\app-201*.log`.
I am successfully importing these with Filebeat through Logstash, and even manage to calculate elapsed time with `-w 1`
and the "elapsed" filter -- and visualize it all in Kibana. All very nice!
Now I want to go the next step and automate my "loading" even further.
My most important config items, hostname and uuid, never appear in the first line(s) of any logfile; they are part of the zip archive that I get delivered, like
```
archive.YYYY-MM-DD.zip
- db
- db/export.csv
- log
- log/app.YYYY-MM-DD.log
- uuid
```
I typically get 50-150 machines with 30-60 days of archives.
I extract the logfiles from the zip files with a bash script in Cygwin, and also extract the uuid from an accompanying file -- one huge for loop that writes everything into an intermediate folder structure. I love that part, because separating E-T-L feels like the right thing to do.
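For reference, that extraction loop is roughly the following sketch. The staging function only assumes the archive layout shown above (a `uuid` file next to a `log/` folder); the source and destination paths are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stage one extracted archive directory into the per-uuid layout.
#   $1 = directory containing the unzipped archive (db/, log/, uuid)
#   $2 = destination root, e.g. /cygdrive/c/Temp/Loganalysis/foobar
# Prints the uuid it found.
stage_archive() {
  local src=$1 dst=$2 uuid
  uuid=$(tr -d '[:space:]' < "$src/uuid")   # strip trailing newline/spaces
  mkdir -p "$dst/$uuid"
  cp "$src"/log/app.*.log "$dst/$uuid/"
  echo "$uuid"
}

# Outer loop over the delivered zips (illustrative paths):
# for zip in /cygdrive/c/Temp/Archives/archive.*.zip; do
#   tmp=$(mktemp -d)
#   unzip -q "$zip" -d "$tmp"
#   stage_archive "$tmp" /cygdrive/c/Temp/Loganalysis/foobar
#   rm -rf "$tmp"
# done
```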
Now, what should I do with `${UUID}` and `C:\Temp\Loganalysis\foobar\${UUID}\log-201*.log`,
where only about 5% of the lines are interesting `include_lines`?
Ideally I would
- not just be able to enrich each "machine import" with its uuid,
- but also be able to enrich each single file
during "prospecting", because I try to save the md5sum of each file with each line -- I am paranoid about importing a file twice (stupid rotation-by-filesize!). I also have server.log and server.log.1 in log4j fashion that I plan to import in the future, once this app.YYYY-MM-DD.log import is working.
Should I write a filebeat.yml for each import and start a filebeat.exe (sequentially) for each machine with its uuid?
The logfile format itself does not lend itself to appending to the end of each line, as there is no separator other than whitespace. But if you experienced folks tell me to enrich each logline first, then I'd script that too -- and give the E-T-L its own phase.
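If pre-enriching does turn out to be the recommended route, the T phase could be as small as this sketch: prepend uuid and the file's md5sum to every line, with a separator that never occurs in the logs (the `|` character and the `.enriched` suffix are my assumptions, not anything Filebeat requires):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prepend "uuid|md5sum|" to every line of a logfile, writing a
# sibling *.enriched file that Filebeat would then prospect.
#   $1 = path to the logfile, $2 = the machine's uuid
enrich_file() {
  local file=$1 uuid=$2 sum
  sum=$(md5sum "$file" | cut -d' ' -f1)   # whole-file checksum, once
  sed "s/^/${uuid}|${sum}|/" "$file" > "${file}.enriched"
}
```

A Logstash `dissect` or `grok` filter could then peel `uuid` and `md5sum` off the front again before the original parsing runs.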
I can also do `filebeat.exe -e -E fields.uuid='${UUID}' -c filebeat-applog.yml`,
but that still needs one filebeat*.yml per machine, because there is no command line option for the YAML prospector config with its `- paths: - C:\Temp\Loganalysis\foobar\*\app-201*.log` structure.
And that approach does not allow a per-file "md5sum" field either.
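If it stays at one config per machine, generating those configs from a template at least keeps it scriptable. A sketch -- the `filebeat-template.yml` file and its `__UUID__` placeholder are my own convention, not a Filebeat feature:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Render a per-machine filebeat config by substituting the uuid
# into a hand-written template.
#   $1 = template path, $2 = uuid, $3 = output config path
render_config() {
  local template=$1 uuid=$2 out=$3
  sed "s/__UUID__/${uuid}/g" "$template" > "$out"
}

# Then run filebeat once per machine, sequentially (illustrative):
# for dir in /cygdrive/c/Temp/Loganalysis/foobar/*/; do
#   uuid=$(basename "$dir")
#   render_config filebeat-template.yml "$uuid" "filebeat-${uuid}.yml"
#   filebeat.exe -e -c "filebeat-${uuid}.yml"
# done
```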
I know that backfilling is kind of against the grain, but is actually common outside the cosy world of datacenters! https://medium.com/@cprior/a-blind-spot-in-textbook-service-management-1b464dc0aec9
So please don't scold me for such a use case.
I would be very happy if some of you shared your "enriching procedures" with me.
If there were a way to do "T-L" in one step, I'd gladly take that shortcut.
BR,
Chris