I would like to know how the logstash file input reads files from a directory.
I have files that are created based on a size limit, and each new log file has its creation date and time appended to its name.
Ex:
file-01012021-000000
file-01012021-100000 <----- this is created when the file above reaches 10 MB.
file-01012021-150515
And so on.....
Is there a way I can make logstash read the files in order? I am using --pipeline.workers 1 to keep events ordered, but I am not sure the files will be read in order if I point the file input path at "/home/logs/file*".
Sounds right. The default behaviour is to read up to 4,611,686,018,427,387,903 chunks of 32 KB from a file before moving on to the next one, so it does indeed read each file completely.
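For what it's worth, here is a minimal sketch of a file input that walks files in filename order rather than modification time (file_sort_by is an option of the file input; the path and sincedb location are just assumptions):

input {
  file {
    path => "/home/logs/file*"              # assumption: your log directory
    start_position => "beginning"
    sincedb_path => "/home/logs/.sincedb"   # assumption: any writable location works
    file_sort_by => "path"                  # sort discovered files by name instead of last_modified
  }
}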
First: I need to index all the logs (in the order of the file names, as mentioned above) with start_position set to beginning.
Second: Once the indexing is completed, I will change start_position to end and continue from where I left off (as sincedb is already tracking the position).
As new logs are created, I should be able to keep indexing each new log file until it ends, and the cycle will continue.
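If it helps, the second phase only changes start_position (a sketch under the same path assumptions as above). Note that start_position applies only on first contact, i.e. to files the sincedb has not seen yet, so files already being tracked continue from their recorded position either way:

input {
  file {
    path => "/home/logs/file*"
    start_position => "end"                 # only affects files not yet in the sincedb
    sincedb_path => "/home/logs/.sincedb"   # same sincedb, so positions carry over
  }
}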
The source option of a fingerprint filter takes an array of field names, so you can do something like
source => [ "[host][name]", "message" ]
You will want to use the concatenate_sources option if you are fingerprinting multiple fields; otherwise you just get a fingerprint of the last source.
If you want the fingerprint to be the digest of dev1 or dev2 then yes, it must come after the conditionals. As it stands, all the events probably have an id based on the digest of the name of the host that logstash is running on.
BTW, key is not required. It was at one time, back when the filter always created keyed (HMAC) digests. Now, if you omit the key option it will just create a plain hash of the source fields.
Also, I would recommend using SHA256; SHA1 is no good for anything these days.
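Putting those pieces together, a sketch of what that ordering might look like (the dev1/dev2 tests and the target field are assumptions based on your description):

filter {
  if [path] =~ /dev1/ {
    mutate { replace => { "host" => "dev1" } }
  } else if [path] =~ /dev2/ {
    mutate { replace => { "host" => "dev2" } }
  }
  # the fingerprint must come after the conditionals so it sees the rewritten host
  fingerprint {
    source => [ "host", "message" ]
    concatenate_sources => true            # without this you only hash the last source
    method => "SHA256"
    target => "[@metadata][fingerprint]"   # assumption: put the digest wherever you need the id
  }
}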
And about the host: I have logs distributed into different folders named dev1 through dev24, based on the path,
ex: /path/to/dev1/logs*
/path/to/dev2/logs*
/path/to/dev3/logs*
/path/to/dev4/logs*
and so on...
if path has "dev1" in it, the host value will be changed to dev1 and later, this will be used in adding fields like add_field => { "%{host}_temp" => "%{tempvalue}" }
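Rather than writing 24 conditionals, one way to sketch this is to pull the devN segment straight out of the path with grok (assuming the file input records the path in a field called path; newer ECS-enabled versions use [log][file][path] instead):

filter {
  # extract dev1..dev24 from e.g. /path/to/dev7/logs.1 into the host field
  grok {
    match => { "path" => "/path/to/(?<host>dev\d+)/" }
  }
  mutate {
    add_field => { "%{host}_temp" => "%{tempvalue}" }   # sprintf works in the key as well as the value
  }
}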
Based on your earlier solution for getting the time duration of each value, can we add another parameter to also check for a change of path or filename?
That is, your code checks for a change of value and subtracts timestamps to get durations. Could it also check for a change of path, and treat a new file as a separate file?
Apologies, I was jumping from that post to this one to combine the solutions!
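Since the earlier post is not quoted here, this is only a guess at its shape, but a ruby filter that tracks the previous value could also track the previous path and reset its state when the path changes (all field names are assumptions, and it relies on --pipeline.workers 1, which you are already using):

filter {
  ruby {
    init => "@prev_value = nil; @prev_path = nil; @start_time = nil"
    code => '
      path  = event.get("path")
      value = event.get("value")
      ts    = event.get("@timestamp").to_f
      # a new file starts a fresh stream, so forget the previous value
      @prev_value = nil if path != @prev_path
      if @prev_value && value != @prev_value
        # the value changed within the same file: record how long the old value lasted
        event.set("duration", ts - @start_time)
      end
      @start_time = ts if value != @prev_value
      @prev_value = value
      @prev_path  = path
    '
  }
}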
I tried using exclude => "temp.log.10" in the file input and it seems to work properly; I guess I have to work around it by writing another input in the same config to parse that file.
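A sketch of that workaround, with assumed paths: one input excludes the rotated file and a second input picks it up separately (note that exclude matches filenames, not full paths):

input {
  file {
    path => "/home/logs/temp.log*"
    exclude => "temp.log.10"       # skip this one in the main input
  }
  file {
    path => "/home/logs/temp.log.10"
    # assumption: add tags or type here if this file needs different parsing downstream
  }
}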