I'm having some problems ingesting log files I have on disk into Elastic. For testing purposes I have one plain text file that I want to push to Elastic, but nothing is happening, and I don't see anything in the logs either.
I'm not sure the kv filter is going to get me the results I want; the logs are one entry per line, so it might work. Anyway, this is just for testing to see how it looks.
The log name has a timestamp but the log content has no timestamps. It's about 5,000 lines with roughly 50 different status codes.
Thanks, but could you give me some more info? I've checked the doc page, but I'm not sure how this could cause Logstash/Elastic not to read the single file I have. It's a static file that is not changing.
Basically, Logstash tracks where it has read to in each file. If you have already processed the file and it never changes, the sincedb tells Logstash to skip everything it has already read, so it appears not to process anything.
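For testing you can disable the sincedb entirely so the file is re-read on every run. A minimal input sketch, assuming a made-up path (point it at your actual file):

input {
  file {
    # Placeholder path; substitute the real location of your test file.
    path => "/var/log/myapp/08311700"
    # Read from the top of the file instead of tailing the end.
    start_position => "beginning"
    # Throw away the read-position bookkeeping so Logstash re-reads
    # the whole file on every start. For testing only.
    sincedb_path => "/dev/null"
  }
}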
No luck. It doesn't appear that the sincedb file is being written to either.
Is there any way to check whether Logstash is looking at the file? In plain.log I can see it's opening ports for the other configs etc., but I don't see anything related to reading the file off disk.
I suppose I'll have to write a custom grok filter for each field and value I want to send to Elastic?
Is it possible to write grok filters that ignore things such as *** so that an error is written to the error field regardless of whether the original data shows >ERROR or +***ERROR?
How can I best handle the timestamp? Logs are created hourly and named something like 08311700, 08311800, etc. There are no timestamps inside the log, and the time at which the log gets ingested by Logstash and sent to Elastic might be totally different from the filename timestamp.
Can Logstash handle separate files, and how would Logstash and Elastic handle the timestamp? In some cases older logfiles might get ingested after newer logfiles as well, but that needs to be reflected in the data so logs don't get mixed up.
I'm sorry for all the questions; somebody smarter than me should be on this, but alas there isn't anyone.
I tried that but it didn’t work. Probably because I’m doing something else wrong.
start_position only matters for new files. If Logstash has seen the file before (according to the sincedb file), it'll start from the old position (probably the end of the file).
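To make experiments repeatable, you can also pin the sincedb to a known location and simply delete that file whenever you want a full re-read. A sketch, with made-up paths:

input {
  file {
    # Hypothetical glob matching the hourly files.
    path => "/var/log/myapp/*"
    start_position => "beginning"
    # Pin the sincedb to a known file; remove it between test runs
    # to make Logstash forget its read positions and start over.
    sincedb_path => "/tmp/myapp.sincedb"
  }
}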
I suppose I’ll have to write a custom grok filter for each field and value I want to send to Elastic?
Yes.
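A rough sketch of what that looks like; the line shapes and field names here are hypothetical, so adjust the patterns to your actual format:

filter {
  grok {
    # One pattern per line shape; grok tries them in order until one matches.
    match => {
      "message" => [
        "^%{WORD:status}\s+%{GREEDYDATA:detail}",
        "^%{WORD:status}: code=%{INT:code}\s+%{GREEDYDATA:detail}"
      ]
    }
  }
}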
Is it possible to write grok filters that ignore things such as *** so that an error is written to the error field regardless of whether the original data shows >ERROR or +***ERROR?
Of course.
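For example, a leading character class can swallow the junk so the same field gets extracted either way. A sketch covering the two variants you showed (field name is an assumption):

filter {
  grok {
    # [>+*]* skips any run of >, + or * at the start of the line, so both
    # ">ERROR foo" and "+***ERROR foo" produce status == "ERROR".
    match => { "message" => "^[>+*]*%{WORD:status}\s+%{GREEDYDATA:detail}" }
  }
}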
How can I best handle the timestamp? Logs are created hourly and named something like 08311700, 08311800, etc. There are no timestamps inside the log, and the time at which the log gets ingested by Logstash and sent to Elastic might be totally different from the filename timestamp.
You'll find the filename in one of the fields (path, I believe) so you can use grok to extract the timestamp from there.
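Something along these lines; I'm assuming the filename ends in the eight digits (MMddHHmm) with no extension, so adjust the pattern if not:

filter {
  grok {
    # The file input records the originating filename in the "path" field.
    # Capture the trailing eight digits (e.g. 08311700) into file_ts.
    match => { "path" => "(?<file_ts>[0-9]{8})$" }
  }
}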
Can Logstash handle separate files, and how would Logstash and Elastic handle the timestamp? In some cases older logfiles might get ingested after newer logfiles as well, but that needs to be reflected in the data so logs don't get mixed up.
If you set the timestamp correctly you'll be fine.
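Concretely, use the date filter to turn the extracted filename timestamp into @timestamp. Elasticsearch sorts and filters on @timestamp, so the order in which files happen to be ingested doesn't matter. Continuing with the hypothetical file_ts field from above:

filter {
  date {
    # MMddHHmm has no year, so the date filter assumes the current year.
    match => [ "file_ts", "MMddHHmm" ]
    # Assumption: the filenames are in UTC; change this to match reality.
    timezone => "UTC"
  }
}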
Thanks Magnus, it's always very reassuring to hear from you.
I'll see what I can make of it. I tried some .csv logs yesterday and those were very easy to process; it only took me six hours to get them working almost perfectly.