Unable to send log files on disk to Elastic


(Sjaak) #1

Hi,

I'm having some problems ingesting log files I have on disk into Elastic. For testing purposes I have one plain text file that I want to push to Elastic, but nothing is happening. I don't see anything in the logs either.

input {
  file {
    path => "/home/test/Downloads/K70001_201708302100.txt"
    type => "EQUIPMENT"
  }
}

filter {
  if [type] == "EQUIPMENT" {
    kv {
    }
  }
}

output {
  if [type] == "EQUIPMENT" {
    elasticsearch {
      hosts => ["localhost"]
      index => "equipment-%{+YYYY.MM.dd}"
    }
  }
}

I'm not sure the kv filter is going to get me the results I want; the logs are one entry per line, so it might work. Either way, this is just for testing to see how it looks.

The log name has a timestamp but the log content has no timestamps. It is about 5000 lines with about 50 different status codes.

What am I doing wrong?


(Mark Walkom) #2

It's likely to be the sincedb, check the file documentation for that setting and let us know if you have further problems :slight_smile:


(Sjaak) #3

Thanks, but could you give me some more info? I've checked the doc page but I'm not sure how this could cause Logstash/Elasticsearch not to read the single file I have. It's a static file that is not changing.


(Mark Walkom) #4

https://www.elastic.co/guide/en/logstash/5.5/plugins-inputs-file.html#_tracking_of_current_position_in_watched_files

Basically, if you have already processed the file before then Logstash tracks where it read to. So if this is a file that never changes, the sincedb tells Logstash to skip everything so it appears to not process anything.


(Sjaak) #5

I don't think Logstash ever processed the file because there is no index being created, that is the problem.

I tried renaming the file to avoid the sincedb issue but that doesn't make a difference.

The code I posted should work in theory? The log file is a plain text file with multiple lines.


(Mark Walkom) #6

Everything looks ok.

Try explicitly setting sincedb_path to /dev/null and try again.
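For example, a sketch based on the input from the original post (sincedb_path is the relevant setting in the file input):

input {
  file {
    path => "/home/test/Downloads/K70001_201708302100.txt"
    type => "EQUIPMENT"
    sincedb_path => "/dev/null"
  }
}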


(Sjaak) #7

No luck. The sincedb file doesn't appear to be written to either.

Is there any way to check whether Logstash is looking at the file? In plain.log I can see it's opening ports for other configs etc., but I don't see anything related to reading the file off disk.


(Sjaak) #8

Nevermind


(Mark Walkom) #9

Nevermind what?


(Magnus Bäck) #10

The code I posted should work in theory?

No. Since you haven't set start_position => "beginning" it'll never read the file from the top.
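A minimal sketch of the input with that set, reusing the path from the original post (setting sincedb_path to /dev/null as well is handy while testing, so old sincedb state doesn't get in the way):

input {
  file {
    path => "/home/test/Downloads/K70001_201708302100.txt"
    type => "EQUIPMENT"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}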


(Sjaak) #11

Got something I was testing working right after asking a question.

@magnusbaeck I tried that but it didn't work. Probably because I'm doing something else wrong.

I did somehow get the data into elastic though. Not sure how so I'll be figuring that out next.

I also have a question about the best way to structure the data. Right now everything goes into the message field which doesn't look pretty.

The data I have looks something like this:

+$GPRMC,somevalue
+HDG: $HEHDT,somevalue
+POS: somevalue
>*** WARNING: somewarning1
+WARNING: somewarning2
>ERROR: someerror1
+*** ERROR: someerror2

I suppose I'll have to write a custom grok filter for each field and value I want to send to Elastic?
Is it possible to write grok filters that ignore things such as *** so that an error is written to the error field regardless of the original data showing >ERROR or +***ERROR?

How can I best handle the timestamp? Logs are created hourly and named something like 08311700, 08311800 etc. There are no timestamps inside the log and the time at which the log gets ingested by logstash and sent to elastic might be totally different from the filename timestamp.

Can Logstash handle separate files, and how would Logstash and Elastic handle the timestamp? In some cases older log files might get ingested after newer log files, but that needs to be reflected in the data so logs don't get mixed up.

I'm sorry for all the questions, somebody smarter than me should be on this but alas there isn't.


(Magnus Bäck) #12

I tried that but it didn’t work. Probably because I’m doing something else wrong.

start_position only matters for new files. If Logstash has seen the file before (according to the sincedb file) it'll start from the old position (probably the end of the file).

I suppose I’ll have to write a custom grok filter for each field and value I want to send to Elastic?

Yes.

Is it possible to write grok filters that ignore things such as *** so that an error is written to the error field regardless of the original data showing >ERROR or +***ERROR?

Of course.
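One way to sketch it is an optional, non-capturing group for the *** prefix (the field names level and detail here are made up for illustration; adjust to taste):

filter {
  grok {
    match => { "message" => "^[+>](?:\*{3}\s*)?%{WORD:level}: %{GREEDYDATA:detail}" }
  }
}

That should put WARNING or ERROR into level and the rest of the line into detail, whether the line starts with >ERROR: or +*** ERROR:.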

How can I best handle the timestamp? Logs are created hourly and named something like 08311700, 08311800 etc. There are no timestamps inside the log and the time at which the log gets ingested by logstash and sent to elastic might be totally different from the filename timestamp.

You'll find the filename in one of the fields (path, I believe) so you can use grok to extract the timestamp from there.
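As a sketch, assuming filenames like K70001_201708302100.txt (i.e. yyyyMMddHHmm), something along these lines (the file_ts field name is made up):

filter {
  grok {
    match => { "path" => "_(?<file_ts>\d{12})\.txt$" }
  }
  date {
    match => ["file_ts", "yyyyMMddHHmm"]
  }
}

The date filter then writes that time into @timestamp, so events get the hour from the filename rather than the ingestion time.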

Can logstash handle separate files and how would logstand and elastic handle the timestamp? In same cases older logfiles might get ingested after newer logfiles as well but that needs to be reflected in the data so logs don’t get mixed up.

If you set the timestamp correctly you'll be fine.


(Sjaak) #13

Thanks magnus, it's always very reassuring to hear from you :smiley:

I'll see what I can make of it. Tried some .csv logs yesterday and those are very easy to process, only took me 6 hours to get it working almost perfectly.


(system) #14

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.