File input, from NetApp CIFS share, not reading single file

Hi,
I've made quite a few other posts, and have successfully ingested more than 1 billion messages in the last couple of weeks, but we've come to the conclusion that Logstash just isn't keeping up with the backlog of messages. The reason I say this is that Logstash will seemingly lose its place in the ingestion and start over at random.

What we have is NetApp CIFS auditing turned on, writing to a CIFS share on the filer itself (not included in the audit logging). There are 10,001 files in this directory: a single current file and 10,000 previous files, each 100MB in size, or approximately 77,000 lines. I had the input configured to /mnt/cifs/*.xml, and this was working for the most part, as I said above - hundreds of millions of messages have been ingested so far. But since it wasn't catching up, I opted to change the input to just the single current file. I've tried different combinations of start_position => "beginning", sincedb_path => "/dev/null", stat_interval => 1, mode => read, and others, and it seems that Logstash reads the file once when it starts and then never again, no matter what, until I restart Logstash.
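For context, the input block I've been experimenting with looks roughly like this (the path is illustrative, and I've been swapping the options in and out between runs):

```
input {
  file {
    path => "/mnt/cifs/audit_last.xml"   # the single current file; path is illustrative
    mode => "tail"                       # have also tried mode => "read"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    stat_interval => 1
  }
}
```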

I've read this page, but I'm not seeing where a specific problem is. Can someone break this down for me? Thank you in advance.

@badger since you've been so helpful so far :slight_smile:

sincedb_path => "/dev/null" does not prevent the file input from managing the sincedb; it just prevents it from persisting that db across restarts.

Why do you expect it to get read more than once?

If you enable '--log.level trace', what does filewatch have to say?

Okay, good to know, thank you.

It's not documented anywhere that I've found, but as I understand it, when the file reaches 100MB it gets renamed with a timestamp in the name, and a new latest file is started. Additionally, once Logstash has read that latest file at startup, it never reads any lines that are subsequently added to it, even though it's still the same file.

I'm sorry, where do I put this when I'm running Logstash as a service?

You can set the log.level in logstash.yml
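For example, a line like this in logstash.yml (typically /etc/logstash/logstash.yml on a package install, though your path may differ):

```
# logstash.yml
log.level: trace
```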

If I run the single config via the command line, it seems to read the file as I expect it to; however, if I put that config parameter in the yml and restart the service, it will read everything in the current file once, then stop reading any new messages. In both cases, I get the following:

[DEBUG] 2019-08-16 13:54:58.339 [pool-3-thread-2] jvm - collector name {:name=>"ParNew"}
[DEBUG] 2019-08-16 13:54:58.339 [pool-3-thread-2] jvm - collector name {:name=>"ConcurrentMarkSweep"}
[TRACE] 2019-08-16 13:54:58.595 [[main]<file] processor - Delayed Delete processing
[TRACE] 2019-08-16 13:54:58.595 [[main]<file] processor - Watched + Active restat processing
[TRACE] 2019-08-16 13:54:58.598 [[main]<file] processor - Rotation In Progress processing
[TRACE] 2019-08-16 13:54:58.598 [[main]<file] processor - Watched processing
[TRACE] 2019-08-16 13:54:58.600 [[main]<file] processor - Active - no change {"watched_file"=>"<FileWatch::WatchedFile: @filename='audit_last.xml', @state='active', @recent_states='[:watched, :watched]', @bytes_read='33244652', @bytes_unread='0', current_size='33244652', last_stat_size='33244652', file_open?='true', @initial=false, @sincedb_key='294976 0 42'>"}

and it looks like the current_size and last_stat_size don't change.

Does it work as expected with a local file?

Even if it did, NetApp CIFS auditing doesn't have an option to write to anything that wouldn't be a remote share from Logstash's point of view. I could try to figure out something to sync the files from the CIFS share to a local filesystem, maybe, if reading the share directly turns out not to work.

FYI, I tried using lsyncd to keep the single file in sync with the local system, but it didn't seem to be able to do that repeatedly. I'm going to try again with the rsync.ssh config, but I'm not optimistic. Also, when I checked this morning, the running config has been pulling in messages, but only about once an hour.

I found this morning that if I run watch ls /mnt/cifs_audit/, the config runs fine and ingests as I'd expect it to. Until I can figure out a better solution, I'm running a cron job to ls the directory on a loop (sketched below), and it seems to be importing.
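The cron entry is roughly along these lines (the one-minute interval is arbitrary; the point is just to touch the directory often enough that the CIFS client re-stats it):

```
# force the CIFS client to re-read the audit directory every minute
* * * * * ls /mnt/cifs_audit/ > /dev/null 2>&1
```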

It seems adding the close_older option (we set it to 5) was what we were looking for, and it doesn't require the cron job or lsyncd.
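For anyone who lands here later, the working input ended up looking roughly like this (path illustrative, and the 5-second value is just what worked for us):

```
input {
  file {
    path => "/mnt/cifs_audit/audit_last.xml"  # illustrative path
    start_position => "beginning"
    stat_interval => 1
    close_older => 5   # close the file after 5s of inactivity; reopening it seems to force a fresh stat over CIFS
  }
}
```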

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.