Logstash open_file handle issue on Linux

Since recently I have been facing a very peculiar warning in the Logstash log file: [filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'. I have searched the web for every possible lead and tried to dig into it myself, but could not find any thread providing a solution, so I thought I would post it here with complete details:

One link from Google, another from discuss.elastic.co, another one, and a few others.

LOG FILE: /var/log/logstash/logstash-plain.log

[2019-02-18T04:04:07,946][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'
[2019-02-18T04:04:07,946][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'

AND STATUS:

[root@sj-logstash ~]# systemctl status logstash -l
● logstash.service - logstash
   Loaded: loaded (/etc/systemd/system/logstash.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/logstash.service.d
           └─logstashlimit.conf
   Active: active (running) since Sun 2019-02-17 11:10:07 PST; 16h ago
 Main PID: 24558 (java)
   CGroup: /system.slice/logstash.service
           └─24558 /bin/java -Xms1g -Xmx1g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.compile.invokedynamic=true -Djruby.jit.threshold=0 -XX:+HeapDumpOnOutOfMemoryError -Djava.security.egd=file:/dev/urandom -cp /usr/share/logstash/logstash-core/lib/jars/animal-sniffer-annotations-1.14.jar:/usr/share/logstash/logstash-core/lib/jars/commons-codec-1.11.jar:/usr/share/logstash/logstash-core/lib/jars/commons-compiler-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/error_prone_annotations-2.0.18.jar:/usr/share/logstash/logstash-core/lib/jars/google-java-format-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/gradle-license-report-0.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/guava-22.0.jar:/usr/share/logstash/logstash-core/lib/jars/j2objc-annotations-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-annotations-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-core-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-databind-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-dataformat-cbor-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/janino-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/jruby-complete-9.1.13.0.jar:/usr/share/logstash/logstash-core/lib/jars/jsr305-1.3.9.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-api-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-core-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-slf4j-impl-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/logstash-core.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.commands-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.contenttype-3.4.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.expressions-3.4.300.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.filesystem-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.jobs-3.5.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.resources-3.7.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.runtime-3.7.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.app-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.common-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.preferences-3.4.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.registry-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.jdt.core-3.10.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.osgi-3.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.text-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/slf4j-api-1.7.25.jar org.logstash.Logstash --path.settings /etc/logstash

Feb 18 04:01:40 sj-logstash logstash[24558]: [2019-02-18T04:01:40,279][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'
Feb 18 04:01:40 sj-logstash logstash[24558]: [2019-02-18T04:01:40,279][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'

LOGSTASH OPEN FILE SETTING:

As I'm using RHEL 7.6, I have created the file below in order to set the open file limit for Logstash:

[root@sj-logstash ~]# cat /etc/systemd/system/logstash.service.d/logstashlimit.conf
[Service]
LimitNOFILE=65535 
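
(A drop-in like this only takes effect after a systemd daemon reload and a restart of the service:)

[root@sj-logstash ~]# systemctl daemon-reload
[root@sj-logstash ~]# systemctl restart logstash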

The file below was already there; I have just adjusted the limit:

[root@sj-logstash ~]# cat /etc/systemd/system/logstash.service
[Unit]
Description=logstash

[Service]
Type=simple
User=logstash
Group=logstash
# Load env vars from /etc/default/ and /etc/sysconfig/ if they exist.
# Prefixing the path with '-' makes it try to load, but if the file doesn't
# exist, it continues onward.
EnvironmentFile=-/etc/default/logstash
EnvironmentFile=-/etc/sysconfig/logstash
ExecStart=/usr/share/logstash/bin/logstash "--path.settings" "/etc/logstash"
Restart=always
WorkingDirectory=/
Nice=19
LimitNOFILE=65535 

[Install]
WantedBy=multi-user.target
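
To confirm the limit really applies to the running process (and not just to login shells), both of the following should report 65535 after the restart; 24558 is the PID from the status output above:

[root@sj-logstash ~]# systemctl show logstash -p LimitNOFILE
[root@sj-logstash ~]# grep "Max open files" /proc/24558/limits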

SYSTEM LEVEL File Descriptor settings:

[root@sj-logstash ~]# cat /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
vm.max_map_count = 262144


[root@sj-logstash ~]# cat /etc/security/limits.conf 

*               soft    nofile 1024
*               hard    nofile 65535 
root            soft    nofile 1024
root            hard    nofile 65535 

LOGSTASH JVM Settings:

[root@elasticS0104 logstash]# cat /etc/logstash/jvm.options
-Xms7g
-Xmx7g
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djruby.compile.invokedynamic=true
-Djruby.jit.threshold=0
-XX:+HeapDumpOnOutOfMemoryError
-Djava.security.egd=file:/dev/urandom

LOGSTASH INPUT Configurations:

[root@sj-logstash ~]# cd  /etc/logstash/conf.d/
[root@sj-logstash conf.d]# ls -l
total 8
-rw-r--r-- 1 root root  917 Feb 17 00:46 syslog.conf
-rw-r--r-- 1 root root 1003 Feb  7 02:14 rmlogs.conf

Logstash Version : 6.5.4

For completeness, I'm adding the configs below.

Logstash config for syslog:

[root@sj-logstash conf.d]# cat dpc-syslog.conf
input {
  file {
    path => [ "/data_elk/SYSTEMS/*.log" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    max_open_files =>  65535 
    type => "dpc-syslog"
  }
}

filter {
  if [type] == "dpc-syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "path", "messages" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
output {
  if [type] == "dpc-syslog" {
    elasticsearch {
      hosts => "elasticS0101:9200"
      manage_template => false
      index => "dpc-syslog-%{+YYYY.MM.dd}"
      document_type => "messages"
    }
  }
}

Logstash config for other custom logs:

[root@sj-logstash conf.d]# cat rmlogs.conf
input {
  file {
    path => [ "/data_elk/rmlogs/*.txt" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    max_open_files => 65535 
    type => "rmlog"
  }
}

filter {
  if [type] == "rmlog" {
    grok {
     match => { "message" => "%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:dt_h}:%{MINUTE:dt_m},%{NUMBER:duration}-%{WORD:hm},%{USER:User},%{USER:User_1} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} (?:%{HOUR:dt_h1}:|)(?:%{MINUTE:dt_m1}|) (?:%{HOUR:dt_h2}:|)(?:%{MINUTE:dt_m2}|)%{GREEDYDATA:CMD},%{GREEDYDATA:PWD_PATH}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "path","minute3","minute2","host","hour2","hour3","Num_1","Num_2","message" ]
      remove_tag => ["_grokparsefailure"]
    }
  }
}

output {
  if [type] == "rmlog" {
    elasticsearch {
      hosts => "elasticS0101:9200"
      manage_template => false
      index => "dpc-rmlog-%{+YYYY.MM.dd}"
    }
  }
}
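
To rule out plain syntax problems in these two files, I run a config test with the standard test-and-exit flag before restarting the service:

[root@sj-logstash conf.d]# /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit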

LOGSTASH logstash.yml file:

# cat /etc/logstash/logstash.yml | grep -v "#"
path.data: /var/lib/logstash


path.logs: /var/log/logstash

Correct way to increase max_open_files

I would like to know the correct way to increase max_open_files for Logstash.

Would appreciate any help with this.

So a couple of things I notice in your config.

LimitNOFILE=356000

This is set way too high. The max limit is 65535, but if you set it to the max you leave no room for the OS. Please have a read here about system settings.

[root@sj-logstash ~]# cat /etc/security/limits.conf 

*               soft    nofile 64000
*               hard    nofile 356000
root            soft    nofile 64000
root            hard    nofile 356000

Again, the settings are way too high. Please lower them to more normal values.

input {
  file {
    path => [ "/data_elk/SYSTEMS/*.log" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    max_open_files => 64000
    type => "dpc-syslog"
  }
}

max_open_files is 4095 by default, and that should be more than enough. Have a read here about max_open_files.

Once you have updated your config and rebooted the machine, see if you still get those warnings.

@pjanzen, thanks for your reply. However, I had already applied the following settings as per the standard Unix/Linux procedure, and it still did not help:

  1. Edit the /etc/security/limits.conf and add the lines:

    *  soft nofile 1024 
    *  hard nofile 65535
    
  2. Edit the /etc/pam.d/login by adding the line:

    session required /lib/security/pam_limits.so
    
  3. Use the system file limit to increase the file descriptor limit to 65535.
    The system file limit is set in /proc/sys/fs/file-max .

     echo  65535 >  /proc/sys/fs/file-max
    
  4. Use the ulimit command to set the file descriptor limit to the hard limit specified in /etc/security/limits.conf.

    ulimit -n unlimited
    
  5. Restart your system.
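
(Note: as far as I know, the echo in step 3 only lasts until the reboot in step 5; the persistent equivalent would be an entry in /etc/sysctl.conf, next to the vm.max_map_count line shown earlier, reloaded with sysctl -p:)

fs.file-max = 65535

[root@sj-logstash ~]# sysctl -p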

I would also like to know the actual meaning of this warning message:

[2019-02-18T04:04:07,946][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'

What else is running on that box? Although it is only a warning, it looks like it is running short on resources. Reading other threads, it could also point to permission issues. Is the logstash user allowed to access those files?

There is an rsyslog service running, but that is just receiving client logs and is not consuming many resources; it runs at a very optimised level. And yes, logstash is the owner of the files it is processing.

Would you mind sharing the link where it says this is a permission issue? In my case it is not.

I read that on this link: Filewatch.tailmode.handlers.createinitial open_file OPEN_WARN_INTERVAL is 300

I have visited this link many times, but I don't see it pointing to any permission issue, and I don't see any answer to the question on that thread. Secondly, there is no concrete solution in the documentation for increasing the max open file limit for Logstash, i.e. nothing that says these are the settings and locations to change in order to raise the open file limit.

I am hoping someone from the esteemed Elastic team can provide some insight; it would be great to see it on this thread so that everyone can learn the solution to this problem.

logstash failed to open a file. There are many possible reasons for that. I suggest you use strace (or whatever the equivalent is on your OS) to trace the system calls made by logstash and see exactly why it failed.
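
For example, something along these lines (just a sketch; 24558 is the Logstash PID from your systemctl status output, and on recent kernels/JVMs the call is usually openat rather than open):

[root@sj-logstash ~]# strace -f -e trace=open,openat -p 24558 -o /tmp/logstash-open.trace
[root@sj-logstash ~]# grep " = -1 E" /tmp/logstash-open.trace | tail

Any failed open shows up with its errno, e.g. EMFILE (too many open files), EACCES (permission denied) or ENOENT (no such file or directory).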

@Badger, thanks for your reply. I tried every possible log verbosity level to get to the bottom of the issue, and strace as you mentioned, but I only get this one line, with no headway or pointer to dig further. I'm also curious to know whether there is a way to increase the max open file limit on Linux 7 (RHEL 7 or CentOS 7) for a service that belongs to systemd; I applied what is mentioned in the documentation but with no gain.

What makes you think you have reached the limit on the number of open files?

No matter what level of logging you enable, logstash will not log why it failed. You need an external tool like strace to tell you that.

This is what I'm getting in the Logstash logs, which is what makes me think about it... and it's true, I need strace and other native tools to get more depth, but as I said earlier I have tried that and did not get anything that provides a pointer:

[2019-02-18T04:04:07,946][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'
[2019-02-18T04:04:07,946][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'

The man page for the open(2) system call lists 24 different errors that it can return. Now not all of those could happen to logstash, but assuming that open returned EMFILE is not a good idea.

That's true, but the interesting thing is that it only appears in the Logstash logs, not in the system logs or anywhere else.

Any response or guidelines on this would be helpful.

We discussed this on a Logstash issue. I'm linking comments here for future readers.
I explain the warning message here

So it is possible to process tens of thousands of files with a max_open_files as low as 10, because the close_older interval controls when one of those 10 files is "closed" to allow one of the other (waiting) files to become active. I explain more about the various states a file can be in, in the internal state machine, here
How this works in practice, and what value to use for close_older, depends on the append frequency of the files. Some files with a high append frequency may never be "closed", as a change is detected before the close_older interval is exceeded (it is a countdown timer that is reset each time a change in size is detected) - this is generally a good thing, because not processing a high append frequency file means a backlog builds up.
If there is a broad mix of append frequencies then the lower frequency files can "close" - a smaller backlog is more tolerable.
Therefore, if you get log info every 5 seconds telling you that "the max_open_files is exceeded and there are N files waiting to open" then the close_older setting is not low enough.


@guyboertje, thanks for all the information and the expert advice. However, if I consider a case where 4,000 files are each updated every 10 seconds, what would be a sensible value for close_older, just as an example? Is there any rule of thumb to follow for such a case?

Would you also please advise how to correct the Logstash configuration in my case to make it more suitable and optimised? Let me know if you see anything bad in my config.

Thanks again for the explicit and well-explained answer.

The close_older value to choose is also affected by the stat_interval.
stat_interval is a loop delay mechanism to prevent querying files for file changes too often.
The loop processes all files in each state machine state:
(pseudocode)

until quit?
  iterate_on_state
  write_sincedb
  if time_to_discover
    discover
  end
  sleep_for stat_interval
end

and "iterate_on_state" is:
(pseudocode)

if any_discovered_files?
  begin
    process_all_states(watched_files)
  ensure
    remove_all_deleted_files_from_discovered_files_collection
  end
end

and "process_all_states" is:
(pseudocode)

process_closed(watched_files)
process_ignored(watched_files)
process_delayed_delete(watched_files)
process_restat_for_watched_and_active(watched_files)
process_rotation_in_progress(watched_files)
process_watched(watched_files)
process_active(watched_files)

It is in the process_active method where each file is read and closed if closable (close_older timer expired).

Remembering that the input is single threaded, it is important to understand the file reading process. For the sake of argument, say we have max_open_files set to 10. If a lot of new content was appended to those 10 files since the last loop iteration, then all of that data will be read and sent to the filter section (if the workers are too busy to accept the read events, the thread pauses, building yet more backlog) before closable is checked and the file can be closed.

With 10 max open, the loop iterates frequently (due to the lower volume of data needing processing), so another file can be seen in the "window" of 10 quite quickly. With 50000 max open, the loop iterates slowly, as the overall volume of data read and processed is very high, and it may be a long time before a waiting file is eventually opened even with a minimal close_older value (e.g. 1 second). What is more, if a change in a file closed in the previous loop is detected by process_closed, it might be reopened immediately instead of a file that has been waiting longer - due to the sort order of the discovered files.
In these cases, where there is a lot of content to read, stat_interval can be very low (at the expense of higher CPU), e.g. "100ms"; max_open_files can be low-ish, e.g. 100; close_older can be "2s"; and discover_interval can be high, e.g. 1000, as it is a loop count value, not a time.
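
As a sketch, putting those example numbers into the file input from earlier in the thread (the exact values would still need tuning against the real append rates):

input {
  file {
    path => [ "/data_elk/SYSTEMS/*.log" ]
    max_open_files => 100       # low-ish window of concurrently open files
    close_older => "2s"         # release a file soon after it goes quiet
    stat_interval => "100ms"    # check for changes often, at the cost of more CPU
    discover_interval => 1000   # a loop count, not a time
    type => "dpc-syslog"
  }
}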

Another option, to get more determinism, is to use multiple file inputs each processing a mixed append frequency subset of all the files. Each subset can have better tuned settings.
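
As a sketch of that idea (the globs here are purely hypothetical; the point is just that each subset of files with a similar append frequency gets its own input and its own tuning):

input {
  file {
    path => [ "/data_elk/SYSTEMS/busy-*.log" ]   # hypothetical high append frequency subset
    max_open_files => 50
    close_older => "2s"
    stat_interval => "100ms"
    type => "dpc-syslog"
  }
  file {
    path => [ "/data_elk/SYSTEMS/*.log" ]        # the quieter remainder
    exclude => [ "busy-*.log" ]                  # exclude matches against the filename only
    max_open_files => 200
    close_older => "10s"
    stat_interval => "1s"
    type => "dpc-syslog"
  }
}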


@guyboertje, thanks again for the detailed answer, for explaining with the pseudocode, and for the details of attributes like close_older and stat_interval.

However, I would appreciate some example values for these attributes, based on the number of files (i.e. around 5k), for the input section below.

input {
  file {
    path => [ "/data_elk/SYSTEMS/*.log" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    max_open_files =>  65535 
    close_older  =>    ?
    stat_interval  =>   ?
    type => "dpc-syslog"
  }
}