Logstash failing with "ERROR::EMFILE- Too many open files"

Hi. I'm having issues running Logstash with large sets of data. I'm trying to push data to Elasticsearch via Logstash, with both applications hosted on CentOS servers. I'm trying to send 1000 files per minute (each 2 KB in size) for 1 hour, but after sending exactly 3992 files (i.e., after about four minutes) I hit the following issue:

[2019-07-16T06:19:36,398][DEBUG][logstash.instrument.periodicpoller.cgroup] Error, cannot retrieve cgroups information {:exception=>"Errno::EMFILE", :message=>"Too many open files - Too many open files"}

I've tried increasing the system limits too. I've increased the pipeline workers to 6 and the batch size to 1000. I don't have any other idea of what else to change, or whether I've made a mistake in the configuration. Thanks in advance.
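(For reference: if the limit was raised with `ulimit` in an interactive shell, a systemd-managed Logstash service will not inherit it. On CentOS 7 the usual approach is a unit drop-in; this is a sketch, where the unit name `logstash.service` and the value 65536 are assumptions to adapt:)

```ini
# /etc/systemd/system/logstash.service.d/limits.conf
[Service]
LimitNOFILE=65536
```

Then run `systemctl daemon-reload` and restart Logstash. `cat /proc/<logstash-pid>/limits` shows the limit the running process actually has.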

What does your input configuration look like?

input {
  file {
    path => ["/home/stats/archive/*.csv"]
    start_position => "beginning"
    max_open_files => 40000
#    sincedb_path => "/tmp/.productsSince15.db"
  }
}
filter {
  csv {
    separator => ","
    skip_header => true
    columns => ["mitigation_id","time","total_in_packets","total_out_packets","total_dropped_packets","total_in_bytes","total_out_bytes","total_dropped_bytes","invalid_pkts_in_packets","invalid_pkts_out_packets","invalid_pkts_dropped_packets","invalid_pkts_in_bytes","invalid_pkts_out_bytes","invalid_pkts_dropped_bytes","acl_in_packets","acl_out_packets","acl_dropped_packets","acl_in_bytes","acl_out_bytes","acl_dropped_bytes","zombie_detection_in_packets","zombie_detection_out_packets","zombie_detection_dropped_packets","zombie_detection_in_bytes","zombie_detection_out_bytes","zombie_detection_dropped_bytes","geo_ip_in_packets","geo_ip_out_packets","geo_ip_dropped_packets","geo_ip_in_bytes","geo_ip_out_bytes","geo_ip_dropped_bytes","tcpsyn_in_packets","tcpsyn_out_packets","tcpsyn_dropped_packets","tcpsyn_in_bytes","tcpsyn_out_bytes","tcpsyn_dropped_bytes","payload_in_packets","payload_out_packets","payload_dropped_packets","payload_in_bytes","payload_out_bytes","payload_dropped_bytes", "http_mlfd_in_packets","http_mlfd_out_packets","http_mlfd_dropped_packets","http_mlfd_in_bytes","http_mlfd_out_bytes","http_mlfd_dropped_bytes","dns_mlfd_in_packets","dns_mlfd_out_packets","dns_mlfd_dropped_packets","dns_mlfd_in_bytes","dns_mlfd_out_bytes","dns_mlfd_dropped_bytes","traffic_management_in_packets","traffic_management_out_packets","traffic_management_dropped_packets","traffic_management_in_bytes","traffic_management_out_bytes","traffic_management_dropped_bytes"]
  }
  date {
    match => [ "time", "YYYY-MM-dd HH:mm:ss" ]
    target => "timestamp"
  }

  mutate {
    remove_field => [ "message" ]
  }

mutate {
  convert => { 
               "mitigation_id" => "integer_eu"
               "total_in_packets" => "integer_eu"
               "total_out_packets" => "integer_eu"
               "total_dropped_packets" => "integer_eu"
               "total_in_bytes" => "integer_eu"
               "total_out_bytes" => "integer_eu"
               "total_dropped_bytes" => "integer_eu"
               "invalid_pkts_in_packets" => "integer_eu"
               "invalid_pkts_out_packets" => "integer_eu"
               "invalid_pkts_dropped_packets" => "integer_eu"
               "invalid_pkts_in_bytes" => "integer_eu"
               "invalid_pkts_out_bytes" => "integer_eu"
               "invalid_pkts_dropped_bytes" => "integer_eu"
               "acl_in_packets" => "integer_eu"
               "acl_out_packets" => "integer_eu"
               "acl_dropped_packets" => "integer_eu"
               "acl_in_bytes" => "integer_eu"
               "acl_out_bytes" => "integer_eu"
               "acl_dropped_bytes" => "integer_eu"
               "zombie_detection_in_packets" => "integer_eu"
               "zombie_detection_out_packets" => "integer_eu"
               "zombie_detection_dropped_packets" => "integer_eu"
               "zombie_detection_in_bytes" => "integer_eu"
               "zombie_detection_out_bytes" => "integer_eu"
               "zombie_detection_dropped_bytes" => "integer_eu"
               "geo_ip_in_packets" => "integer_eu"
               "geo_ip_out_packets" => "integer_eu"
               "geo_ip_dropped_packets" => "integer_eu"
               "geo_ip_in_bytes" => "integer_eu"
               "geo_ip_out_bytes" => "integer_eu"
               "geo_ip_dropped_bytes" => "integer_eu"
               "tcpsyn_in_packets" => "integer_eu"
               "tcpsyn_out_packets" => "integer_eu"
               "tcpsyn_dropped_packets" => "integer_eu"
               "tcpsyn_in_bytes" => "integer_eu"
               "tcpsyn_out_bytes" => "integer_eu"
               "tcpsyn_dropped_bytes" => "integer_eu"
               "payload_in_packets" => "integer_eu"
               "payload_out_packets" => "integer_eu"
               "payload_dropped_packets" => "integer_eu"
               "payload_in_bytes" => "integer_eu"
               "payload_out_bytes" => "integer_eu"
               "payload_dropped_bytes" => "integer_eu"
               "http_mlfd_in_packets" => "integer_eu"
               "http_mlfd_out_packets" => "integer_eu"
               "http_mlfd_dropped_packets" => "integer_eu"
               "http_mlfd_in_bytes" => "integer_eu"
               "http_mlfd_out_bytes" => "integer_eu"
               "http_mlfd_dropped_bytes" => "integer_eu"
               "dns_mlfd_in_packets" => "integer_eu"
               "dns_mlfd_out_packets" => "integer_eu"
               "dns_mlfd_dropped_packets" => "integer_eu"
               "dns_mlfd_in_bytes" => "integer_eu"
               "dns_mlfd_out_bytes" => "integer_eu"
               "dns_mlfd_dropped_bytes" => "integer_eu"
               "traffic_management_in_packets" => "integer_eu"
               "traffic_management_out_packets" => "integer_eu"
               "traffic_management_dropped_packets" => "integer_eu"
               "traffic_management_in_bytes" => "integer_eu"
               "traffic_management_out_bytes" => "integer_eu"
               "traffic_management_dropped_bytes" => "integer_eu"   
  }
}
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "typestats-%{+YYYY-MM-dd_HH_mm}"
  }
}

I'm trying to index .csv files into Elasticsearch via Logstash.

It sounds like you have not increased the open file limit for the process, so I would look into that. Also, you might want to significantly lower the value of close_older on the file input, and also consider switching to read mode instead of tail mode.
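(For reference, a read-mode variant of the input above might look like this — a sketch, where `file_completed_action => "log"` and its log path are assumptions to adapt. In read mode each file is closed once it has been read to EOF, so `close_older` mainly matters in tail mode.)

```
input {
  file {
    path => ["/home/stats/archive/*.csv"]
    mode => "read"
    # each file is read to EOF and then closed, so far fewer
    # descriptors stay open at once than in tail mode
    file_completed_action => "log"
    file_completed_log_path => "/var/log/logstash/completed.log"
  }
}
```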

Hi, sorry for the late reply, and thanks for the heads-up! I made the changes you suggested:

  • switching to read mode
  • increasing the max open files limit (10,000)
  • decreasing close_older (3 minutes)

This setup worked fine for 4-5 hours, but then Logstash stopped processing the incoming data. While there were 288,000 files and counting, Logstash processed only 165,900 of them and got stuck there. These are the debug logs:

Is there anything I'm missing out?

In tail mode I can see why you would want to tail a large number of files, but in read mode Logstash is going to open a file, read the whole file, and process the whole file through the pipeline. That's basically going to keep 1 CPU busy. I would not set the open file limit much above the number of CPUs that you have. If you increase it further, if anything I would expect throughput to get worse.

Are you deleting files and then adding more files? That can result in inode reuse.
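To make the re-use mechanism concrete, here is a minimal shell sketch (the filenames are hypothetical, and whether the inode number actually repeats depends on the filesystem):

```shell
# Create a file, note its inode, delete it, create another file.
# On many filesystems the freed inode is handed straight to the new
# file. The file input's sincedb keys on the inode, so the old read
# offset would then be applied to the brand-new file, which makes it
# look "already read".
dir=$(mktemp -d)
touch "$dir/first.csv"
stat -c 'first.csv  inode=%i' "$dir/first.csv"
rm "$dir/first.csv"
touch "$dir/second.csv"
stat -c 'second.csv inode=%i' "$dir/second.csv"
rm -r "$dir"
```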

There are still plenty of unused inodes.
Does inode reuse only happen when files are deleted/removed?

Actually, I'm not deleting any files as of now.

Then the next step would be '--log.level trace' (not debug) which will cause the filewatch code to log voluminously. There will be millions of lines of logs, but you just need to find the lines related to one file that does not get read.
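(For reference, besides the `--log.level trace` command-line flag, the same level can be set persistently in `logstash.yml`:)

```yaml
# config/logstash.yml — persistent alternative to the --log.level flag
log.level: trace
```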

Will try that too and let you know how it works out. Thank you for your valuable time :smiley::smiley:

I tried debugging the issue by setting the log level to "trace".



Is there any issue with the SincedbCollection?

The logs are being shipped with a significant delay. The first 0.15 million (150,000) log files shipped without any issue, but everything after that is taking much longer.

In total, 5 million files were created over 3 days, but only 0.7 million (700,000) have shipped so far. Logstash didn't stop, but it processed them with considerable delay.

The fact that you are seeing bytes_read = bytes_unread = 0 with a non-zero file size definitely suggests inode re-use to me.

Pick an inode, such as 1812511751, and search for every reference to it. I suspect you will find it being used for two different files.
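A quick way to do that search (the log path assumes a default RPM install; adjust to your setup):

```shell
# list every trace line mentioning the inode, with line numbers;
# two different file paths next to the same inode confirm re-use
grep -n '1812511751' /var/log/logstash/logstash-plain.log
```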

If it's inode re-use, what is the suggested configuration?

I do not have a good answer for that.

Thanks anyway!