Ingesting a large number of files in a directory using Logstash

Hi,

I have a Logstash configuration to ingest a large number of JSON files from a directory.
The ingestion works perfectly fine, but only up to a certain number of files.

For example, when I pointed the input at a folder with 400,000 JSON files, everything was ingested without a problem. But when I point the input at a folder with 1,000,000 JSON files, nothing gets ingested.

Any idea why this could happen?

What does your input configuration look like? What version are you using?

There are issues related to large numbers of files, some of which have been fixed (here and here).

Hi Badger,
My input looks as follows:

input {
  file {
    path => "/home/ubuntu/filesPath/*.json"
    sincedb_path => "/dev/null"   # do not persist read positions between runs
    codec => json
    mode => "read"                # read each file to completion instead of tailing it
    file_chunk_size => 131072     # read in 128 KB chunks
  }
}

and the version is Logstash 7.10.2.

I believe 7.10.2 would have a 4.2.x file plugin. You could verify using

cd /usr/share/logstash; bin/logstash-plugin list --verbose logstash-input-file
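
The output should be a single line with the plugin name and its version, something like the following (the version number here is only a placeholder):

logstash-input-file (4.2.4)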

I suggest you point Logstash at the folder with a million JSON files, wait a while, then get a thread dump (kill -3, or jstack, or whatever tool you prefer) and see what the runnable threads are doing. If you are unable to interpret the thread dump, post it as a gist or put it on some other site where I can view the text, and I will see if I can tell what it is doing.
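
For example, on Linux something along these lines should work; the PID lookup and the output path are only placeholders for whatever you normally use:

# find the PID of the Logstash JVM (adjust if more than one instance is running)
pgrep -f logstash

# option 1: jstack writes the thread dump to a file you can share
jstack <pid> > /tmp/logstash-threaddump.txt

# option 2: kill -3 sends SIGQUIT, and the JVM writes the thread dump
# to Logstash's stdout (usually its log file), not to your terminal
kill -3 <pid>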

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.