I'm using Apache Camel to index data from an enterprise service. I'm testing it out on a local Elasticsearch instance. It indexes about 12000 files, then starts giving me a "Too many open files" error. I set my ulimit to 64000 and some other settings:
This got rid of the too many open files issue, but I'm now running into a Socket Closed issue. I'm not sure if these problems will go away when I'm running multiple nodes in a hosted environment instead of my local machine.
Once it indexes those files, does it release them or are those files always open for the life of the elasticsearch process?
Edit: I just ran my Camel job. lsof | wc -l seemed to top out around 15531, and the job was able to index around 13962 documents before the sockets closed and the Camel job failed. I'm not too familiar with sockets, and I'm not sure how to increase my socket availability or release sockets. Do I need to throttle my Camel job so it indexes more slowly?
I suspect your Camel job is somehow holding the files open. ES, whether you use bulk or non-bulk indexing, would never open so many file handles for so few documents.
lsof will tell you which file names (or sockets) are held open and which processes have them open.
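To see which process is actually holding the handles, it can help to group the lsof output by process rather than just counting lines. Below is a rough sketch, not a definitive tool: the function names count_open_handles and run_lsof are made up for illustration, and it assumes the default lsof column layout in which the first two columns are COMMAND and PID.

```python
import subprocess
from collections import Counter


def count_open_handles(lsof_output: str) -> Counter:
    """Count lsof entries per (command, pid).

    Assumes the default lsof layout where the first two
    whitespace-separated columns are COMMAND and PID.
    """
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that does not look like an entry.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        counts[(fields[0], int(fields[1]))] += 1
    return counts


def run_lsof() -> str:
    """Run lsof without host (-n) or port (-P) name lookups and return its output."""
    result = subprocess.run(
        ["lsof", "-n", "-P"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    # Print the five processes with the most open entries.
    for (command, pid), n in count_open_handles(run_lsof()).most_common(5):
        print(f"{n:6d}  {command} (pid {pid})")
```

If the Camel JVM shows up at the top of that list instead of the Elasticsearch process, the leak is most likely on the client side.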