Large number of open files


(arta) #1

Hi,
In my experiment, I indexed about 80 million documents and observed a high number of open files.
My setup is as follows:

  • number of nodes in the cluster: 2
  • number of indices: 32
  • number of shards (for each index): 32
  • number of replicas: 1

The number of open files observed on each node was:

  • node1: 47508
  • node2: 47412

The number of open files kept increasing as time went on.
(I eventually hit an out-of-memory exception, so I had to stop my experiment there.)

My setup creates 32 indices x 32 shards x 2 copies (1 primary + 1 replica) / 2 nodes = 1024 Lucene indices per node.
I know the number of shards is too big for this environment, but this was an experiment.
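
The per-node figure can be reproduced with a short calculation. This is only a sketch: the function and parameter names are made up for illustration, and it assumes the shard copies are spread evenly across the nodes.

```python
def lucene_indices_per_node(indices, shards_per_index, replicas, nodes):
    """Estimate how many Lucene indices (shard copies) each node hosts.

    Assumes an even spread of shard copies across nodes (an assumption,
    not something the cluster guarantees).
    """
    copies_per_shard = 1 + replicas  # one primary plus its replicas
    total_shard_copies = indices * shards_per_index * copies_per_shard
    return total_shard_copies / nodes


# Values from my setup: 32 indices, 32 shards each, 1 replica, 2 nodes.
print(lucene_indices_per_node(32, 32, 1, 2))
```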

Is about 50,000 open files expected for about 1,000 Lucene indices?

Is the number roughly proportional to the number of Lucene indices?
My observation is that the number of files in each Lucene index directory varies (100 to 200).
I think (correct me if I'm wrong) the number increases as more documents are indexed and decreases when optimization runs.
In my current system setup, I set the max limit of number of open files per process to 100,000.
Is this a reasonable setup for about 1,000 Lucene indices?
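
To put the numbers above side by side, here is a rough calculation using only the figures from this post. The helper names are hypothetical, and the "worst case" assumes every file in a shard directory is held open at once, which may well not be true.

```python
FD_LIMIT = 100_000           # per-process open file limit I configured
INDICES_PER_NODE = 1024      # Lucene indices per node in my setup
OBSERVED = {"node1": 47508, "node2": 47412}


def open_files_per_index(open_files, lucene_indices):
    """Average number of open files per Lucene index on one node."""
    return open_files / lucene_indices


def worst_case_open_files(files_per_directory, lucene_indices):
    """Open files if every file in every shard directory were open (assumption)."""
    return files_per_directory * lucene_indices


for node, open_files in OBSERVED.items():
    avg = open_files_per_index(open_files, INDICES_PER_NODE)
    used = open_files / FD_LIMIT
    print(f"{node}: {avg:.1f} open files per index, {used:.0%} of the limit")

# Directory sizes I observed ranged from 100 to 200 files.
for files in (100, 200):
    total = worst_case_open_files(files, INDICES_PER_NODE)
    print(f"{files} files per directory -> {total} (limit {FD_LIMIT})")
```

By this estimate the observed average is well below the 100 to 200 files I see per directory, but the worst case at either end of that range would exceed the 100,000 limit.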

Thanks for your help.


(Shay Banon) #2

The more shards you have (each shard is a Lucene index), the more open
files are required. The number of open files each Lucene index needs depends
on the number of segments it has, which in turn depends on the
merge policy settings.
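
As a rough illustration of that dependency (a toy model, not Elasticsearch or Lucene code), the sketch below multiplies shards by segments by files per segment. The function name is hypothetical and the segment and file counts are placeholder values, not measurements; real values depend on the merge policy and on whether the compound file format is in use.

```python
def estimate_open_files(shards_on_node, segments_per_shard, files_per_segment):
    """Toy estimate: each segment contributes a fixed number of open files.

    All three inputs are assumptions supplied by the caller; Lucene does
    not expose them through this function.
    """
    return shards_on_node * segments_per_shard * files_per_segment


# Placeholder numbers only: 1024 shards on a node, a handful of segments each.
few_segments = estimate_open_files(1024, segments_per_shard=5, files_per_segment=9)
many_segments = estimate_open_files(1024, segments_per_shard=20, files_per_segment=9)

# More segments per shard (less aggressive merging) means more open files.
print(few_segments, many_segments)
```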

(In reply to arta's message of Wed, Jun 13, 2012 at 8:47 PM.)


