Performance degradation with index size ~2 GB, ~6M docs

Hi all,
I have been evaluating Elasticsearch over the last few days, and I must
say I find it a great product. I wish most of the apps out there
were as easy to understand and integrate...

Performance was also great in my test app (500 requests per second)
until the index grew large; then it started to decrease, dropping to a
maximum of 30 requests per second.
At first I thought I had a problem with my setup or
environment, so I threw everything away and started from scratch to
test a simple scenario:

I have 1 index with the default ES (0.18.6) settings: 5 shards, 1
replica, running on 1 machine (Mac OS X Lion, 2 cores, 2.93 GHz, 4 GB RAM).
The index is stored on the local file system (no NFS or anything similar).

The documents I insert couldn't be simpler:

{"dummyMessage" : "<randomDictionaryWord>"}

The search query is just as simple:

{"query": {
    "query_string": {
        "default_field": "dummyMessage",
        "query": "<randomDictionaryWord>"
    }
}}
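A minimal Python sketch of how such a test might build its request bodies, assuming each document holds a single word picked at random from a word list and each query searches for one such word. The word list and the helper names `make_doc` and `make_query` are hypothetical; only the `dummyMessage` field and the `query_string` structure come from the post.

```python
import json
import random

# Hypothetical vocabulary; the original test drew words from a dictionary.
WORDS = ["apple", "harbor", "lantern", "meadow", "quartz"]


def make_doc(word):
    """Build the JSON body of one test document (field name from the post)."""
    return json.dumps({"dummyMessage": word})


def make_query(word):
    """Build a query_string search body against the dummyMessage field."""
    return json.dumps({
        "query": {
            "query_string": {
                "default_field": "dummyMessage",
                "query": word,
            }
        }
    })


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs pick the same word
    word = rng.choice(WORDS)
    print(make_doc(word))
    print(make_query(word))
```

Sending these bodies to the index and search endpoints is left to whatever HTTP client the load test uses.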

When the index was smaller than 1 GB (around 4 million docs), I was
getting up to 500 requests per second with no problem (both indexing
and searching).

When the index got a bit bigger (1.7 GB, 6.5 million docs), search
performance degraded a lot, dropping to 30 requests per second (I
must say that indexing performance is as good as before; it has not
decreased).
Interestingly, very little CPU is used during the search test (8%),
and even though I give ES a lot of RAM, only a small part of it is
used (heap_committed: 2.4 GB, heap_used: 423 MB).

A thread dump showed that the largest group of threads (233 out of 470
in total) are RUNNABLE in sun.nio.ch.FileDispatcher.pread0:

java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileDispatcher.pread0(Native Method)
    at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:195)
    at sun.nio.ch.IOUtil.read(IOUtil.java:171)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:612)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:162)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:229)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)

Most of the rest (215) are TIMED_WAITING in the
ThreadPoolExecutor.
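To see how threads are distributed, one could tally a thread dump by state and by topmost frame. Below is a minimal Python sketch, assuming jstack-style text in which threads are separated by blank lines and frames start with `at`; the function `summarize_dump` and the thread names in `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "java.lang.Thread.State: TIMED_WAITING (parking)" -> "TIMED_WAITING"
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+(\w+)")
# Matches a frame line such as "at sun.nio.ch.FileDispatcher.pread0(Native Method)"
FRAME_RE = re.compile(r"^\s*at\s+([\w$.]+)\(")

# Hypothetical two-thread excerpt in jstack layout.
SAMPLE = """\
"search-thread-1" daemon prio=5 runnable
   java.lang.Thread.State: RUNNABLE
\tat sun.nio.ch.FileDispatcher.pread0(Native Method)
\tat sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31)

"search-thread-2" daemon prio=5 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
\tat sun.misc.Unsafe.park(Native Method)
"""


def summarize_dump(dump):
    """Count threads per state and per topmost stack frame."""
    states = Counter()
    top_frames = Counter()
    for block in re.split(r"\n\s*\n", dump):
        match = STATE_RE.search(block)
        if match is None:
            continue  # not a thread block (e.g. the dump header)
        states[match.group(1)] += 1
        for line in block.splitlines():
            frame = FRAME_RE.match(line)
            if frame:
                top_frames[frame.group(1)] += 1
                break  # only the first (topmost) frame is counted
    return states, top_frames


if __name__ == "__main__":
    states, top_frames = summarize_dump(SAMPLE)
    print(states.most_common())
    print(top_frames.most_common())
```

Run against a full dump, a large count for `sun.nio.ch.FileDispatcher.pread0` indicates that searches are mostly waiting on file reads rather than using CPU.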

Once everything is cached, running the test again gives me the
expected performance (500 requests per second), until I search for new
dictionary words.
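One possible reading of these numbers, offered as an assumption and not as a diagnosis: with NIOFSDirectory, as in the stack trace, Lucene reads index files with ordinary file reads, which are served from the operating system's file cache when possible, and that cache can only use the RAM left over after the JVM heap and everything else running on the machine. A back-of-the-envelope check using the 4 GB of RAM and 2.4 GB of committed heap reported above (the function names and the `other_gb` allowance are hypothetical):

```python
def page_cache_headroom_gb(total_ram_gb, heap_committed_gb, other_gb=0.0):
    """RAM left for the OS file cache after the JVM heap and other usage.

    other_gb is a hypothetical allowance for the OS and other processes;
    the post does not report it, so the default of 0 is an optimistic bound.
    """
    return total_ram_gb - heap_committed_gb - other_gb


def index_fits_in_cache(index_gb, headroom_gb):
    """True if the whole index could stay in the file cache at once."""
    return index_gb <= headroom_gb


if __name__ == "__main__":
    headroom = page_cache_headroom_gb(total_ram_gb=4.0, heap_committed_gb=2.4)
    print(f"headroom: {headroom:.1f} GB")
    print("1.7 GB index fits:", index_fits_in_cache(1.7, headroom))
    print("1 GB index fits:", index_fits_in_cache(1.0, headroom))
```

Even with the optimistic `other_gb=0`, the 1.7 GB index does not fit in the remaining memory, which would be consistent with threads waiting in pread0 and with new words being slow until their data has been read once. Under that assumption, a smaller heap (only 423 MB was in use) would leave more room for the file cache.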

My question is also very simple: with an index of 1.7 GB and 6.5
million documents, is it normal that the maximum performance I can get
on my machine is around 30 requests per second? Or should I try tuning
ES further?

Any answers or comments will be much appreciated. Thanks for the great
work!

PS: Sorry if this question has already been answered. I did search
the mailing list for something similar, but what I found
was not exactly what I was looking for:

https://groups.google.com/group/elasticsearch/browse_thread/thread/18dfe6ace37c58f8/a090963b46043123
https://groups.google.com/group/elasticsearch/browse_thread/thread/9580b0cf6f527c76/06f4f01fd34d96e4

Heya, here are a few notes:

  1. Since you are using OS X, make sure Spotlight is disabled for the location
    where the index is stored. The same goes for Alfred, Quicksilver, and similar tools.

  2. How do you do load testing? I helped a fellow who used Ruby to index
    data into elasticsearch, opening a socket for each index request, which the
    OS ended up throttling. Make sure to use persistent connections.
    The node stats API shows how many connections are currently open and how
    many have been opened in total.

  3. You use an index with 5 shards. When you search, even on a single
    machine, that single search ends up executing in parallel on all shards.
    So, 1 search request -> 5 concurrent shard-level requests, and 50 concurrent
    search requests -> 250 concurrent shard-level requests, all on a machine with 2
    cores (and what kind of drive?). How many concurrent search requests are you running?
    Things to do:
    -> For a single-node test, use fewer shards.
    -> Bound the search thread pool to the maximum number of concurrent
    requests allowed; otherwise your machine might be taxed to the point where it falls
    over. See the Elasticsearch documentation for more.

  4. A bigger index does mean more data to search, but it really depends
    on what you search for and what the term distribution is. For example, if
    randomDictionaryWord is picked from only 20 words, each term will be
    "heavily loaded" (it matches a huge number of documents).
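The arithmetic behind notes 3 and 4 can be sketched as below. The function names are hypothetical, and `expected_docs_per_term` assumes each document holds one word drawn uniformly at random from the vocabulary, which is an illustration and not a measurement of this test.

```python
def shard_requests(concurrent_searches, shards):
    """Each search fans out to one request per shard (note 3)."""
    return concurrent_searches * shards


def shard_requests_per_core(concurrent_searches, shards, cores):
    """Shard-level requests competing for each CPU core."""
    return shard_requests(concurrent_searches, shards) / cores


def expected_docs_per_term(num_docs, vocabulary_size):
    """Average documents matching one term, assuming one word per document
    drawn uniformly at random from the vocabulary (note 4)."""
    return num_docs / vocabulary_size


if __name__ == "__main__":
    print(shard_requests(1, 5))                     # 5
    print(shard_requests(50, 5))                    # 250
    print(shard_requests_per_core(50, 5, 2))        # 125.0
    print(expected_docs_per_term(6_500_000, 20))    # 325000.0
```

With a 20-word vocabulary, every query term would match about 325,000 of the 6.5 million documents on average, so each search would walk a long postings list.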

On Tue, Jan 3, 2012 at 4:55 PM, Ruben ruben_inoto@yahoo.com wrote:
