In-memory index: out-of-memory issues while indexing

Hi,

I'm currently evaluating Elasticsearch as a replacement for our
existing Solr search indices, and initial tests have proved very
promising. However, our largest index is just shy of 30m documents and
naturally weighs in at quite a hefty size. Indexing this in ES using
the standard index storage type (mmapfs, I think) and a 10gb heap
(-Xmx10g) works fine, but when querying the index I get maybe 3-4
requests/s compared to Solr's ~148 requests/s. If I then shut down the
ES server and restart it with "-Des.index.store.type=memory", the
request rate shoots up by almost a factor of 100 (~280-300 requests/s).
Next I cleared out the index and tried to re-index using an in-memory
index from the start, but doing it this way I get out-of-memory errors
halfway through indexing (the full index size is ~20gb).
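
For reference, here is the same setting in elasticsearch.yml form as I
understand the 0.x config layout (the heap comment is my own reading of
why it blows up, not something from the docs):

```yaml
# Equivalent of starting with -Des.index.store.type=memory.
# The whole index lives on the heap, so -Xmx has to cover the
# full ~20gb index plus indexing overhead.
index:
  store:
    type: memory
```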

Am I doing something wrong here? How can the index work fine if I
index it and then shift to in-memory, but not directly as in-memory?

Any help appreciated,

Matt

Which gateway are you using? You can't shift to in-memory after a restart
unless you are using a shared gateway. If you are using the default local
gateway, then you will have an empty index.
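
For reference, a sketch of the two setups in elasticsearch.yml (the
shared-fs mount point below is just a placeholder, not a recommendation):

```yaml
# Default: local gateway, each node keeps its own state on local disk.
gateway:
  type: local

# Shared gateway example: cluster state and indices are stored on a
# shared filesystem, which is what allows reusing the index across a
# restart with a different store type.
#gateway:
#  type: fs
#  fs:
#    location: /mnt/shared/es-gateway
```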

By default, elasticsearch uses the nio based fs storage, not the mmap one.
Lucene has changed its default storage to mmap when running under Linux
with a 64bit JVM, but I don't think it's a good default.

Note regarding the memory store type: there have been reports of it not
working well and causing failures when searching. I still need to chase
that one down.

On Thu, Jul 28, 2011 at 1:09 PM, Matt Wilson codebrewery@gmail.com wrote:


I'm using whichever is the default gateway as I didn't make any
configuration changes there. That may explain why the speed increase
is so dramatic when re-starting with the in-memory index storage :)

As for the fs storage, I'm using 64bit linux so I'm guessing that I'll
be on mmap as I previously mentioned.

I'll keep digging for reasons why ES is so much slower than Solr in
this instance and any recommendations for index settings (30m
documents, each ~150 bytes) would be appreciated.
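
In case it helps, this is roughly the starting point I was planning to
benchmark from; every value here is a guess on my part, not something
taken from the docs:

```yaml
# Rough first attempt for 30m small (~150 byte) documents on one node;
# all values are guesses to be tuned by benchmarking.
index:
  number_of_shards: 5
  number_of_replicas: 0      # replicas off during the bulk load
  refresh_interval: 30s      # fewer refreshes while bulk indexing
```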

On 28 July 2011 13:30, Shay Banon kimchy@gmail.com wrote:


On Thu, Jul 28, 2011 at 4:42 PM, Matt Wilson codebrewery@gmail.com wrote:

I'm using whichever is the default gateway as I didn't make any
configuration changes there. That may explain why the speed increase
is so dramatic when re-starting with the in-memory index storage :)

Are you validating the results that you get?

As for the fs storage, I'm using 64bit linux so I'm guessing that I'll
be on mmap as I previously mentioned.

No, you won't. As I said, the default in elasticsearch is the nio fs
store (not mmap), even on 64bit Linux; it's different from the new
default in the new Lucene 3.3 version.
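
If you want to be explicit rather than rely on either default, you can
pin the store type in elasticsearch.yml; a sketch:

```yaml
# Force a specific Lucene directory implementation instead of
# relying on the niofs default.
index:
  store:
    type: mmapfs   # or: niofs, simplefs, memory
```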

I'll keep digging for reasons why ES is so much slower than Solr in
this instance and any recommendations for index settings (30m
documents, each ~150 bytes) would be appreciated.

Hard to help there without knowing the stress test you are running.
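
As a baseline, something like this single-threaded sketch is the minimum
I'd want to see; the URL and query below are placeholders, and a real
test should also run several concurrent clients:

```python
import json
import time
import urllib.request

def run_benchmark(search_url, query, n=100, fetch=None):
    """Fire n identical search requests and return the observed requests/s.

    fetch(url, body) can be injected for testing; by default it POSTs
    the JSON query body to the given URL with urllib.
    """
    if fetch is None:
        def fetch(url, body):
            req = urllib.request.Request(
                url,
                data=body.encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return resp.read()

    body = json.dumps(query)
    start = time.time()
    for _ in range(n):
        fetch(search_url, body)
    elapsed = max(time.time() - start, 1e-9)  # guard against a 0s clock delta
    return n / elapsed
```

Run it against the same index and query mix you used for Solr, otherwise
the two numbers aren't comparable.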
