I'm currently evaluating Elasticsearch as a replacement for our
existing Solr search indices, and initial tests have proved very
promising. However, our largest index is just shy of 30m documents and
naturally weighs in at quite a hefty size. Indexing this in ES using
the standard index storage type (mmapfs, I think) and a 10 GB heap
(-Xmx10g) works fine, but when querying the index I get maybe 3-4
requests/s compared to Solr's ~148 requests/s. If I then shut down the
ES server and start it with "-Des.index.store.type=memory", the number
of requests shoots up by almost a factor of 100 (~280-300 requests/s).
Next I cleared out the index and tried to re-index using an in-memory
index from the start, but doing it this way I get out-of-memory errors
halfway through indexing (the full index size is ~20 GB).

Am I doing something wrong here? How can the index work fine if I
index it and then switch to in-memory, but not directly as in-memory?
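For reference, the two setups being compared look roughly like this (a
sketch, not my exact command lines; the index name and host in the last
variant are placeholders):

    # default fs store
    bin/elasticsearch -f

    # restart with indices held entirely in memory
    bin/elasticsearch -f -Des.index.store.type=memory

    # the same 0.x setting can also be applied per index at creation time
    curl -XPUT 'http://localhost:9200/myindex' -d '{
      "settings": { "index.store.type": "memory" }
    }'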
Which gateway are you using? You can't shift to in-memory after a restart
unless you are using a shared gateway. If you are using the default local
gateway, then you will have an empty index.

By default, elasticsearch uses the nio-based fs storage, not the mmap one.
Lucene has changed its default store to be mmap when running under Linux
with a 64-bit JVM, but I don't think it's a good default.

Note regarding the memory store type: there have been reports of it not
working well and causing failures when searching. I still need to chase
that one down.
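Spelled out as configuration, a minimal elasticsearch.yml sketch of the
settings in question (0.17-era setting names; the shared gateway location
is a placeholder):

    # gateway: 'local' is the default; only a shared gateway (e.g. fs)
    # keeps the index recoverable across a restart into a different store
    gateway.type: local
    #gateway.type: fs
    #gateway.fs.location: /mnt/shared/es-gateway

    # store: niofs is the elasticsearch default; mmapfs must be opted into
    index.store.type: niofs
    #index.store.type: mmapfs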
I'm using whichever is the default gateway, as I didn't make any
configuration changes there. That may explain why the speed increase
is so dramatic when re-starting with the in-memory index storage.

As for the fs storage, I'm using 64-bit Linux so I'm guessing that I'll
be on mmap, as I previously mentioned.

I'll keep digging for reasons why ES is so much slower than Solr in
this instance, and any recommendations for index settings (30m
documents, each ~150 bytes) would be appreciated.
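(For scale: 30m documents at ~150 bytes each is only about 4.5 GB of raw
source, so most of the ~20 GB on disk is Lucene index structures.) For
anyone reproducing this, index creation might look something like the
sketch below — illustrative 0.x settings, not the actual values used in
this thread:

    # no replicas and no periodic refresh during a one-off bulk load
    curl -XPUT 'http://localhost:9200/docs' -d '{
      "settings": {
        "index.number_of_shards": 5,
        "index.number_of_replicas": 0,
        "index.refresh_interval": "-1"
      }
    }'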
> I'm using whichever is the default gateway, as I didn't make any
> configuration changes there. That may explain why the speed increase
> is so dramatic when re-starting with the in-memory index storage.
Are you validating the results that you get?
> As for the fs storage, I'm using 64-bit Linux so I'm guessing that I'll
> be on mmap, as I previously mentioned.
No, you won't. As I said, the default in elasticsearch is to use the nio fs
store (not mmap) even on 64-bit Linux, and it's different from the new
default in Lucene 3.3.
> I'll keep digging for reasons why ES is so much slower than Solr in
> this instance, and any recommendations for index settings (30m
> documents, each ~150 bytes) would be appreciated.
Hard to help there without knowing the stress test you are running.
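Even something crude would help as a starting point — for example (a
sketch; the host, index name, and query are placeholders):

    # 1000 requests, 10 concurrent, against a URI search
    ab -n 1000 -c 10 'http://localhost:9200/myindex/_search?q=body:test'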