Solr vs ES: performance


(mfeingold) #1

In my quest for a search platform for our internal needs I am playing
with both Solr and ES. I indexed the same data (~4.5M documents) with
the same index structure and ran the same query on both.

What surprises me is that, for whatever reason, ES search is 1.5 times
slower than Solr. Also, the Solr index size is 650MB vs 2.1GB for ES.

Can you help me figure out what I am missing here?


ES configuration: https://gist.github.com/1118703

Solr Schema: https://gist.github.com/1118711

Query: https://gist.github.com/1118864


(Paul Loy) #2

One immediate difference would be that the default number of shards in ES is
5 and the default number of replicas is 1 (i.e. a master and one copy).

The replication factor will mean 2x the storage. Also, I think you get a
local gateway out of the box, so that gives you another copy of all the
shards and replicas, making for 4x the actual index size. 2.1GB is pretty
much 4x 650MB and so is expected.

Given 5 shards, a query will take longer as it has to map/reduce across the
5 shards.
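The cost of fanning one query out over several shards can be illustrated with a toy scatter/gather top-k merge. This is a simplified model in plain Python, not elasticsearch's actual search implementation; the (score, doc_id) hit format and the function names are hypothetical.

```python
import heapq
from typing import List, Tuple

# Hypothetical hit format for illustration: (score, doc_id).
Hit = Tuple[float, str]


def shard_top_k(shard_hits: List[Hit], k: int) -> List[Hit]:
    """Per-shard step: each shard ranks its own documents and returns its local top k."""
    return heapq.nlargest(k, shard_hits)


def merge_top_k(per_shard: List[List[Hit]], k: int) -> List[Hit]:
    """Coordinating step: merge every shard's local results into a global top k."""
    candidates = [hit for hits in per_shard for hit in hits]
    return heapq.nlargest(k, candidates)


def distributed_search(shards: List[List[Hit]], k: int) -> List[Hit]:
    # Every shard does work for every query, and then there is an extra merge
    # step; a single-shard index skips the fan-out and the merge is trivial.
    return merge_top_k([shard_top_k(s, k) for s in shards], k)
```

With five shards, each query runs five local rankings plus a merge, which is the overhead a single-shard setup avoids when compared against one Solr core.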

So for a single-node, single-shard, like-for-like test you should set shards
to 1 and replicas to 0. Then they'll be comparable. But then, of course, you
have negated the reason why you would choose ES in the first place which is
to increase write throughput and to make your index scalable and much more
available. :)
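A minimal sketch of creating such a single-shard, zero-replica index over the REST API, using only the Python standard library. The node address, the index name `comparison_test`, and the exact body shape are assumptions; check the create-index documentation for the elasticsearch version you run.

```python
import json
import urllib.request

# Hypothetical node address and index name; adjust for your setup.
ES_URL = "http://localhost:9200"
INDEX = "comparison_test"


def single_shard_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an index with
    1 primary shard and 0 replicas, for a like-for-like single-node test."""
    body = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = single_shard_index_request(ES_URL, INDEX)
# To actually create the index against a running node:
# urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the settings easy to inspect before anything touches a live cluster.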

Cheers,

On Mon, Aug 1, 2011 at 9:05 PM, Michael Feingold mfeingold@hill30.com wrote:

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy


(Shay Banon) #3

A few notes:

  • elasticsearch stores the JSON document itself in the index (the _source
    field), which can explain the size difference. Also, I am not sure what the
    Solr configuration is when it comes to merging.

  • By default, elasticsearch creates an _all field that aggregates all the
    fields as a single searchable field. This might explain the size difference
    (and possibly a slower indexing rate). You can easily disable it:
    http://www.elasticsearch.org/guide/reference/mapping/all-field.html.

  • By default, when you create an index in elasticsearch, it already has 5
    shards, so a search executes across those 5 shards (which might also explain
    why the index size is bigger). If you want to compare it to a single Solr
    instance, then either create an index with a single shard in elasticsearch,
    or start a 5-core Solr server and do distributed search across them (this is
    where I suspect you will see a big difference, as Solr does not do
    distributed search as well as elasticsearch).

  • Benchmarking by running the same single query is problematic. By design,
    elasticsearch does not have the caches Solr has, and those caches come
    heavily into play in a same-query perf test. The reason is that under
    actual, varied usage, the overhead of those caches (doc cache, query cache)
    is problematic for JVM garbage collection and for concurrency.
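A sketch of disabling `_all` for one mapping type, following the all-field guide linked above. The node address, the index name, and the type name `doc` are hypothetical, and the `/{index}/{type}/_mapping` URL layout is the one used by the 0.x-era releases discussed in this thread. `_all` was deprecated in 6.0 and removed in 7.0, so this applies only to older versions, and it is safest to set it before indexing any documents.

```python
import json
import urllib.request


def disable_all_field_request(base_url: str, index: str, doc_type: str) -> urllib.request.Request:
    """Build (but do not send) a put-mapping request that turns off the
    _all field for one mapping type.

    Assumes the 0.x-era URL layout /{index}/{type}/_mapping and a body keyed
    by the type name; newer elasticsearch versions have no _all field at all."""
    body = {doc_type: {"_all": {"enabled": False}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/{doc_type}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical node address, index name, and type name.
req = disable_all_field_request("http://localhost:9200", "comparison_test", "doc")
# Send it against a running node with: urllib.request.urlopen(req)
```

With `_all` off, queries have to name the fields they search explicitly, so the Solr and ES queries being compared should target the same fields.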



(Shay Banon) #4

Ha! Paul already answered some of it :), lemme just correct some things,
Paul :)

On Mon, Aug 1, 2011 at 11:33 PM, Paul Loy keteracel@gmail.com wrote:

> One immediate difference would be that the default number of shards in ES
> is 5 and the default number of replicas is 1 (i.e. a master and one copy).
>
> The replication factor will mean 2x the storage.

That only matters if you have more than one node, as elasticsearch won't
allocate a shard and its replica on the same node.
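This correction can be written as a small model of how many copies of each shard actually occupy disk. It is a simplification that assumes every node can hold any shard and ignores other allocation rules; the function name is hypothetical.

```python
def copies_on_disk(nodes: int, replicas: int) -> int:
    """Number of copies of each shard (primary plus replicas) that get allocated.

    A primary and its replicas are never placed on the same node, so at most
    one copy fits per node; replicas beyond that stay unassigned and use no
    disk. Simplified model: ignores allocation filtering, disk limits, etc."""
    if nodes < 1:
        raise ValueError("need at least one node")
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return min(nodes, replicas + 1)
```

On a single node with the default of 1 replica, only one copy is stored, so replication alone does not account for 2.1GB vs 650MB.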

> Also, I think you get a local gateway out of the box, so that gives you
> another copy of all the shards and replicas, making for 4x the actual index
> size. 2.1GB is pretty much 4x 650MB and so is expected.

The local gateway does not create another copy of the data; it uses the
same data directory that your indices reside in.


