From: Shay Banon
Sent: Monday, April 18, 2011 12:09 PM
Subject: Re: elasticsearch vs solr : indexing speed
Here is Clinton's answer: https://gist.github.com/0382ed3913f0c3e40d62, and I'd like to add to that:
- To compare the two fairly in terms of indexing overhead, at least for this very simple doc, the _source and _all fields need to be disabled.
- The type used for field1 in Solr corresponds, in ES terms, to index set to not_analyzed and omit_norms set to true. The ES field should be mapped the same way.
- ES also indexes two additional fields, _id and _type. For a like-for-like comparison, their index setting should be no. When doing so, the only thing one loses is the ability to query on them at search time (this is in master).
I posted a sample as a comment on Clinton's post.
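For readers without access to that comment, the settings listed above can be sketched as a single mapping. This is only an illustration, not the posted sample: the type name "doc" and the helper name benchmark_mapping are hypothetical, and the keys assume the mapping syntax of ES at the time of this thread.

```python
import json


def benchmark_mapping(type_name="doc"):
    """Build a mapping that strips ES-side extras for an indexing benchmark.

    `type_name` is a hypothetical type name; adjust it to the type being indexed.
    """
    return {
        type_name: {
            # Do not store the original JSON document.
            "_source": {"enabled": False},
            # Do not build the catch-all _all field.
            "_all": {"enabled": False},
            # Do not index _id and _type; they can then no longer be queried.
            "_id": {"index": "no"},
            "_type": {"index": "no"},
            "properties": {
                # Match the Solr field type: no analysis, no norms.
                "field1": {
                    "type": "string",
                    "index": "not_analyzed",
                    "omit_norms": True,
                }
            },
        }
    }


if __name__ == "__main__":
    # The printed JSON is what would be sent as the mapping body.
    print(json.dumps(benchmark_mapping(), indent=2))
```

The resulting JSON would be supplied as the type mapping when creating the index, before any documents are sent.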
Some more aspects to how ES works differently than Solr:
- When data is indexed, it's there. If you "kill -9" ES (even with a single server) and start it back up, all data indexed up to that point will be there with the local gateway (this is not done by committing Lucene on each change, as that would not scale). Solr, on the other hand, will lose all changes made since the last commit. This does come with a (small) overhead.
- The bulk API format for elasticsearch is optimized for distributed execution, where the request needs to be sliced and diced in order to route each bulk item to the correct shard. This comes with an overhead compared to a single big JSON document that is parsed and processed in a single-shard scenario, but it proves crucial when working with several shards.
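To make the bulk format point concrete, here is a rough sketch of why a line-oriented body is easy to slice per shard: each item is an action line followed by a source line, so only the small action line has to be parsed to decide where the item goes. The function names are hypothetical, only "index" actions are handled, and the CRC32-based shard choice is a stand-in for illustration, not the hash ES actually uses.

```python
import json
import zlib


def build_bulk_body(index, doc_type, docs):
    """Serialize (doc_id, source) pairs as alternating action/source lines."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body is newline-delimited and ends with a trailing newline.
    return "\n".join(lines) + "\n"


def split_by_shard(body, num_shards):
    """Group bulk items by target shard, parsing only the action lines.

    Assumes every item is an "index" action (two lines per item).
    The shard function below is a placeholder, not ES's routing hash.
    """
    per_shard = {}
    lines = body.splitlines()
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        doc_id = json.loads(action_line)["index"]["_id"]
        shard = zlib.crc32(doc_id.encode("utf-8")) % num_shards
        # The source line is forwarded untouched, without being parsed here.
        per_shard.setdefault(shard, []).append((action_line, source_line))
    return per_shard


if __name__ == "__main__":
    docs = [(str(i), {"field1": "value%d" % i}) for i in range(5)]
    body = build_bulk_body("test", "doc", docs)
    for shard, items in sorted(split_by_shard(body, 2).items()):
        print(shard, [json.loads(a)["index"]["_id"] for a, _ in items])
```

With one big JSON array, the whole request would have to be parsed before any item could be routed, which is the trade-off described above.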
On Monday, April 18, 2011 at 5:56 AM, Otis wrote:
I wouldn't pay much attention to that post/benchmark. A good
benchmark needs to publish a lot more details than the above, starting
with basic stuff like -Xmx. I'm also of the opinion that if you are
going to publish a benchmark comparing 2 pieces of software then you
better invite experts from both sides and let them tune and optimize
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
On Apr 16, 9:56 pm, massi mehdi.a...@gmail.com wrote:
What do you think of this article: http://dmurphy747.wordpress.com/2011/04/02/solr-vs-elasticsearch-deat...
where elasticsearch and solr are compared with regard to the indexing
A quote from the article: "I ran each test 4 times, killing the JVM
and removing the data directory for both Solr and elasticsearch. The
final averaged results expressed as throughputs were 43204 docs/sec
for Solr, 44052 docs/sec for Solr direct streaming, and 9823 docs/sec
PS: Don't get me wrong, I know that it is only one (partial) test,
and that some features in elasticsearch make it unique!