On Tuesday, December 24, 2013 9:16:55 AM UTC-5, Daniel Guo wrote:
I have never used Apache Solr before, and I'm trying Elasticsearch in my project.
The documentation for ES is a little sparse, but I have to explain to my
supervisor why I chose ES over Solr.
As far as I know, Solr (with Solr Cloud) also supports distributed
indexing, near real-time updates and search, and automatic load
balancing, which are the main features of Elasticsearch.
What are the advantages of ES compared to Apache Solr? Could anybody give
me a tip, or some links to more information?
Thanks a lot.
On Tuesday, December 24, 2013 11:17:18 PM UTC+8, Nikolas Everett wrote:
About six months ago I spent a week porting a prototype from Solr Cloud to
Elasticsearch with the intent of evaluating Elasticsearch and either
throwing out the port or building off of it. By the third day or so I was
convinced I'd stick with Elasticsearch because:
- There is some HTTP GET that you can hit in Solr that will delete the
  index (or a shard or something). That shook my faith in humanity a little,
  especially when I pasted it into IRC and my coworker clicked it or moused
  over it or something... GETs. Idempotent.
- I liked the phrase suggester.
- My ops team seemed to like it better.
- There was (and still is) a deb package.
- I liked the way Elasticsearch was tested. I admit I haven't actually
  looked into how Solr is tested.
Since then:
- I've enjoyed the process of landing changes in Elasticsearch much more
  than in Lucene. I assume Solr would be the same because it is in the same
  repository as Lucene. The GitHub process (pull requests, etc.) is better
  than JIRA/svn/patch files. I also think the Elasticsearch
  committers/repository collaborators are easier to work with than the
  Lucene folks.
- The phrase suggester needed some work to be as good as our
  (surprisingly advanced) home-grown suggester. It is now that good.
- Elasticsearch has really improved the process of maintaining their
  documentation, so I imagine it'll only get better.
Hi Nik:
Thanks for sharing your experience and opinion on the topic.
Could you please give me some advice on the bigger picture, such as the
distributed model, read-time indexing, search performance and so on?
Thanks so much.
David, nice to see you. Your opinion is very helpful to me, even though it
may be biased.
When I have more time, I'll take your advice and try both of them
myself.
On Tuesday, December 24, 2013 10:45:09 PM UTC+8, David Pilato wrote:
I would say: play with both for a few hours.
I really think you will find some answers by yourself!
I don't want to say more than this, as I probably have a biased opinion.
Hi, Nik:
Thank you for sharing your practical experience. I'll remember and follow
your advice. Thanks again!
On Saturday, December 28, 2013 3:57:38 AM UTC+8, Nikolas Everett wrote:
On Fri, Dec 27, 2013 at 4:30 AM, Daniel Guo <danie...@gmail.com> wrote:
distributed model
Automatic shard rebalancing works quite well. We're able to do rolling
restarts without losing any redundancy. It is useful to keep in mind that
some things, like scores and suggestions, come from data that is per shard
rather than across the whole index.
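To make the per-shard point concrete, here is a small sketch of why the same term can be weighted differently depending on which shard a document lives on. It assumes Lucene's classic TF-IDF inverse document frequency, 1 + ln(N / (df + 1)); the shard names and counts are made up for illustration, and newer Elasticsearch versions default to a different similarity.

```python
import math


def classic_idf(doc_count: int, doc_freq: int) -> float:
    """Classic TF-IDF inverse document frequency: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical statistics: (documents in the shard, documents containing the term).
shards = {"shard_0": (1000, 9), "shard_1": (1000, 99)}

# Each shard scores with its own local statistics.
for name, (doc_count, doc_freq) in shards.items():
    print(name, round(classic_idf(doc_count, doc_freq), 3))

# The same term weighted with statistics pooled across the whole index.
total_docs = sum(n for n, _ in shards.values())
total_freq = sum(df for _, df in shards.values())
print("whole index", round(classic_idf(total_docs, total_freq), 3))
```

The term comes out rarer, and so more heavily weighted, on the shard where fewer documents contain it. If that skew matters for your data, the `dfs_query_then_fetch` search type gathers term statistics from all shards before scoring, at the cost of an extra round trip.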
read-time indexing
I assume you mean real-time indexing. That works fine. Our problem is
actually getting the documents built and shipped off to Elasticsearch in a
timely manner, not Elasticsearch being able to ingest them. It is
important to make sure that you have a process for doing online schema
changes.
Those processes can push Elasticsearch to its limit if you run them
multi-threaded/multi-process (shakes fist at PHP). Just don't use so many
threads that you crush Elasticsearch. You'll have to measure that. We
crushed three Elasticsearch nodes with 20 processes, but your mileage will
vary.
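One way to keep a reindexing job from crushing the cluster is to put a hard cap on how many bulk requests are in flight at once. The sketch below is only an illustration, not what we actually run: `index_all`, `chunked`, `bulk_body`, the `send` callback and the default sizes are all hypothetical names and numbers, and the action line omits `_type`, which older Elasticsearch versions require.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def chunked(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(index: str, batch: list) -> str:
    """Build a newline-delimited _bulk body: an action line, then the source line."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def index_all(docs: Iterable[dict], index: str, send: Callable[[str], None],
              max_workers: int = 4, batch_size: int = 500) -> int:
    """Send every batch using at most `max_workers` concurrent senders."""
    bodies = [bulk_body(index, batch) for batch in chunked(docs, batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(send, bodies))  # list() re-raises any exception from send
    return len(bodies)
```

Here `send` would be whatever function POSTs a body to the `_bulk` endpoint. Start with a small `max_workers`, watch the cluster, and raise it only while indexing and search latency stay healthy.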
search performance
So far everything is quite quick and we're happy that we can add more
replicas to increase performance. We're not sure yet if we'll do that. I
suggest setting up whatever kind of performance metrics gathering system
you have in house. Capturing those metrics is pretty simple, as you can
just dig them out of the REST API. If you happen to use Ganglia, feel free
to use our script: ganglia · operations-puppet
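As a rough idea of what "digging them out of the REST API" can look like, here is a minimal sketch that reads the node stats endpoint and pulls out per-node query counters. The base URL and the function names are assumptions for illustration, and the exact stats layout can vary between versions, so treat the key names as something to verify against your own cluster.

```python
import json
import urllib.request


def fetch_node_stats(base_url: str = "http://localhost:9200") -> dict:
    """GET the node stats document (base_url is an assumption for illustration)."""
    with urllib.request.urlopen(base_url + "/_nodes/stats") as response:
        return json.load(response)


def search_metrics(stats: dict) -> dict:
    """Map node name -> (query_total, query_time_in_millis)."""
    metrics = {}
    for node in stats.get("nodes", {}).values():
        search = node.get("indices", {}).get("search", {})
        metrics[node.get("name", "unknown")] = (
            search.get("query_total", 0),
            search.get("query_time_in_millis", 0),
        )
    return metrics
```

Both counters are cumulative, so a collector would sample them periodically and report the difference between samples.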
and so on.
As I said before I like the Elasticsearch community. They are helpful.
Make sure to wait a week to ten days after each release to see if some
critical flaw is discovered. Elasticsearch is pretty well tested but every
other release seems to have had some trouble recently. I doubt this'll
happen every time but you may as well be safe.
For my use case, automatic index creation and automatic field creation are
more trouble than help. These may be worth turning off for you. They are on
by default because they work well for some significant portion of users and
they make playing around really easy.
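For anyone who wants to try turning these off, the sketch below builds the two request bodies involved. It assumes a recent Elasticsearch with typeless mappings; `strict_index_body`, the field names and the field types are hypothetical, and on older versions the auto-create setting lives in `elasticsearch.yml` rather than the cluster settings API.

```python
import json


def strict_index_body(fields: dict) -> str:
    """Index-creation body whose mapping rejects fields that were not declared."""
    body = {
        "mappings": {
            "dynamic": "strict",
            "properties": {name: {"type": kind} for name, kind in fields.items()},
        }
    }
    return json.dumps(body, indent=2)


# Cluster settings body that stops indexing requests from creating missing indices.
AUTO_CREATE_OFF = {"persistent": {"action.auto_create_index": "false"}}

# Hypothetical fields, purely for illustration.
print(strict_index_body({"title": "text", "published": "date"}))
```

With `"dynamic": "strict"`, indexing a document that carries an undeclared field fails instead of silently adding that field to the mapping.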
The search for www.chegg.com is powered by Solr, because that's done by the
search team, who are more hard-core search nerds, like XML instead of
JSON, etc. They have one master and a whole bunch of slaves, and rebuild
the master continuously.
I wanted to switch to Elasticsearch for the eReader team, because we're
constantly adding new eBooks to our catalog, so I needed something that
clustered. We had a bunch of endless meetings discussing it. Ops wanted a
zone-aware solution, which Solr Cloud, since it's based on ZooKeeper,
couldn't do automatically. Plus, realistically, only the search folks knew
how to deal with Solr. I could deal with ES with just my team giving it
partial attention.
Elasticsearch could do the zone-aware thing, so that's how I got Ops to
sign up. Plus they were already using Logstash. But really, it's because
it's much easier for me to administer, and the clustering part just works
on its own without needing ZooKeeper.
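For readers wondering what the zone-aware part looks like in practice, here is a minimal sketch that builds the cluster settings request for shard allocation awareness. The attribute name `zone`, the base URL and `awareness_request` are assumptions for illustration. Each node also has to be started with a matching custom attribute, and how that attribute is declared differs between versions.

```python
import json
import urllib.request


def awareness_request(base_url: str, attribute: str = "zone") -> urllib.request.Request:
    """Build (but do not send) a PUT that enables allocation awareness on `attribute`."""
    body = {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": attribute
        }
    }
    return urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned request to `urllib.request.urlopen` would apply the setting. After that, Elasticsearch tries to spread the copies of each shard across nodes with different values of the attribute.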