inlined
On Monday, March 21, 2011 at 4:51 AM, Antonio Lobato wrote:
Hey all. I just wanted to share my experiences with elasticsearch on
a 10 node cluster and my thoughts. I've had a lot of fun using it,
and will make heavy use of it in the future,
http://blog.jinked.com/posts/30
I wanted to take the time to address two things really quickly
though. To answer Shay's question, the trivial lacking areas in
elasticsearch where Solr has functionality that elasticsearch doesn't,
are things such as the contrib/data import handler. We actually do use
it in one case, but only because of a limitation in Solr that doesn't
exist in elasticsearch (NRT, to be exact). Moving over to
elasticsearch from Solr would eliminate our need for the data import
handler, as we can distribute/index this particular dataset in real
time.
I'm not too familiar with the data import handler, but if you are missing the option to import data from a database and index it, then yes, elasticsearch does not do that automatically.
This is for a simple reason: I am not familiar with a nice way of configuring a simple import of data from a database that also makes sure to index changes made to the database. Also, because of the relational model, the data you want to index can come in different flavors (joins and whatnot), and then how does that relate to real-time change streaming through the database?
Sure, the simple case of indexing a single table and monitoring changes based on a timestamp (but then, how do you handle deletes?) can be implemented. But that's the exception, not the rule, especially with the relational model.
I never got the fascination some developers have with writing complex configuration to tell software what to do, compared to writing code that does the same thing in half the time and in a much more readable format (code, and it's your code). Writing a script that does exactly what you want with your (relational) data, and that controls how to index it, is much simpler than trying to understand how a complex mapping tool works and how to configure it.
That's my thinking. Of course, this does not mean that in the future you won't be able to write code that expresses how to index a database and have it run as a river.
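As a rough illustration of the "simple case" mentioned above (one table, changes tracked by timestamp), here is a minimal Python sketch. The `articles` table, its columns, and the index name are hypothetical, chosen only for the example. The output is a newline-delimited body of action/document pairs in the general shape of the bulk API; the exact metadata fields depend on the elasticsearch version, so treat that part as an assumption. Deletes are not handled, which is exactly the gap pointed out above.

```python
import json
import sqlite3


def fetch_changed_rows(conn, since):
    """Return rows whose updated_at is strictly newer than `since`.

    Assumes a hypothetical `articles` table with an integer
    `updated_at` column maintained by the application.
    """
    cur = conn.execute(
        "SELECT id, title, body, updated_at FROM articles "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cur.fetchall()


def build_bulk_body(rows, index_name="articles"):
    """Turn rows into a newline-delimited body of action/document pairs."""
    if not rows:
        return ""
    lines = []
    for row_id, title, body, updated_at in rows:
        # Action line: which index and document id to write to (assumed shape).
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(row_id)}}))
        # Source line: the document itself, shaped however you like.
        lines.append(json.dumps({"title": title, "body": body, "updated_at": updated_at}))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles "
        "(id INTEGER PRIMARY KEY, title TEXT, body TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?, ?)",
        [(1, "old post", "unchanged", 100), (2, "new post", "edited", 200)],
    )
    # Only rows changed after the last run (timestamp 150) are picked up.
    print(build_bulk_body(fetch_changed_rows(conn, since=150)), end="")
```

The resulting body would then be sent to the cluster with whatever HTTP client you prefer, and the highest `updated_at` seen would be stored as the starting point for the next run.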
The second thing I wanted to bring up was a bug. I've seen this
happen before with certain software (including software I've written
myself), so it's not too uncommon. elasticsearch will listen on
0.0.0.0 by default, but if the machine has two IP addresses, it will
actually listen on one IP address and send requests/multicast out on
the other. The result is elasticsearch will not become a member of
the cluster and will isolate itself on an island on that one
machine. Of course, one workaround is to define the IP to bind to in
the config, but it'd certainly go a long way in ease of use if we can
just set up elasticsearch and go. In particular since we share a single
config file on a cluster file system.
Heya, yeah, that's how multicast works, but you can also use unicast discovery if you want. Note that I tried to address this with the special tokens in the host configuration, like en0.
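To check whether a machine with two IP addresses shows the symptom described above, it can help to see which local address the operating system would pick for outbound traffic to a given peer. The Python sketch below is only a diagnostic under stated assumptions: it is not how elasticsearch chooses its addresses, and `outbound_ip` is a hypothetical helper name. It relies on the fact that calling `connect()` on a UDP socket sends no packets and only makes the kernel select a route and source address.

```python
import socket


def outbound_ip(dest_host, dest_port=9):
    """Return the local IPv4 address the OS would use to reach dest_host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No traffic is sent here; the kernel just picks a source address.
        sock.connect((dest_host, dest_port))
        return sock.getsockname()[0]
    finally:
        sock.close()


if __name__ == "__main__":
    # Loopback is used so the example runs anywhere; replace it with the
    # address of another node in the cluster to test a real route.
    print(outbound_ip("127.0.0.1"))
```

If the address returned for another node differs from the one the service is bound to, that matches the "listens on one IP, sends on the other" behavior, and pinning the host to a specific interface or address is the usual way out.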
Flippin' amazing software though. elasticsearch's cluster model is
how all multi-node software should function.
Thanks!
Cheers!
-Antonio