We've built a few file-sharing applications where users can upload
content such as PDFs, Office docs, zip files, etc. By maintaining the source,
not only can users search for content, they can also download it. This
makes it very easy to build a Dropbox-like application that runs behind a
corporate firewall, where everything is highly discoverable.
We are using ElasticSearch as an index (obviously) and as a sort of
NoSQL database. We store various entities such as Orders, Customers, etc. In
this case, we only use the _source field and, based on the settings in the
config file, we decide which fields of the _source are indexed and which
are not.
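A minimal sketch of that config-driven approach, in Python for brevity. The entity, the field names and the "indexed" flag are hypothetical, and the mapping syntax follows recent Elasticsearch releases (older versions express this differently), so treat it as an illustration rather than our actual setup:

```python
import json

# Hypothetical per-entity config: which fields of the document are searchable.
# Field names and the "indexed" flag are assumptions made for this example.
ORDER_CONFIG = {
    "order_id": {"type": "keyword", "indexed": True},
    "customer_name": {"type": "text", "indexed": True},
    "internal_notes": {"type": "text", "indexed": False},
}


def build_mapping(config):
    """Turn the config into an Elasticsearch-style mapping body.

    Every field still ends up in _source; "index": False only means the
    field is not searchable.
    """
    properties = {
        name: {"type": spec["type"], "index": spec["indexed"]}
        for name, spec in config.items()
    }
    return {"mappings": {"_source": {"enabled": True}, "properties": properties}}


if __name__ == "__main__":
    print(json.dumps(build_mapping(ORDER_CONFIG), indent=2))
```

The point of the sketch is that the whole document is kept in _source regardless, while the config only controls what ends up in the inverted index.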
On top of that, we have built a sort of Hibernate-style mapper that maps Java
objects to JSON and back.
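To illustrate the round trip such a mapper performs, here is a rough sketch. It is written in Python rather than Java to keep it short, the Customer entity and the helper names are made up for the example, and it says nothing about how our real mapper is implemented:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Customer:
    # Hypothetical entity; the field names are illustrative only.
    customer_id: str
    name: str
    email: str


def to_source(entity):
    """Serialize an entity to the JSON string that would be stored as _source."""
    return json.dumps(asdict(entity), sort_keys=True)


def from_source(cls, source_json):
    """Rebuild an entity of type cls from a _source JSON string."""
    return cls(**json.loads(source_json))


if __name__ == "__main__":
    original = Customer("c-1", "Ada", "ada@example.com")
    restored = from_source(Customer, to_source(original))
    print(restored == original)
```

Because the full JSON comes back in _source, the object can be rebuilt from a search hit alone, without a second lookup in another store.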
I would say that the _source field is a very important feature, one that we use
(and abuse?) on a daily basis.
The points you mention seem to approach the topic as if ES were "just a
search engine" and you needed to get the data from some other
storage.
When you look at the _source as the document itself, ES seems more
like a database than "just a search engine". There are already lots of
people using ES as the persistence layer, as the database. The Tire Ruby
client (https://github.com/karmi/tire), for example, has an
ActiveModel-compatible interface, making ES a "drop-in replacement" for
something like SQLite in a Ruby on Rails application (e.g.
<https://github.com/karmi/tire/blob/master/test/models/persistent_article_with_defaults.rb>).
In my latest project, a social media monitoring tool, we use ES
exclusively as the persistence layer. (We also use Redis for other
data where we want fast access, such as user credentials, tokens,
keywords, etc.) We are absolutely happy with the performance,
features and ease of use of ES as the persistence layer, via the Ruby
ActiveModel integration. (Previously, we had been using CouchDB.)
A single stored (possibly compressed) field, compared to storing several fields. This means a single field to load instead of several, which is faster. For example, storing 20 fields in _source and fetching it is faster than storing those 20 fields on their own and fetching each of them.
_source can be used to get the actual document matching the search request, which is probably considerably faster than fetching the actual data by ID from another system.
The question is a bit misleading, since you compare _source to specific stored fields (out of the JSON). I would say the question breaks down into two: do we really want to store the actual data in ES, and what are the benefits (highlighting, faster fetching)? And do we want to store the _source on its own, compared to storing specific fields?
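A small sketch of the two options, assuming the search request body keys `_source` and `stored_fields` as found in recent Elasticsearch versions. The helper names and the sample response are invented for illustration, and nothing here talks to a real cluster:

```python
def source_request(query, fields=None):
    """Search body that returns documents from _source, optionally filtered."""
    body = {"query": query}
    if fields is not None:
        body["_source"] = list(fields)
    return body


def stored_fields_request(query, fields):
    """Search body that asks for individually stored fields instead."""
    return {"query": query, "stored_fields": list(fields)}


def documents_from_hits(response):
    """Pull the original documents straight out of a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


if __name__ == "__main__":
    query = {"match": {"title": "report"}}
    print(source_request(query, ["title", "author"]))
    print(stored_fields_request(query, ["title"]))

    # Hand-written sample in the general shape of a search response.
    sample = {
        "hits": {
            "hits": [
                {"_id": "1", "_source": {"title": "Q1 report", "author": "kim"}},
            ]
        }
    }
    print(documents_from_hits(sample))
```

With _source, the hit already carries the full document, so no second round trip to another system by ID is needed.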
On Friday, February 24, 2012 at 9:46 AM, Karel Minařík wrote:
The points you mention seem to approach the topic as if ES were "just a
search engine" and you needed to get the data from some other
storage.
When you look at the _source as the document itself, ES seems more
like a database than "just a search engine". There are already lots of
people using ES as the persistence layer, as the database. The Tire Ruby
client (https://github.com/karmi/tire), for example, has an
ActiveModel-compatible interface, making ES a "drop-in replacement" for
something like SQLite in a Ruby on Rails application (e.g.
<https://github.com/karmi/tire/blob/master/test/models/persistent_article_with_defaults.rb>).
In my latest project, a social media monitoring tool, we use ES
exclusively as the persistence layer. (We also use Redis for other
data where we want fast access, such as user credentials, tokens,
keywords, etc.) We are absolutely happy with the performance,
features and ease of use of ES as the persistence layer, via the Ruby
ActiveModel integration. (Previously, we had been using CouchDB.)