Practical uses of _source


(Otis Gospodnetić) #1

Hello,

What are some of the most practical uses of the _source field?
http://www.elasticsearch.org/guide/reference/mapping/source-field.html
is a little thin on that...

I can think of the following:

  1. can be used to return the original JSON for each hit (vs. ES having
    to construct that JSON on the fly from all saved fields)
  2. can be used for updating individual fields on the server side
  3. could be used for reindexing from ES (as opposed to from original
    data source) to a new version of ES
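
Points 2 and 3 above can be sketched roughly as follows. This is a minimal illustration and not an official API: the helper names `apply_partial_update` and `hits_to_bulk_actions`, the index names, and the sample hits are all made up for the example. The hit shape mirrors the `hits.hits` array of a search response, and the action dicts use the `_index` / `_id` / `_source` keys that the Python client's bulk helper accepts.

```python
from typing import Any, Dict, Iterable, Iterator


def apply_partial_update(source: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Merge changed fields into a copy of the stored _source.

    A server-side partial update needs the full original document,
    which is only available if _source was kept.
    """
    updated = dict(source)
    updated.update(changes)
    return updated


def hits_to_bulk_actions(
    hits: Iterable[Dict[str, Any]], target_index: str
) -> Iterator[Dict[str, Any]]:
    """Turn search hits into bulk index actions, reusing each hit's _source."""
    for hit in hits:
        if "_source" not in hit:
            # Without a stored _source the original JSON cannot be recovered.
            raise ValueError(f"hit {hit.get('_id')!r} has no _source")
        yield {
            "_op_type": "index",
            "_index": target_index,
            "_id": hit["_id"],
            "_source": hit["_source"],
        }


# Hypothetical hits from an old index called "articles_v1".
sample_hits = [
    {"_index": "articles_v1", "_id": "1", "_source": {"title": "One", "views": 3}},
    {"_index": "articles_v1", "_id": "2", "_source": {"title": "Two", "views": 7}},
]

actions = list(hits_to_bulk_actions(sample_hits, "articles_v2"))
updated_doc = apply_partial_update(sample_hits[0]["_source"], {"views": 4})
```

In a real reindex the hits would come from a scan/scroll over the old index and the actions would be fed to a bulk request; here they are plain in-memory dicts so the sketch runs without a cluster.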

Are there other important uses of the _source field?

Thanks,
Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html


(egaumer) #2

We've built a few file-sharing applications where users can upload
content such as PDFs, Office docs, zip files, etc. By maintaining the source,
not only can users search for content, they can also download it. This
makes it very easy to build a Dropbox-like application that runs behind a
corporate firewall, where everything is highly discoverable.


(Stephane Bastian) #3

Hello Otis,

We are using ElasticSearch as an index (obviously :wink:) and as a sort of
NoSQL database. We store various entities such as Orders, Customers, etc. In
this case, we only use the _source field, and based on the settings in a
config file, we decide which fields of the _source are indexed and which
are not.
On top of that, we have built a Hibernate-style mapper that maps Java
objects to JSON and back.
I would say that the _source field is a very important feature, one that we use
(and abuse?) on a daily basis.
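
A config-driven setup like the one described above might look something like this. It is only a sketch under assumptions: the `FIELD_CONFIG` layout and the `build_mapping` helper are hypothetical, and the `"index": true/false` mapping flag is the form used by recent Elasticsearch versions (older releases, including those current when this thread was written, spelled it `"index": "no"`).

```python
from typing import Any, Dict

# Hypothetical per-field settings, standing in for the config file.
FIELD_CONFIG: Dict[str, Dict[str, Any]] = {
    "customer_name": {"type": "text", "indexed": True},
    "order_total": {"type": "double", "indexed": True},
    "internal_notes": {"type": "text", "indexed": False},
}


def build_mapping(field_config: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build a mapping body where each field is searchable only if configured so.

    Every field still lives in _source and comes back with the document;
    the flag only controls whether it can be searched.
    """
    properties = {}
    for name, settings in field_config.items():
        properties[name] = {
            "type": settings["type"],
            "index": bool(settings["indexed"]),
        }
    return {"_source": {"enabled": True}, "properties": properties}


mapping = build_mapping(FIELD_CONFIG)
```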

Hope this helps,

Stephane


(David Pilato) #4

I'm also using _source to highlight results.

BTW, highlighting can also be done with stored fields.
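
For illustration, a search body that asks for highlights might be built like this. The helper name `build_highlight_query` and the field name `content` are assumptions made for the example; the `query` / `highlight` structure follows the standard search request format, and the highlighter needs the field text to be available from _source or from a stored field.

```python
from typing import Any, Dict


def build_highlight_query(field: str, text: str) -> Dict[str, Any]:
    """Return a search body that matches `text` in `field` and highlights it."""
    return {
        "query": {"match": {field: text}},
        # An empty options object means "use the default highlighter settings".
        "highlight": {"fields": {field: {}}},
    }


body = build_highlight_query("content", "elasticsearch")
```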



(Karel Minarik) #5

Hi,

What are some of the most practical uses of _source field? http://www.elasticsearch.org/guide/reference/mapping/source-field.html
...
I can think of the following:

The points you mention seem to approach the topic as if ES were "just a
search engine" and you needed to get the data from some other
storage.

When you look at the _source as the document itself, ES seems more
like a database than "just a search engine". There are already lots of
people using ES as the persistence layer, as the database. The
https://github.com/karmi/tire Ruby client, for example, has an
ActiveModel-compatible interface, making ES a "drop-in replacement" for
something like SQLite in a Ruby on Rails application (e.g.
<https://github.com/karmi/tire/blob/master/test/models/persistent_article_with_defaults.rb>).

In my latest project, a social media monitoring tool, we use ES
exclusively as the persistence layer. (We also use Redis for other
data where we want fast access, such as user credentials, tokens,
keywords, etc.) We are absolutely happy with the performance,
features and ease of use of ES as the persistence layer, via the Ruby
ActiveModel integration. (Previously, we had been using CouchDB.)

There are many more projects doing that, I suppose -- the Graylog2
project, for instance, recently switched to ES from Mongo:
http://www.lennartkoopmann.net/post/12512504316/whats-coming-graylog2-v096.

Best!,

Karel


(Shay Banon) #6

A few more notes regarding the benefits of _source:

  1. It is a single stored (possibly compressed) field, compared to storing several fields. This means a single field to load rather than several, which is faster. For example, storing 20 fields in _source and fetching it is faster than storing and fetching those 20 fields individually.
  2. _source can be used to get the actual document matching the search request, which is probably considerably faster than fetching the actual data by ID from another system.
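
The two storage strategies compared in point 1 could be expressed as mappings along these lines. This is a hedged sketch: the helper names and the `field_N` names are invented, the `keyword` type is the modern spelling, and `_source.enabled` and the per-field `store` flag are the mapping options assumed to correspond to "storing _source" and "storing fields on their own".

```python
from typing import Any, Dict, List


def mapping_with_source(field_names: List[str]) -> Dict[str, Any]:
    """Keep the whole document in the single _source field."""
    return {
        "_source": {"enabled": True},
        "properties": {name: {"type": "keyword"} for name in field_names},
    }


def mapping_with_stored_fields(field_names: List[str]) -> Dict[str, Any]:
    """Drop _source and store every field individually instead."""
    return {
        "_source": {"enabled": False},
        "properties": {
            name: {"type": "keyword", "store": True} for name in field_names
        },
    }


# Twenty hypothetical fields, as in the example above.
fields = [f"field_{i}" for i in range(20)]
single_field = mapping_with_source(fields)
many_fields = mapping_with_stored_fields(fields)
```

With the first mapping a hit is served by loading one stored field; with the second, returning the full document means loading all twenty.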

The question is a bit misleading, since it compares _source to specific stored fields (out of the JSON). I would say the question breaks down into two: Do we really want to store the actual data in ES, and what are the benefits (highlighting, faster fetching)? And do we want to store the _source on its own, as opposed to storing specific fields?

