[just pushed] _all field


(Shay Banon) #1

Just pushed support for the _all field. It is documented here:
http://github.com/elasticsearch/elasticsearch/issues/issue/63. The _all field
is basically a field that includes one or more document fields, so that you
can, for example, search all of the document content with a single
queryString query. You can easily disable it, or pick and choose which
fields end up in the _all field.
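
To make the idea concrete, here is a toy sketch in plain Python of what a combined field does. It is not the ElasticSearch implementation or its API: the function names and the enabled/include/exclude options are made up for illustration, and real analysis is far richer than splitting on whitespace.

```python
def build_all_field(doc, enabled=True, include=None, exclude=()):
    """Return the text of a combined field, or None when the feature is off.

    Hypothetical helper: `include` lists the fields to copy (default: every
    field in the document), `exclude` lists fields to leave out.
    """
    if not enabled:
        return None
    names = include if include is not None else list(doc)
    parts = [str(doc[name]) for name in names if name in doc and name not in exclude]
    return " ".join(parts)


def matches(doc, term, **options):
    """Return True when `term` appears as a whole word in the combined field."""
    all_text = build_all_field(doc, **options)
    if all_text is None:
        return False
    return term.lower() in all_text.lower().split()


doc = {"title": "Distributed search", "content": "Just add machines", "author": "shay"}

found_by_default = matches(doc, "machines")                  # searched without naming a field
found_when_excluded = matches(doc, "shay", exclude=("author",))
found_when_disabled = matches(doc, "search", enabled=False)
```

With the defaults, a term from any field is found without naming the field. Excluding "author" hides that field's terms from the combined search, and disabling the feature means nothing matches through it.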

But there are no free bunnies in software: enabling _all means more CPU
cycles when indexing and a larger index (but hey, we're distributed, right?
Just Add Machines(tm)). I believe _all is very important, especially when
talking about rich documents, and not simple ones with just "title" and
"content".

The current decision is to enable _all by default, and to include all fields
in _all by default as well. This means the initial user experience will be
very good in terms of usability, but indexing will be slower (how much
slower really depends on the documents). What do you think? Does this
default make sense?
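
A rough way to picture the indexing cost, as a toy sketch only: the inverted index below is a plain Python dict, nothing like the real Lucene structures, and the function names are hypothetical. It just shows that copying every field into _all adds extra postings for each document.

```python
from collections import defaultdict


def index_documents(docs, all_enabled):
    """Build a toy inverted index mapping (field, term) -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            for term in str(value).lower().split():
                index[(field, term)].add(doc_id)
                if all_enabled:
                    # Every term is indexed a second time under the combined field.
                    index[("_all", term)].add(doc_id)
    return index


def posting_count(index):
    """Total number of (term, document) entries stored in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


docs = [
    {"title": "free bunnies", "content": "no free bunnies in software"},
    {"title": "machines", "content": "just add machines"},
]

without_all = posting_count(index_documents(docs, all_enabled=False))
with_all = posting_count(index_documents(docs, all_enabled=True))
```

Here with_all comes out larger than without_all. How much larger in practice depends on the documents, which is the same caveat as above.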

-shay.banon


(egaumer) #2

In reply to Shay Banon (shay.ba...@elasticsearch.com), Mar 16, 5:11 pm:

Hey Shay, I think the default setting here is in line with the
ElasticSearch mentality of "it just works". If you're a savvy user,
the ability to disable this feature is great, but novice users
should be able to fire up an instance and get the search they'd expect
from a Google-like experience.

With that said, I've been working in this space for about 6 years for
Fortune Global 500 companies (mainly with commercial search vendors,
but also some Lucene work). Every commercial search vendor provides
some sort of composite field, and I completely agree that this is a
must-have feature in ElasticSearch. The ability to select which fields
belong to the composite is equally important, but including "all" by
default seems reasonable.

I absolutely love the work you've put into ElasticSearch and once I
get some time to really investigate the architecture, I plan on
providing some help. I think too many folks are thinking about the
"big data" problem in terms of storage and forgetting about inherent
searchability. These data storage systems need an embedded search
layer similar to what's been done with TerraStore.

Awesome work and great vision.

Regards,
-Eric

