Performance difference between _source and fields with store


(Mustafa Sener) #1

Hi,
I use ES version 0.16.0. I tested performance using two mapping
configurations with documents that have 50 string properties.

  1. _source is disabled, _all is disabled, and the store parameter of
    every field is set to yes.
  2. _source is enabled, _all is disabled, and the store parameter of
    every field is set to no.
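
For readers who want to reproduce the two configurations, here is a minimal sketch of what the mapping bodies might look like. It assumes the 0.x-era mapping syntax (`"_source": {"enabled": ...}`, `"store": "yes"`/`"no"`); the type name `doc` and the field names `prop_0` .. `prop_49` are placeholders, not taken from the original test.

```python
import json

# Hypothetical names: the type "doc" and the fields prop_0..prop_49 are
# placeholders for illustration only.
FIELD_NAMES = [f"prop_{i}" for i in range(50)]


def build_mapping(source_enabled: bool, store_fields: bool) -> dict:
    """Build a mapping body in the assumed 0.x-era put-mapping style."""
    store = "yes" if store_fields else "no"
    return {
        "doc": {
            "_source": {"enabled": source_enabled},
            "_all": {"enabled": False},
            "properties": {
                name: {"type": "string", "store": store} for name in FIELD_NAMES
            },
        }
    }


# Configuration 1: no _source, every field stored individually.
stored_fields_mapping = build_mapping(source_enabled=False, store_fields=True)
# Configuration 2: _source kept, no field stored individually.
source_mapping = build_mapping(source_enabled=True, store_fields=False)

if __name__ == "__main__":
    # Print the start of one mapping to inspect its shape.
    print(json.dumps(stored_fields_mapping, indent=2)[:300])
```

Only the `_source` flag and the per-field `store` value differ between the two bodies, which keeps the comparison limited to the storage strategy.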

I don't see any significant performance difference between these two, in either
bulk indexing speed or search speed (I tested search both by requesting two
fields and by requesting the whole field set). Is this an expected result?

The ES cluster and the Java test application are on the same PC.
I tested by inserting 100000 docs split into 10 bulk operations, and then
performed the search 1000 times after inserting the docs.
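
A rough sketch of how such a bulk test could be prepared, assuming the newline-delimited bulk format (one action line followed by one source line per document). The index name `test`, type `doc`, and the generated field values are made up for illustration, and the HTTP call to the `_bulk` endpoint is left out, so this times only body construction.

```python
import json
import time
from typing import Dict, Iterator, List


def make_docs(count: int, fields: int = 50) -> List[Dict[str, str]]:
    """Generate synthetic documents with `fields` string properties each."""
    return [
        {f"prop_{j}": f"value {i}-{j}" for j in range(fields)}
        for i in range(count)
    ]


def chunk(docs: List[Dict[str, str]], batches: int) -> Iterator[List[Dict[str, str]]]:
    """Split docs into at most `batches` consecutive slices of equal size."""
    if not docs:
        return
    size = -(-len(docs) // batches)  # ceiling division
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_body(docs: List[Dict[str, str]], index: str, doc_type: str, first_id: int) -> str:
    """Render one bulk request body: action line, then source line, per doc."""
    lines = []
    for offset, doc in enumerate(docs):
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": str(first_id + offset)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk format ends with a newline


def time_call(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    docs = make_docs(1000)  # scaled down from the 100000 used in the post
    next_id = 0
    for batch in chunk(docs, 10):
        body, seconds = time_call(bulk_body, batch, "test", "doc", next_id)
        next_id += len(batch)
        print(f"{len(batch)} docs, {len(body)} bytes, built in {seconds:.4f}s")
```

To time indexing itself, the same `time_call` wrapper could be placed around whatever client call sends each body to the cluster.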

--
Mustafa Sener
www.ifountain.com


(Shay Banon) #2

The overhead is pretty simple to understand (1 _source vs. 50 stored fields), though perf-wise, when you will really see it depends on the concurrency of indexing, how much memory the filesystem cache has, and so on. Fetching stored fields requires two disk seeks. With _source you fetch one field; with 50 stored fields you fetch 50. When indexing, it's a similar notion (more file IO).
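
To make the two fetch patterns concrete, here is a small sketch of the search request bodies that would exercise each one. It assumes the search API accepts a `fields` list in the request body and returns `_source` when no list is given; the helper name `search_body` and the field names `prop_0`, `prop_1` are hypothetical.

```python
from typing import Iterable, Optional


def search_body(query: dict, fields: Optional[Iterable[str]] = None, size: int = 10) -> dict:
    """Build a search request body, optionally restricted to specific fields."""
    body = {"query": query, "size": size}
    if fields is not None:
        # Ask for individual fields instead of the whole _source.
        body["fields"] = list(fields)
    return body


match_all = {"match_all": {}}

# Whole document: relies on _source (a single stored field) when it is enabled.
whole_doc_request = search_body(match_all)

# Two fields only: served from the individually stored fields when store is yes.
two_field_request = search_body(match_all, fields=["prop_0", "prop_1"])

if __name__ == "__main__":
    print(whole_doc_request)
    print(two_field_request)
```

Running both requests against each mapping, with a cold and a warm filesystem cache, is one way to see when the difference described above starts to show.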


(Mustafa Sener) #3

Thanks



(Imaravin) #4

Hi,
I don't have much experience with ES. When I heard that the _source field stores the entire document as JSON,
I tried storing the source as a JSON object inside a Lucene Document. When retrieving it back, I found that the time taken for JSON construction is more than the disk read time. So what is the advantage of the _source field?
We could store all the fields in the document instead, to provide better indexing performance.
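
One way to check the JSON cost in isolation is a small micro-benchmark like the sketch below. It is only an illustration in Python with a synthetic 50-field document; it says nothing about the JSON handling inside ES itself or about disk read time, and the numbers will vary by machine.

```python
import json
import timeit

# Synthetic 50-field document; the field names and values are made up.
doc = {f"prop_{i}": f"value {i}" for i in range(50)}
source_blob = json.dumps(doc)


def build_source(document: dict) -> str:
    """Serialize a document to the JSON string that would be stored."""
    return json.dumps(document)


def parse_source(blob: str) -> dict:
    """Parse a stored JSON string back into a document."""
    return json.loads(blob)


if __name__ == "__main__":
    runs = 10000
    build_seconds = timeit.timeit(lambda: build_source(doc), number=runs)
    parse_seconds = timeit.timeit(lambda: parse_source(source_blob), number=runs)
    print(f"build: {build_seconds / runs * 1e6:.1f} us per doc")
    print(f"parse: {parse_seconds / runs * 1e6:.1f} us per doc")
```

Comparing these per-document figures with a measured cold-cache read time would show which side dominates for a given setup.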