Is this test result correct? Query performance with document source saved is much better than without it

Dong_Aihua · May 30, 2012, 7:27am

Hi,
I set up two clusters. Each cluster has two nodes (2 shards, 1 replica).
One stores the document source (_source); the other does not. Each cluster
already holds about 0.46 billion documents.
In my tests, query performance on the cluster that stores the document source
is much better than on the one that does not, almost twice as fast: about
1.5s response time versus about 3s.
I'm not sure whether this result is correct.
Can anyone help confirm it?
Thank you very much!

-Regards-
-Jackie-

Clinton_Gormley · May 30, 2012, 8:48am

You didn't provide your queries, so it is difficult to say. In the query
on the cluster that doesn't have _source enabled, are you requesting
stored fields? If so, then factor in a 5ms disk seek per field. With
_source enabled, you get your whole doc back with a single disk seek.
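
As a rough illustration of that arithmetic, here is a minimal sketch. It only uses the 5ms-per-seek figure from the reply; the page size and the number of stored fields are made-up values, and it assumes one seek per stored field per returned hit, which is a simplification.

```python
# Back-of-the-envelope estimate only. The 5 ms seek figure comes from the
# reply above; hits and stored_fields are hypothetical illustration values.
SEEK_MS = 5


def fetch_cost_ms(hits: int, seeks_per_hit: int, seek_ms: int = SEEK_MS) -> int:
    """Estimated time spent on disk seeks while fetching the returned hits."""
    return hits * seeks_per_hit * seek_ms


hits = 10           # assumed number of hits returned per request
stored_fields = 4   # hypothetical number of stored fields requested

cost_stored_fields = fetch_cost_ms(hits, stored_fields)  # one seek per field
cost_source = fetch_cost_ms(hits, 1)                     # one seek per document

print(f"stored fields: ~{cost_stored_fields} ms, _source: ~{cost_source} ms")
```

Either way this cost applies only to the fetch of the returned hits, so by itself it would not explain a difference of more than a second.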

clint

Dong_Aihua · May 30, 2012, 9:33am

Hi, Clinton:
I use the default settings, which means no individual field is stored. The
only difference is whether the whole document (_source) is saved or not.
The query looks like this, plus some facets.
{
  "size": 10,
  "query": {
    "query_string": {
      "default_field": "body",
      "query": "${errorbody} AND logType:${logtype} AND logTime:[2012-02-04T16:57:53 TO 2012-04-04T16:58:23]"
    }
  }
}
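
For anyone who wants to reproduce the request, the template above could be filled in along these lines. This is only a sketch: the function name and the sample values for `${errorbody}` and `${logtype}` are hypothetical, and the facets are left out because they are not shown in this thread.

```python
import json


def build_search_body(error_body: str, log_type: str,
                      time_from: str, time_to: str, size: int = 10) -> dict:
    """Fill the ${errorbody} / ${logtype} placeholders of the query template."""
    query = (
        f"{error_body} AND logType:{log_type} "
        f"AND logTime:[{time_from} TO {time_to}]"
    )
    return {
        "size": size,
        "query": {
            "query_string": {
                "default_field": "body",
                "query": query,
            }
        },
    }


# "timeout" and "error" are made-up sample values, not from the original test.
body = build_search_body(
    "timeout", "error", "2012-02-04T16:57:53", "2012-04-04T16:58:23"
)
print(json.dumps(body, indent=2))
```

Sending the identical body to both clusters keeps the comparison limited to the _source setting.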

Clinton_Gormley · May 30, 2012, 10:06am

So all you're getting back is the index/type/id? No fields?

Dong_Aihua · May 31, 2012, 1:37am

Yes, just id, no fields.

kimchy · June 3, 2012, 9:19am

Storing _source will not speed up searches compared to not storing it. As
Clinton mentioned, the usual comparison is between storing _source and
either storing specific fields or fetching the source afterwards from
another data store. In those cases, storing _source is often the better
choice.
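
To make the setups being compared concrete, the mappings might look roughly like the sketch below. The type name `log` is hypothetical, the field name `logType` is taken from the query in this thread, and the exact spelling of the `store` value is an assumption that depends on the Elasticsearch version (older releases used "yes"/"no", later ones use booleans).

```python
import json

# Hypothetical type name "log"; only the _source / store settings matter here.

# 1. Default: the whole document is kept in _source.
source_enabled = {"log": {"_source": {"enabled": True}}}

# 2. _source disabled, nothing stored: hits carry only index/type/id.
source_disabled = {"log": {"_source": {"enabled": False}}}

# 3. _source disabled, but one field stored individually.
#    "yes" is the older spelling; this is an assumption about the version.
stored_field_only = {
    "log": {
        "_source": {"enabled": False},
        "properties": {
            "logType": {"type": "string", "store": "yes"},
        },
    }
}

for name, mapping in [
    ("source_enabled", source_enabled),
    ("source_disabled", source_disabled),
    ("stored_field_only", stored_field_only),
]:
    print(name, json.dumps(mapping))
```

The test described in this thread compares setups 1 and 2, while the per-field seek cost mentioned earlier applies to setup 3.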
