Is this test result correct? Query performance is much better when the document source is stored than when it is not


(Dong Aihua) #1

Hi,
I set up two clusters. Each cluster has two nodes (2 shards, 1 replica).
One stores the document source; the other does not. Each cluster already
holds about 0.46 billion documents.
In my tests, queries against the cluster that stores the document source are
much faster than against the one that does not, by almost a factor of two:
about 1.5s response time versus about 3s.
I'm not sure whether this result is correct.
Can anyone help confirm it?
Thank you very much!

-Regards-
-Jackie-


(Clinton Gormley) #2

On Wed, 2012-05-30 at 00:27 -0700, jackiedong wrote:

Hi,
I set up two clusters. Each cluster has two nodes (2 shards, 1
replica). One stores the document source; the other does not. Each
cluster already holds about 0.46 billion documents.
In my tests, queries against the cluster that stores the document
source are much faster than against the one that does not, by almost
a factor of two: about 1.5s response time versus about 3s.

You don't provide your queries, so it is difficult to say. In the query
on the cluster that doesn't have _source enabled, are you requesting
stored fields? If so, then factor in 5ms disk seek per field. With
_source enabled, you get your whole doc back with a single disk seek.
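
As a rough illustration of the arithmetic in this reply, here is a minimal sketch. The 5 ms seek figure is the estimate quoted above; the function name and the count of four stored fields are hypothetical, chosen only to make the comparison concrete.

```python
SEEK_MS = 5  # per-seek estimate quoted in the reply; not a measurement


def fetch_cost_ms(hits, seeks_per_hit, seek_ms=SEEK_MS):
    """Rough disk-seek cost of fetching one page of results."""
    return hits * seeks_per_hit * seek_ms


# Assumption: 10 hits, 4 individually stored fields, one seek per field per hit.
stored_fields_cost = fetch_cost_ms(hits=10, seeks_per_hit=4)

# 10 hits with _source enabled: one seek per hit returns the whole document.
source_cost = fetch_cost_ms(hits=10, seeks_per_hit=1)

print(stored_fields_cost, source_cost)
```

Under these assumptions the fetch phase comes to 200 ms for stored fields against 50 ms for _source. This only models the fetch phase; it says nothing about the cost of running the query itself.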

clint


(Dong Aihua) #3

Hi, Clinton:
I use the default settings, which means no individual fields are stored.
The only difference is whether the whole document source is stored or not.
The query looks like this, plus some facets.
{
  "size": 10,
  "query": {
    "query_string": {
      "default_field": "body",
      "query": "${errorbody} AND logType:${logtype} AND logTime:[2012-02-04T16:57:53 TO 2012-04-04T16:58:23]"
    }
  }
}
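
For anyone who wants to reproduce the test, the request body above could be built along these lines. This is only a sketch: the function name and the sample values "timeout" and "error" are hypothetical stand-ins for the ${errorbody} and ${logtype} placeholders, and the facets are left out because they are not shown in the thread.

```python
import json


def build_log_query(error_body, log_type, start, end, size=10):
    """Build the query_string search body shown above (facets omitted)."""
    query = f"{error_body} AND logType:{log_type} AND logTime:[{start} TO {end}]"
    return {
        "size": size,
        "query": {
            "query_string": {
                "default_field": "body",
                "query": query,
            }
        },
    }


# Hypothetical sample values; the real ones are not given in the thread.
body = build_log_query(
    "timeout", "error", "2012-02-04T16:57:53", "2012-04-04T16:58:23"
)
print(json.dumps(body, indent=2))
```

Sending the same generated body to both clusters helps make sure any timing difference is not caused by differing queries.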

On Wednesday, May 30, 2012 at 4:48:22 PM UTC+8, Clinton Gormley wrote:

You don't provide your queries, so it is difficult to say. In the query
on the cluster that doesn't have _source enabled, are you requesting
stored fields? If so, then factor in 5ms disk seek per field. With
_source enabled, you get your whole doc back with a single disk seek.

clint


(Clinton Gormley) #4

On Wed, 2012-05-30 at 02:33 -0700, jackiedong wrote:

Hi, Clinton:
I use the default settings, which means no individual fields are stored.
The only difference is whether the whole document source is stored or not.
The query looks like this, plus some facets.
{
  "size": 10,
  "query": {
    "query_string": {
      "default_field": "body",
      "query": "${errorbody} AND logType:${logtype} AND logTime:[2012-02-04T16:57:53 TO 2012-04-04T16:58:23]"
    }
  }
}

So all you're getting back is the index/type/id? No fields?


(Dong Aihua) #5

Yes, just id, no fields.

On Wednesday, May 30, 2012 at 6:06:40 PM UTC+8, Clinton Gormley wrote:

So all you're getting back is the index/type/id? No fields?


(Shay Banon) #6

Storing _source will not speed up searches compared to not storing it. As
Clinton mentioned, the usual comparison is between storing _source and
either storing specific fields or fetching the document from another data
store afterwards; in those cases, storing _source is often the better
choice.
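
Since storing _source should not by itself make searches faster, one way to check whether the 1.5s versus 3s gap is real is to repeat the same query several times on each cluster and compare the median of the "took" value (milliseconds) that the search response reports. The sketch below assumes a caller-supplied run_query function that performs the search and returns the parsed response; that function, the helper name, and the fake_search stand-in are all hypothetical.

```python
from statistics import median


def median_took_ms(run_query, runs=20, warmup=3):
    """Median of the response's "took" field over repeated runs."""
    # Warm-up runs let caches fill so the first cold query does not skew the result.
    for _ in range(warmup):
        run_query()
    return median(run_query()["took"] for _ in range(runs))


# Stand-in for a real search call, so the sketch runs without a cluster.
def fake_search():
    return {"took": 1500, "hits": {"total": 0, "hits": []}}


print(median_took_ms(fake_search))
```

Comparing medians after a warm-up reduces the chance that cache state or a single slow run explains the difference between the two clusters.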

On Thu, May 31, 2012 at 3:37 AM, jackiedong jackiedong168@gmail.com wrote:

Yes, just id, no fields.

