Strange behaviour of Elasticsearch when sorting


(J. Schulz) #1

Hello everyone,

I am seeing some really strange behaviour and need your wisdom.

First, some information about my setup:

2 nodes
60 shards
15 indices
300 million documents
125 GB in size
1 replica

After a fresh start of Elasticsearch the following request is very fast:

curl "http://localhost:9200/_all/event/_search?pretty=true" -d '{
"filter" : {
"term" : { "usert_id" : "510" }
},
"size" : "0"
}'

{
"took" : 105,
"timed_out" : false,
"_shards" : {
"total" : 60,
"successful" : 60,
"failed" : 0
},
"hits" : {
"total" : 24,
"max_score" : 1.0,
"hits" : [ ]
}
}

As you can see, it took 105 milliseconds for 24 hits, which is great.

If I execute the same request and add a sort to it, the response time
grows absurdly:

curl "http://localhost:9200/_all/event/_search?pretty=true" -d '{
"filter" : {
"term" : { "usert_id" : "510" }
},
"sort" : {
"time" : { "order" : "desc" }
},
"size" : "0"
}'

{
"took" : 129251,
"timed_out" : false,
"_shards" : {
"total" : 60,
"successful" : 60,
"failed" : 0
},
"hits" : {
"total" : 24,
"max_score" : null,
"hits" : [ ]
}
}

Do you have any idea what happens there? When I run it a second time the
sort request is fast too, but if I execute the request again after some
minutes it is slow again.

Cheers
Jonny

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/63d3e4e0-46d1-418e-a33b-77a0d65d55a1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

I would say that the sort needs to load some data into memory.
Do you have enough heap?

If so, is the sort faster when you call the same query again?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(J. Schulz) #3

16 GB of memory for ES. The disks are SSDs.

The second or third query is very fast, but after some minutes it falls
back and is slow again.

This puzzles me, because there are only 24 hits.



(David Pilato) #4

I think you will get about the same response time with a match_all query.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(Alexander Reelsen) #5

Hey,

can you try a filtered query instead of the top-level filter (with a
match_all query as the query part and your filter inside the filtered
query) and tell us whether the run time for that query is the same? See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html

The effect you are seeing after 5 minutes is the expiration of the
fielddata structure (which is needed for sorting); see the
indices.fielddata.cache.expire setting in
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
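
A minimal sketch of the request body suggested above, assuming the field names from the original post (usert_id, time). The helper name filtered_sort_body is made up for illustration. It only builds and prints the JSON that would be passed to curl -d; it does not contact a cluster.

```python
import json


def filtered_sort_body(field, value, sort_field, size=0):
    """Build a search body that wraps a term filter in a filtered query.

    Hypothetical helper: the field names are taken from the thread,
    everything else is an illustration.
    """
    return {
        "query": {
            "filtered": {
                # match_all as the query part
                "query": {"match_all": {}},
                # the former top-level filter moves inside the filtered query
                "filter": {"term": {field: value}},
            }
        },
        "sort": {sort_field: {"order": "desc"}},
        "size": size,
    }


body = filtered_sort_body("usert_id", "510", "time")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the -d payload of the curl calls shown earlier in the thread.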

--Alex



(J. Schulz) #6

Hi Alexander,

the result is nearly the same.

curl "http://localhost:9200/_all/event/_search?routing=510&pretty=true" -d '{
"filter" : {
"term" : { "host_id" : "510" }
},
"size" : "0"
}'

{
"took" : 169,
"timed_out" : false,
"_shards" : {
"total" : 15,
"successful" : 15,
"failed" : 0
},
"hits" : {
"total" : 32,
"max_score" : 1.0,
"hits" : [ ]
}
}

curl "http://localhost:9200/_all/event/_search?routing=510&pretty=true" -d '{
"query" : {
"filtered" : {
"query" : {
"term" : { "host_id" : "510" }
},
"filter" : {
"range" : {
"time" : {
"from" : 1385717514000,
"to" : 1386322314000
}
}
}
}
},
"sort" : {
"time" : { "order" : "desc" }
},
"size" : "0"
}'

{
"took" : 65012,
"timed_out" : false,
"_shards" : {
"total" : 15,
"successful" : 15,
"failed" : 0
},
"hits" : {
"total" : 9,
"max_score" : null,
"hits" : [ ]
}
}

But I have another really strange issue.

curl "http://localhost:9200/data-2013-11/event/_search?routing=510&pretty=true" -d '{
"filter" : {
"term" : { "host_id" : "510" }
},
"size" : "100"
}'

{
"took" : 34,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 25,
"max_score" : 1.0,
"hits" : [ {
...
...

curl "http://localhost:9200/data-2013-11/event/_search?routing=510&pretty=true" -d '{
"filter" : {
"term" : { "host_id" : "510" }
},
"sort" : {
"time" : { "order" : "desc" }
},
"size" : "100"
}'

{
"took" : 53191,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 25,
"max_score" : null,
"hits" : [ {
...
...

As you can see, there are only 25 hits.

The first request takes 34 milliseconds with a size of 100! I got all 25
documents immediately!

The second request takes 53 seconds! Does ES really need more than 50
seconds to sort the data?

What happens there internally? I would expect ES to just take the
result of the filter and sort it, but it seems that ES does something
really strange that makes no sense.

Cheers
Jonny



(J. Schulz) #7

Hi again,

from the docs (
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
):

"The field data cache is used mainly when sorting on or faceting on a
field. It loads all the field values to memory in order to provide fast
document based access to those values."

What exactly does that mean?

For example, if I have an index with 10 million documents and send a query
with a filter that matches 10 documents and sort them, would Elasticsearch
first load all 10 million values of the sort field?

Cheers
Jonny



(Alexander Reelsen) #8

Hey,

yes, fielddata loads all values into memory so that future
queries can benefit as well.
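
A back-of-the-envelope sketch of what that means for memory. It assumes 8 bytes per value (one long timestamp per document); the real in-memory layout of fielddata may differ, so treat the numbers as an order of magnitude only. The function names are made up for illustration.

```python
def fielddata_bytes(doc_count, bytes_per_value=8):
    """Estimate fielddata size for one field.

    Assumption: one fixed-width value per document. The number of
    hits of the query does not appear in the formula at all.
    """
    return doc_count * bytes_per_value


def to_mb(num_bytes):
    """Convert bytes to megabytes (10**6 bytes)."""
    return num_bytes / 10**6


# Document counts mentioned in this thread: the 10 million example
# and the 300 million documents of the described setup.
for docs in (10_000_000, 300_000_000):
    print(docs, "documents ->", to_mb(fielddata_bytes(docs)), "MB")
```

The point of the sketch is that the cost depends on the number of documents in the searched shards, not on how many documents the filter matches.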

Can you try the warmer API in your case? It ensures that the fielddata
structures for a segment are loaded before the first query hits you. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-warmers.html
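
A rough sketch of what registering such a warmer could look like. It assumes the PUT /{index}/_warmer/{name} layout with a search body, which is how I read the warmers documentation linked above; please check it against your ES version. The warmer name sort_by_time and the helper are made up, the index and field names come from this thread. Nothing is sent to a cluster, the request is only printed.

```python
import json


def warmer_request(index, warmer_name, sort_field):
    """Build (method, path, body) for registering a warmer.

    Assumption: a warmer is a stored search request; sorting on
    sort_field in it forces the fielddata for that field to load.
    """
    path = "/{}/_warmer/{}".format(index, warmer_name)
    body = {
        "query": {"match_all": {}},
        "sort": {sort_field: {"order": "desc"}},
    }
    return "PUT", path, body


# "sort_by_time" is a hypothetical warmer name.
method, path, body = warmer_request("data-2013-11", "sort_by_time", "time")
print(method, path)
print(json.dumps(body, indent=2))
```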

--Alex



(J. Schulz) #9

Hi Alex,

that is unbelievable.

For example, if I have an index with 100 million documents, query all
documents from "2013-11-01 00:00:00" to "2013-11-01 00:01:00" (~23k) and
sort them by time, ES would load all 100 million timestamps into
memory?

That is really evil, especially if I have more than 1 billion documents in
one index in the future.

In that case I have to do the sort within the application and not with ES.
Or is there an option to make ES just sort the merged result from
all shards once, without loading the fielddata?

Cheers
Jonny



(David Pilato) #10

In that case you probably want to design your indices as time-based indices.
Let's say you have one index per month. Just search within that index. It will load fewer values.

Or use routing if you can.
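
A small sketch of the monthly-index idea, assuming an index naming scheme like the data-2013-11 seen earlier in this thread. The helper name and the sample dates are made up. It computes which monthly indices a time range touches, so that a search can target only those instead of _all.

```python
from datetime import date


def monthly_indices(start, end, prefix="data-"):
    """Return one index name per calendar month from start to end, inclusive.

    Assumption: indices are named <prefix>YYYY-MM.
    """
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append("{}{:04d}-{:02d}".format(prefix, year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# Sample range spanning a month boundary.
indices = monthly_indices(date(2013, 11, 29), date(2013, 12, 6))
print(",".join(indices))
```

The comma-joined names can be used in the request URL in place of _all, so that only fielddata for the touched months has to be loaded.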

My 2 cents.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(J. Schulz) #11

Hi David, I do that already (I think you saw that: curl
"data-2013-11/event/_search?routing=510"). I use one index per month plus
routing, and currently one index holds ~50 million documents.

It doesn't really solve the sort problem if the indices grow more and more.

http://elasticsearch-users.115913.n3.nabble.com/Improve-query-time-td3800878.html

"Do not use sort" hits the nail on the head. Queries without sort are very
fast, and all I need is for the returned hits to be sorted.

Think about PostgreSQL or MySQL: imagine an "order by" first loading all
values of the ordered columns into memory... I thought it was possible to
tell ES to sort just the hits, but if that is not possible, I have to
sort the returned data within the application, and that shouldn't be a problem.
Maybe it will be possible in the future to do that directly in ES.
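
A minimal sketch of the application-side sort mentioned above. It assumes each hit carries a numeric time field in _source (epoch milliseconds, as in the range filter earlier in the thread); the helper name and the sample documents are made up for illustration.

```python
def sort_hits_by_time(response, descending=True):
    """Sort the hits of an unsorted search response by _source.time.

    Only the returned hits are touched, so the cost depends on the
    number of hits, not on the size of the index.
    """
    hits = response["hits"]["hits"]
    return sorted(hits, key=lambda hit: hit["_source"]["time"], reverse=descending)


# Made-up sample response with three hits in arbitrary order.
sample = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "a", "_source": {"host_id": "510", "time": 1385717514000}},
            {"_id": "b", "_source": {"host_id": "510", "time": 1386322314000}},
            {"_id": "c", "_source": {"host_id": "510", "time": 1386000000000}},
        ],
    }
}

for hit in sort_hits_by_time(sample):
    print(hit["_id"], hit["_source"]["time"])
```

This only works when the request size is large enough to return every matching document, which is the case for the few dozen hits in the examples above.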


