Slow query performance with a large size value

dag · July 16, 2015, 8:55am

Hello,

First of all, sorry for my poor English.

I have an index called sample.
It contains a lot of documents (about 2 million).

My query example:

POST samples/sample/_search
{
  "_source": ["someId"],
  "size": 50000
}

This takes a long time. Why?

What can I do to improve performance?

Currently, someId has this mapping:

"someId": {
  "type": "string",
  "analyzer": "keywords"
}

Thanks for your help.

Jason_Wee · July 16, 2015, 9:07am

Literally just from your description, a size of 50000 looks like a lot. But it also depends on how many nodes you have in the cluster and what the existing workload (indexing/querying) looks like.

dag · July 16, 2015, 9:48am

Thanks for your response.

My cluster has 3 nodes: two data nodes and one logical node.

Is this really so expensive?
I want all someIds from my sample index.

In MySQL I can run "select someId from sample" and it is very, very fast.

What is the problem?
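Fetching every someId is usually done by paging through the results with the scroll API rather than asking for one very large `size`: the first call starts a scroll (`?scroll=1m` on the `_search` request, with a modest per-page size) and each later call passes the returned `_scroll_id` to the `_search/scroll` endpoint until a page comes back empty. Below is a minimal sketch of that paging loop only, not a definitive client. It assumes `search` and `scroll` are hypothetical callables you supply (for example, thin wrappers around your HTTP client) that return dicts shaped like Elasticsearch responses (`_scroll_id`, `hits.hits`); `make_fake_client` is a hypothetical in-memory stand-in used purely to demonstrate the loop without a cluster.

```python
from typing import Any, Callable, Dict, Iterator, List

# Assumed response shape: {"_scroll_id": str, "hits": {"hits": [hit, ...]}},
# where each hit carries the requested field under "_source".
Response = Dict[str, Any]


def iter_field_values(
    search: Callable[[], Response],
    scroll: Callable[[str], Response],
    field: str = "someId",
) -> Iterator[Any]:
    """Yield `field` from every hit, one scroll page at a time.

    `search` starts the scroll (e.g. a _search request with ?scroll=1m);
    `scroll` fetches the next page for a given scroll id.
    """
    resp = search()
    first = True
    while True:
        hits = resp["hits"]["hits"]
        # An empty page ends the scroll. The first page is exempt because a
        # 1.x search_type=scan request returns no hits on its first response.
        if not hits and not first:
            break
        for hit in hits:
            yield hit["_source"][field]
        first = False
        # Always continue from the most recent scroll id.
        resp = scroll(resp["_scroll_id"])


def make_fake_client(pages: List[List[Dict[str, Any]]]):
    """Hypothetical in-memory stand-in for a cluster, for demonstration only."""
    remaining = iter(pages)

    def search() -> Response:
        return {"_scroll_id": "scroll-1", "hits": {"hits": next(remaining)}}

    def scroll(scroll_id: str) -> Response:
        return {"_scroll_id": scroll_id, "hits": {"hits": next(remaining)}}

    return search, scroll


if __name__ == "__main__":
    docs = [{"_source": {"someId": f"id-{i}"}} for i in range(5)]
    search, scroll = make_fake_client([docs[0:2], docs[2:4], docs[4:5], []])
    print(list(iter_field_values(search, scroll)))
```

Because the loop is a generator, the caller can process IDs page by page instead of holding one huge response in memory; a real implementation would probably also clear the scroll when done.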

Thanks!

jpountz · July 16, 2015, 10:08am

How long is "long" in your case? Do you know whether the machines are I/O-bound or CPU-bound while running this query? Could you try to capture the output of the hot threads API while the query is running? It could give an idea of where the time is spent.

dag · July 16, 2015, 10:38am

Thanks for your reply.
In my case, "long" means about 30 seconds with a size of 50000.

Sorry, I don't know the hot threads API.
What is it and how can I use it?

Jason_Wee · July 16, 2015, 11:50am

fwiw https://www.elastic.co/guide/en/elasticsearch/reference/1.6/cluster-nodes-hot-threads.html

When the node is busy with segment merging or GC on the heap, this command can get stuck on the command line forever, but in your case it is worth a try.
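The hot threads API is a plain HTTP GET on `/_nodes/hot_threads` that returns a text report, with optional `threads`, `interval` and `type` parameters. A minimal sketch for capturing it, assuming a node reachable at the hypothetical address `http://localhost:9200`; the helper names and the `kind` argument (sent as `type`) are illustrative, and the timeout is there so the call gives up instead of hanging when the node is too busy to answer.

```python
import urllib.parse
import urllib.request

# Assumption: a node is reachable at this address; adjust for your cluster.
DEFAULT_BASE = "http://localhost:9200"


def hot_threads_url(base: str = DEFAULT_BASE, threads: int = 3,
                    interval: str = "500ms", kind: str = "cpu") -> str:
    """Build a nodes hot threads URL.

    threads: how many hot threads to report per node.
    interval: delay before the second sampling of threads.
    kind: what to sample ("cpu", "wait" or "block"), sent as `type`.
    """
    query = urllib.parse.urlencode(
        {"threads": threads, "interval": interval, "type": kind})
    return f"{base.rstrip('/')}/_nodes/hot_threads?{query}"


def fetch_hot_threads(url: str, timeout: float = 30.0) -> str:
    """Fetch the plain-text report, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Run this while the slow search is executing, e.g. from another terminal.
    print(fetch_hot_threads(hot_threads_url()))
```

Taking a few captures while the 30-second search runs makes it easier to tell whether the time goes to CPU work or to waiting on I/O.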

hth