Search too slow

willam_boss · November 2, 2018, 3:35am

My index uses 2 TB of disk space and holds 10 billion documents. I have three nodes, each with 4 CPUs and 13 GB of memory. My match searches are quite slow, and read I/O seems to be used up. How can I increase my search speed?

dadoonet · November 2, 2018, 5:01am

How many shards do you have?
Could you start new nodes?

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

willam_boss · November 2, 2018, 5:29am

50 shards, each using 46 GB of space, with about 3300 segments. I have no servers for new nodes. Do you have any other suggestions? I think the 10 billion documents are the reason. There is a 'hits' total in the result of a REST query. If I only want a few records, does Elasticsearch search the whole index to count these hits? It only returns 10 records each time, so I think this is a waste of time. Can Elasticsearch skip the 'hits' count to speed up the query?
{ "took" : 61051, "timed_out" : false, "_shards" : { "total" : 50, "successful" : 50, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 2716157, "max_score" : 1.0,
I mean the hits total shown above.
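
For what it's worth, here is a hedged sketch of a request body that asks Elasticsearch to do less counting work. `terminate_after` caps how many documents each shard collects, and `track_total_hits` set to false asks it not to compute an exact total. Whether the second option is accepted, and how much either one helps, depends on the Elasticsearch version, so both are assumptions to test rather than a confirmed fix for this cluster. The helper name `build_search_body` and the value 1000 are made up for illustration.

```python
import json


def build_search_body(term, size=10, terminate_after=1000, track_total_hits=False):
    """Build a match-query body that limits how much each shard collects."""
    return {
        "size": size,
        # Stop collecting on each shard after this many matching documents.
        "terminate_after": terminate_after,
        # Ask not to compute an exact hit count (version dependent, an assumption).
        "track_total_hits": track_total_hits,
        "query": {"match": {"name": {"query": term, "operator": "and"}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("ijmlajip"), indent=2))
```

Note that with `terminate_after` the returned documents are simply the first ones collected, not necessarily the best-scoring ones, so this only fits cases where any few matches will do.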

Christian_Dahlqvist · November 2, 2018, 7:19am

What is the use case? Are you continuously indexing into the index? Are you updating documents in the index? If so, what is the indexing/update rate?

What type of storage do you have?

What is the query that returns the response you provided?

willam_boss · November 2, 2018, 8:04am

The use case is match queries on a field with my own analyzer. I loaded all the data once with the bulk API, and I am no longer putting data into Elasticsearch. All my hosts are on Azure cloud, so the storage should be SSD; read I/O is almost 128 MB/second. The query looks like
"query":{"match":{"name":{"query":"ijmlajip","operator":"and"}}}

Christian_Dahlqvist · November 2, 2018, 8:09am

If you are not indexing or updating data, I would recommend you force merge the index down to a single segment. If you still seem limited by disk I/O, I would recommend looking into getting faster storage for the nodes, but I am not very familiar with what is available on Azure.

willam_boss · November 2, 2018, 8:12am

Do you mean this?
curl -X POST "es01:9200/myindex/_forcemerge?only_expunge_deletes=false&max_num_segments=100&flush=true"
Now my index has 50 shards, each with 20 segments. If I merge them into one segment, I will have one big segment that takes 46 GB of space. Is that too large? Are you sure this will improve search?

Should I set max_num_segments=1?
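
As a rough sketch of the request being discussed, the snippet below builds a force merge call with `max_num_segments=1`, plus the `_cat/segments` URL for checking the segment count afterwards. The host `es01:9200` and index `myindex` are taken from the curl command above; the function names are hypothetical, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def forcemerge_request(host, index, max_num_segments=1):
    """Build (but do not send) a POST request for the _forcemerge endpoint."""
    params = urlencode({"max_num_segments": max_num_segments})
    return Request(f"http://{host}/{index}/_forcemerge?{params}", method="POST")


def segments_url(host, index):
    """URL of the _cat/segments listing, to count segments per shard."""
    return f"http://{host}/_cat/segments/{index}?v"


if __name__ == "__main__":
    req = forcemerge_request("es01:9200", "myindex")
    print(req.get_method(), req.full_url)
    print("GET", segments_url("es01:9200", "myindex"))
    # To actually start the merge, pass `req` to urllib.request.urlopen.
    # On an index this size the call may run for a long time.
```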

Christian_Dahlqvist · November 2, 2018, 8:15am

It should. There are also additional recommendations provided here.

willam_boss · November 2, 2018, 8:20am

How much will the speed increase? As you can see, my query takes almost 60 seconds. If I merge all segments into one, can I complete a simple query like
"query":{"match":{"name":{"query":"ijmlajip","operator":"and"}}}
in 10 seconds?

Christian_Dahlqvist · November 2, 2018, 8:27am

I do not know. As you have indicated storage likely being the limitation, I would recommend upgrading that. This is probably what will make the biggest difference. From what I have heard, Azure Premium Storage is recommended for I/O intensive workloads.
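
To see why storage is the likely limit, a back-of-the-envelope calculation with the numbers from this thread (50 shards of 46 GB, three nodes, 13 GB of memory each) is sketched below. It assumes no replicas and an even spread of shards across nodes, neither of which is stated in the thread, and it treats all of a node's memory as available for caching, which overstates it because the JVM heap and the OS take their share.

```python
def cache_coverage(shards, shard_gb, nodes, ram_gb_per_node):
    """Return (total index GB, GB per node, fraction of a node's data that fits in RAM)."""
    total_gb = shards * shard_gb
    per_node_gb = total_gb / nodes  # assumes shards are spread evenly, no replicas
    fraction = ram_gb_per_node / per_node_gb  # upper bound: heap and OS ignored
    return total_gb, per_node_gb, fraction


if __name__ == "__main__":
    total, per_node, fraction = cache_coverage(50, 46, 3, 13)
    print(f"total index size: {total} GB")
    print(f"data per node:    {per_node:.0f} GB")
    print(f"RAM / data ratio: {fraction:.1%}")
```

With under 2 percent of each node's data able to sit in memory, most queries on this cluster have to go to disk.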

willam_boss · November 2, 2018, 8:36am

Each host has a read I/O limit of 128 MB/second. If I had more hosts, would search queries be faster? For an index that takes 2 TB of space, how many hosts would be enough? I have 10 billion small documents. Do you really think 10 billion documents is not the bottleneck of my query? If so, how can I improve this?

this is a profile of my query

Christian_Dahlqvist · November 2, 2018, 8:40am

Elasticsearch typically performs a lot of small random reads during querying rather than large sequential ones, so fast storage that is able to handle this type of load is essential. Many throughput metrics for disks assume sequential reads, so might not be representative.
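
A minimal sketch of how small random reads could be timed is shown below. It assumes a large regular file on the data disk (the path is supplied by the caller and is hypothetical), reads block-aligned 4 KiB chunks at random offsets, and reports the elapsed time. Reads served from the OS page cache will look much faster than the disk really is, so the file should be far larger than the machine's memory for the numbers to mean anything.

```python
import os
import random
import time


def random_read_benchmark(path, reads=1000, block_size=4096, seed=0):
    """Time `reads` block-aligned random reads from a regular file.

    Returns (elapsed_seconds, bytes_read).
    """
    rng = random.Random(seed)
    blocks = os.path.getsize(path) // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    fd = os.open(path, os.O_RDONLY)
    try:
        bytes_read = 0
        start = time.perf_counter()
        for _ in range(reads):
            offset = rng.randrange(blocks) * block_size
            bytes_read += len(os.pread(fd, block_size, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed, bytes_read


if __name__ == "__main__":
    import sys

    elapsed, n = random_read_benchmark(sys.argv[1])
    print(f"{n} bytes in {elapsed:.3f} s")
```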

You need to test and benchmark. Have a look at the links David provided.

What type of Azure storage are you using at the moment?

willam_boss · November 2, 2018, 8:46am

time for i in $(seq 1 1000); do
  dd bs=4k if=/dev/sdd count=1 skip=$(( $RANDOM * 128 )) >/dev/null 2>&1;
done

real 0m1.931s
user 0m0.622s
sys 0m1.386s

The result shows it is an SSD.

Christian_Dahlqvist · November 2, 2018, 8:48am

As I am not very familiar with Azure, that does not really tell me anything.

Christian_Dahlqvist · November 2, 2018, 9:10am

I had a look at the Azure documentation about storage options. For Standard SSD disks (if that is what you are using) they state:

Standard SSD disks combine elements of Premium SSD disks and Standard HDD disks to form a cost-effective solution best suited for applications like web servers that do not need high IOPS on disks.

Elasticsearch definitely requires high IOPS, which is why Premium SSD disks are recommended.

willam_boss · November 2, 2018, 9:28am

Merging segments is quite slow. Can I change some settings to speed up the merge, for example "index.refresh_interval": "-1", "index.translog.durability": "async", "index.translog.sync_interval": "60s"?

Christian_Dahlqvist · November 2, 2018, 9:36am

No, I do not think you can speed this up as it is quite I/O intensive.

willam_boss · November 5, 2018, 2:05am

Thanks a lot, your suggestions really helped me. My queries respond in ten seconds now.

DidierB · November 8, 2018, 10:43am

Just passing by. Is it correct to say, to summarize, that the main improvement is to merge everything into one segment but it only works if you don't update the index?

In this case, if you have a real-time use case and you make one index per day, is it actually a good idea to do a one-segment merge of all the past days' indexes, so that requests that span many indexes are faster?

Christian_Dahlqvist · November 8, 2018, 12:17pm

Yes, it can make searches faster, but also result in lower heap usage due to reduced need for global ordinals as outlined in this webinar.
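
For the one-index-per-day case, a sketch of how the past days' indices could be selected for a one-segment merge follows. The `logs-YYYY.MM.DD` naming pattern, the host, and both function names are assumptions for illustration only; today's index is left out because it is still being written to.

```python
from datetime import date, timedelta


def past_daily_indices(today, days, prefix="logs-"):
    """Names of the daily indices for the `days` days before `today`, oldest first."""
    return [
        f"{prefix}{(today - timedelta(days=n)):%Y.%m.%d}"
        for n in range(days, 0, -1)
    ]


def forcemerge_urls(host, indices):
    """One _forcemerge URL per index, each merging down to a single segment."""
    return [f"http://{host}/{name}/_forcemerge?max_num_segments=1" for name in indices]


if __name__ == "__main__":
    for url in forcemerge_urls("es01:9200", past_daily_indices(date.today(), 7)):
        print("POST", url)
```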
