My index uses 2 TB of space and holds 10 billion documents. I have three nodes, each with 4 CPUs and 13 GB of memory. My match search is quite slow and read I/O seems to be maxed out. How can I increase my search speed?
How many shards do you have?
Could you start new nodes?
May I suggest you look at the following resources about sizing:
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right
50 shards; each shard uses 46 GB of space, with about 3,300 segments. I have no spare servers for new nodes; do you have any suggestions? I think the 10 billion documents are the reason. There is a 'hits' section in the result of the REST query. If I only want a few records, does Elasticsearch search the whole index just to count these 'hits'? It also only returns 10 records each time, so I think this is a waste of time. Can Elasticsearch skip computing 'hits' to speed up the query?
{ "took" : 61051, "timed_out" : false, "_shards" : { "total" : 50, "successful" : 50, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 2716157, "max_score" : 1.0,
like this hits total
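(An aside on the 'hits' question above: Elasticsearch can skip computing the exact total. This is a minimal sketch assuming version 6.0 or later, where the track_total_hits request option exists; the host and index name are borrowed from the force-merge command later in this thread.)

# Ask for the top 10 hits without counting every matching document.
curl -X GET "es01:9200/myindex/_search" -H 'Content-Type: application/json' -d'
{
  "track_total_hits": false,
  "size": 10,
  "query": { "match": { "name": { "query": "ijmlajip", "operator": "and" } } }
}'

This only skips computing hits.total; finding and scoring the top 10 documents still reads the index as before.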
What is the use case? Are you continuously indexing into the index? Are you updating documents in the index? If so, what is the indexing/update rate?
What type of storage do you have?
What is the query that returns the response you provided?
The use case is a match query on a field with my own analyzer. I loaded all the data once with the bulk API, and I am not putting data into Elasticsearch now. All my hosts are on the Azure cloud; the storage should be SSD, with read I/O of almost 128 MB/second. The query looks like:
"query":{"match":{"name":{"query":"ijmlajip","operator":"and"}}
If you are not indexing or updating data, I would recommend you force merge the index down to a single segment. If you still seem limited by disk I/O, I would recommend looking into getting faster storage for the nodes, but I am not very familiar with what is available on Azure.
You mean this?
curl -X POST "es01:9200/myindex/_forcemerge?only_expunge_deletes=false&max_num_segments=100&flush=true"
Now my index has 50 shards and each shard has 20 segments. If I merge them down to one segment per shard, each shard will have one big segment taking 46 GB of space. Is that too large? Are you sure this will improve search?
Should I set max_num_segments=1?
It should. There are also additional recommendations provided here.
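(For reference, the single-segment version of the force-merge call above would be the same endpoint with max_num_segments set to 1; 'myindex' comes from the earlier command.)

# Merge each shard of the index down to a single segment.
curl -X POST "es01:9200/myindex/_forcemerge?max_num_segments=1"

The merge itself is I/O intensive and can take a long time on 46 GB shards, so it is best run while the index receives no writes.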
How much will the speed increase? As you can see, my query takes almost 60 seconds. If I merge all segments into one, can I complete a simple query like
"query":{"match":{"name":{"query":"ijmlajip","operator":"and"}}}
in 10 seconds?
I do not know. As you have indicated that storage is likely the limitation, I would recommend upgrading it; that is probably what will make the biggest difference. From what I have heard, Azure Premium Storage is recommended for I/O-intensive workloads.
Each host has a read I/O limit of 128 MB/second. If I add more hosts, will search queries get faster? For an index that takes 2 TB of space, how many hosts would be enough? I have 10 billion small documents; do you really think 10 billion documents are not the bottleneck of my query? If so, how can I improve this?
Here is a profile of my query:
Elasticsearch typically performs a lot of small random reads during querying rather than large sequential ones, so fast storage that can handle this type of load is essential. Many disk throughput metrics assume sequential reads, so they might not be representative.
You need to test and benchmark. Have a look at the links David provided.
What type of Azure storage are you using at the moment?
I ran this quick random-read test:

time for i in $(seq 1 1000); do
  dd bs=4k if=/dev/sdd count=1 skip=$(( $RANDOM * 128 )) >/dev/null 2>&1
done

real    0m1.931s
user    0m0.622s
sys     0m1.386s

The result shows it is an SSD.
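(A note on that test: dd issues one read at a time, so it mostly measures latency rather than the concurrent random-read IOPS an Elasticsearch query load generates. If fio is available, a sketch like the following would give a closer picture; /dev/sdd is taken from the dd command above, and the flags assume a reasonably recent fio with the libaio engine.)

# 30 seconds of 4k random reads with 32 requests in flight, read-only for safety.
fio --name=randread --filename=/dev/sdd --readonly --direct=1 \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
    --runtime=30 --time_based

The IOPS figure it reports under concurrent load is the number that matters for query-time random reads.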
As I am not very familiar with Azure, that does not really tell me anything.
I had a look at the Azure documentation about storage options. For Standard SSD disks (if that is what you are using) they state:
Standard SSD disks combine elements of Premium SSD disks and Standard HDD disks to form a cost-effective solution best suited for applications like web servers that do not need high IOPS on disks.
Elasticsearch definitely requires high IOPS, which is why Premium SSD disks are recommended.
Merging the segments is going quite slowly. Can I set some config to speed the merge up, like "index.refresh_interval": "-1", "index.translog.durability": "async", "index.translog.sync_interval": "60s"?
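(For reference, those settings would be applied like this; as the reply below notes, they mainly help bulk indexing rather than the merge itself, which is bound by disk I/O.)

# Apply the indexing-oriented settings from the question to the existing index.
curl -X PUT "es01:9200/myindex/_settings" -H 'Content-Type: application/json' -d'
{
  "index.refresh_interval": "-1",
  "index.translog.durability": "async",
  "index.translog.sync_interval": "60s"
}'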
No, I do not think you can speed this up as it is quite I/O intensive.
Thanks a lot, your suggestions really helped me. My query responds in ten seconds now.
Just passing by. To summarize, is it correct to say that the main improvement is to merge everything into one segment, but that it only works if you don't update the index?
In that case, if you have a realtime use case and create one index per day, is it actually a good idea to do a one-segment merge on all the past days' indexes, so that requests spanning many indexes are faster?
Yes, it can make searches faster, and it can also result in lower heap usage due to the reduced need for global ordinals, as outlined in this webinar.
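(As an illustration of that time-based pattern: once a daily index stops receiving writes, it can be merged down to one segment. The dated index name below is hypothetical; the host matches the earlier commands.)

# Force merge yesterday's (now read-only) daily index to a single segment.
curl -X POST "es01:9200/myindex-2019.01.01/_forcemerge?max_num_segments=1"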