We are using Elasticsearch for what should be very efficient search, but a single simple search takes about 800 ms, even when we call the Elasticsearch API directly.
We run our own cluster of 3 nodes: one master node and two data nodes. We have only one index with a single type, since all items are the same, and it holds around 8 million documents. The index has 5 primary shards and 2 replicas.
My query is:
Could having 8 million documents in one index/type cause this? Should I consider splitting it?
We always send both writes and reads to the master node. Should I consider sending reads to the data nodes instead?
Is there anything else I should do differently to get better performance?
Thanks much in advance.
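A quick back-of-the-envelope check of the shard layout described in this post may help. This is a sketch under stated assumptions: the master node is master-only (it holds no shards, so only the two data nodes store data), and documents spread evenly across the primaries. It relies on the Elasticsearch rule that two copies of the same shard are never allocated to the same node.

```python
# Shard-layout arithmetic for the cluster described in the first post.
# Assumption: the master node is master-only, so only the data nodes hold shards.

def shard_plan(docs, primaries, replicas, data_nodes):
    """Estimate how shard copies fit on the data nodes."""
    copies_per_shard = 1 + replicas  # one primary plus its replicas
    # Elasticsearch never puts two copies of the same shard on one node,
    # so each shard can have at most `data_nodes` assigned copies.
    assignable_per_shard = min(copies_per_shard, data_nodes)
    return {
        # Assumes documents are spread evenly over the primaries.
        "docs_per_primary": docs // primaries,
        "total_copies": primaries * copies_per_shard,
        "assigned_copies": primaries * assignable_per_shard,
        "unassigned_copies": primaries * (copies_per_shard - assignable_per_shard),
    }


if __name__ == "__main__":
    # 8 million docs, 5 primaries, 2 data nodes; both replica counts
    # mentioned in the thread (2 here, 1 in a later reply).
    for replicas in (2, 1):
        print(replicas, shard_plan(8_000_000, 5, replicas, 2))
```

Each primary holds roughly 1.6 million documents. With 2 replicas, 5 of the 15 shard copies cannot be assigned (the cluster would report yellow health); with the 1 replica mentioned later in the thread, all 10 copies fit and each data node ends up holding a copy of every shard.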
Performance depends on a lot of things. What is your setup?
Are your nodes running on one machine or on different machines?
Search performance also depends, to some extent, on the number of shards you use.
Are you performing wildcard searches with a wildcard at the beginning?
We don't use VMs; each node is a complete Unix machine. I have three nodes, all in different geographic locations.
I have 5 primary shards and one replica for my index. Another question: when I search, will Elasticsearch look on all nodes? Data is stored in shards and the shards are distributed across all nodes. I suspect this could be the issue: the hosts are not in the same region, so if a search has to go to another node there may be extra latency.
There is not much wildcard use; we mostly use term queries, where we search with "starts with" and "ends with".
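To make the wildcard question concrete, here is a sketch of the request bodies involved, written as Python dicts. The field names "name" and "name_reversed" are hypothetical; "name_reversed" would have to be populated at index time with the reversed value (by the application, or via an analyzer using the reverse token filter). The reversed-field trick is one common way to turn an "ends with" search into a cheaper prefix search; it is not something the thread itself prescribes.

```python
import json

# Hypothetical field names: "name" holds the original value, "name_reversed"
# holds the same value reversed (it must be populated at index time).

def starts_with_query(field, prefix):
    # A prefix query can jump straight to the matching terms in the index.
    return {"query": {"prefix": {field: prefix}}}

def ends_with_wildcard_query(field, suffix):
    # A leading wildcard forces Elasticsearch to examine essentially every
    # term in the field, which gets slow on large indices.
    return {"query": {"wildcard": {field: "*" + suffix}}}

def ends_with_reversed_query(reversed_field, suffix):
    # The same "ends with" match written as a prefix query on the reversed field.
    return {"query": {"prefix": {reversed_field: suffix[::-1]}}}


if __name__ == "__main__":
    print(json.dumps(starts_with_query("name", "jo")))
    print(json.dumps(ends_with_wildcard_query("name", "son")))
    print(json.dumps(ends_with_reversed_query("name_reversed", "son")))
```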
Well, ES will hit all shards in a round-robin fashion, but there are ways to control this behaviour using routing.
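A simplified model of how custom routing narrows a search, as a sketch: documents indexed with a routing value land on one shard, and a search sent with the same routing value only queries that shard. The hash here is a stand-in (zlib.crc32), not the Murmur3 hash Elasticsearch actually uses, so the concrete shard numbers will not match a real cluster. The routing value "customer-42", the index "items" and the host are hypothetical, and routing only helps if your queries naturally group by some key.

```python
import zlib
from urllib.parse import urlencode

NUM_PRIMARIES = 5  # from the thread

def shard_for(routing_value, num_primaries=NUM_PRIMARIES):
    # Simplified routing rule: shard = hash(_routing) % number_of_primary_shards.
    # Elasticsearch uses Murmur3 (and, in newer versions, an extra routing-shard
    # factor); crc32 is only a stand-in, so the shard numbers will differ.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primaries

def shards_searched(routing_value=None, num_primaries=NUM_PRIMARIES):
    # Without routing, one copy of every shard is searched;
    # with routing, only the shard the value maps to is searched.
    if routing_value is None:
        return set(range(num_primaries))
    return {shard_for(routing_value, num_primaries)}

def routed_search_url(host, index, routing_value):
    # "routing" is a query-string parameter of the _search API; the same
    # value must also be supplied when indexing the documents.
    return f"{host}/{index}/_search?" + urlencode({"routing": routing_value})


if __name__ == "__main__":
    print(shards_searched())
    print(shards_searched("customer-42"))
    print(routed_search_url("http://localhost:9200", "items", "customer-42"))
```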
You can configure ES to search local shards using shard allocation awareness. Go through the link below; it might help with your setup. I would also suggest running ES on a single machine and measuring the query time there for comparison.
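As a sketch of what enabling allocation awareness involves, this builds the body for a PUT _cluster/settings request. The attribute name "zone" and its values are hypothetical; each node must also declare the attribute in elasticsearch.yml (node.attr.zone in 5.x and later, node.zone in older releases). Depending on the version, awareness may also make searches prefer copies in the same zone; check the documentation for your release.

```python
import json

def awareness_settings(attribute):
    """Body for PUT _cluster/settings enabling shard allocation awareness."""
    # Every node must also declare the attribute in elasticsearch.yml, for
    # example "node.attr.zone: eu-west" (attribute name and value are
    # hypothetical; releases before 5.0 used "node.zone" instead).
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": attribute
        }
    }


if __name__ == "__main__":
    # Send this JSON with: PUT http://<node>:9200/_cluster/settings
    print(json.dumps(awareness_settings("zone"), indent=2))
```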
For each query, Elasticsearch will hit one copy of each required shard, irrespective of which node it resides on. You can make queries try to use local copies first by specifying the _local preference. Distributing a cluster geographically can cause performance and stability issues, which is why it is not recommended.
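The _local preference can be sketched as follows: a search URL carrying preference=_local, plus a toy model of which node serves each shard. The host "data-1", the index "items" and the layout (5 primaries, 1 replica, 2 data nodes) are assumptions for illustration, and the fallback in pick_copies is simplified; real Elasticsearch uses its normal copy selection when no local copy exists.

```python
from urllib.parse import urlencode

def local_search_url(host, index):
    # preference=_local: use shard copies on the node that receives the
    # request where possible, otherwise fall back to copies on other nodes.
    return f"{host}/{index}/_search?" + urlencode({"preference": "_local"})

def pick_copies(shard_copies, local_node):
    """Simplified model of _local: for each shard, choose the local copy if
    one exists, else the first listed node holding a copy.
    shard_copies maps shard id -> list of nodes holding a copy."""
    return {
        shard: local_node if local_node in nodes else nodes[0]
        for shard, nodes in shard_copies.items()
    }


if __name__ == "__main__":
    # Hypothetical layout: 5 primaries, 1 replica, 2 data nodes, so each
    # data node holds a copy of every shard.
    layout = {shard: ["data-1", "data-2"] for shard in range(5)}
    print(local_search_url("http://data-1:9200", "items"))
    print(pick_copies(layout, "data-1"))
```

Under these assumptions, sending the request to a data node with _local lets every shard be answered on that node, avoiding a cross-region hop; a master-only node holds no shards, so _local cannot help there.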
Thank you very much. This explains a lot; it is exactly what I had suspected.
I have one more question: if I run multiple Elasticsearch processes on the same host, will that give any advantage?